kltm / reusabledata-staging
An experimental/staging repository for reusabledata/reusabledata.
License: BSD 3-Clause "New" or "Revised" License
This top-level item is to coordinate the main curation effort. Add all sources to do in alphabetical order below. There is a curation request, in double alphabetical order, by names for @jmcmurry @kltm @lrwyatt @lwinfree @mellybelly @rchampieux
When an annotation on a resource is completed, please remember to switch status to complete.
Remember, if you find any criteria violations, you'll likely need to create a new field license-issues; see the schema below.
incomplete due to auto-grade mismatch; check curator intention: https://gitter.im/reusabledata-staging/curation?at=598e56802723db8d5e975ce0
The data resource repo is here: https://github.com/kltm/reusabledata-staging/tree/master/data-sources
I think we would prefer a PR workflow at this point. If you have questions about that, feel free to ask somebody who might know. The curation and editing should be able to be done directly through the GitHub interface, but feel free to use any method you're comfortable with.
All documents have been seeded with data from an earlier version of the Monarch data spreadsheet and may not be completely up to date; they may also contain inaccuracies that I introduced during the port. In addition, an older version of the criteria schema contained the fields license-downstream-positive and license-downstream-negative, to attempt to track license language that specifically noted downstream use. These are no longer separate fields, and items that used to be in them have all been moved to the license-commentary field, sometimes with a leading + or -. If you feel that the comments are no longer pertinent, are too verbose, or do not make sense, please feel free to remove them; if we want them back, they are in the document history.
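To make the move described above concrete, here is a minimal sketch of how the two legacy fields could be folded into license-commentary. It is not the script that was actually used for the port. The function name fold_downstream_fields is hypothetical, and the sketch assumes that each record is a plain dict, that all three fields hold lists of strings, and that every moved item gets a "+ " or "- " prefix (the real data only sometimes has one).

```python
# Hypothetical helper: not part of the reusabledata-staging scripts.
LEGACY_FIELDS = (
    ("license-downstream-positive", "+"),
    ("license-downstream-negative", "-"),
)


def fold_downstream_fields(record):
    """Return a copy of `record` with legacy downstream notes moved
    into `license-commentary`, prefixed with "+" or "-".

    Assumes all three fields, when present, are lists of strings.
    """
    legacy_names = {name for name, _ in LEGACY_FIELDS}
    # Copy every field except the two legacy ones.
    migrated = {k: v for k, v in record.items() if k not in legacy_names}
    # Start from the existing commentary without mutating the input.
    commentary = list(record.get("license-commentary", []))
    for name, prefix in LEGACY_FIELDS:
        for note in record.get(name, []):
            commentary.append(f"{prefix} {note}")
    migrated["license-commentary"] = commentary
    return migrated
```

The input record is left untouched, so the original values remain available for comparison, much as the old comments remain in the document history.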
To see all the available annotation slots and some comments about their values, please see the schema here: https://github.com/kltm/reusabledata-staging/blob/master/scripts/source.schema.yaml
Usable abbreviations for many known licenses can be found here: https://spdx.org/licenses/
In the schema we specify:
## The license that is used.
## Should try and use SPDX where we can: https://spdx.org/licenses/
## or: "unknown", "public domain", "all right reserved", or "custom".
"license":
  type: str
  required: yes
## The type of license that is being used.
## If you do not know, enter "TODO".
## E.g. "unknown", "copyleft", "permissive", "copyright", "restrictive", etc.?
"license-type":
  type: str
  required: yes
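As a rough illustration of what the two schema entries above require, the sketch below checks that a curated record carries both fields as strings. It is only an assumption-laden example, not the project's validator: the function name check_license_fields is hypothetical, and a record is assumed to be a dict already loaded from one of the data-source YAML files.

```python
# Hypothetical check mirroring the two schema entries shown above:
# both fields are required and must be strings.
REQUIRED_STR_FIELDS = ("license", "license-type")


def check_license_fields(record):
    """Return a list of human-readable problems; empty if none found."""
    problems = []
    for field in REQUIRED_STR_FIELDS:
        if field not in record:
            problems.append(f"missing required field: {field}")
        elif not isinstance(record[field], str):
            problems.append(f"field {field} must be a string")
    return problems


if __name__ == "__main__":
    # "TODO" is the placeholder the schema comments suggest
    # when the license type is not yet known.
    example = {"license": "CC-BY-4.0", "license-type": "TODO"}
    print(check_license_fields(example))
```

Note that this only checks presence and type. It says nothing about which license-type value is right for a given license, which is the open question in this issue.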
Let's define the use of these enums. For example, is CC-BY-ND permissive or restrictive?
Getting some ideas down here:
ReusableData.org was developed as part of the Monarch Initiative and the NCATS Biomedical Data Translator, where the reuse and free redistribution of publicly available data for disease discovery was burdensome. ReusableData.org was created to help others navigate the legal redistribution of public data and to help data providers make it easier for others to reuse their data.
ReusableData.org is funded by the National Center for Advancing Translational Sciences (NCATS) OT3 TR002019 as part of the Biomedical Data Translator project.
We are grateful to the many original sources of our data for allowing their integration.
Add something about licensing/attribution for images.
It would be nice to have these, especially where NIH funded
Just parking this here.
This resource: https://neo4j.het.io/browser/ also had the same problem as us: http://www.nature.com/news/legal-confusion-threatens-to-slow-data-science-1.20359?WT.mc_id=TWT_NatureNews
They have a data license table here: https://github.com/dhimmel/integrate/blob/d482033bcaa913a976faf4a6ee08497281c739c3/licenses/README.md
I like how they annotate Nodes/edges:
Source indicates the date when and location where the license information was retrieved. Blank values indicate no licensing information was found. Institution indicates where the resource was created. Funder indicates who funded the project and links to the source of the funding information.
In the DataTable, while the Bootstrap 4a popovers function as advertised on the first page of results, they stop functioning on "paged" results.
In all likelihood, Bootstrap does not get to run its init code on the later rows before DataTable takes them away from the display. We either need to let Bootstrap go first, or get Bootstrap to re-init when the page changes.
We should have a list of relevant reading materials. I will post these here, @kltm let me know what format we want and I can also make a file for them elsewhere.
undetermined should be used only when we cannot find license info, yes? And if so, do we still link to where it should be but isn't?
Curate single most important thing that a resource could do to improve - keep it constructive.
I'm having trouble figuring out the Coriell Institute resource, for which I've done a disappointing first pass: https://github.com/kltm/reusabledata-staging/blob/master/data-sources/coriell-institute.yaml
I guess it's really two questions. The first is: is Coriell Institute actually a real and public upstream resource for data, or is it something else that just happens to be in Monarch for some quirky reason?
As this was in the initial spreadsheet and in dipper (https://github.com/monarch-initiative/dipper/blob/4cec8174bc713702cd0eecd06ce83847d23da164/dipper/sources/Coriell.py#L64), it seems like there should be data there, but I have been unable to find any working my way in from the top-level public website so far (see question 2 as well). As well, the dipper class has private keys and passwords to possibly internal SFTP servers. Is this actually some kind of private resource that Monarch has negotiated access to? If so, should we be evaluating it?
The second question is, assuming that this is a legit public resource that we want to grade: I can find no reason to not give it 0 stars as it makes no real mention of data or license besides some spreadsheets I stumbled across; does this feel right?