kltm / reusabledata-staging

An experimental/staging repository for reusabledata/reusabledata.

License: BSD 3-Clause "New" or "Revised" License

JavaScript 63.46% HTML 24.99% Makefile 11.56%

reusabledata-staging's People

Contributors

jmcmurry, kltm, lrwyatt, lwinfree, nicolevasilevsky, rchampieux

reusabledata-staging's Issues

EPIC for main resource data curation

This top-level item coordinates the main curation effort. Add all sources that still need to be done below, in alphabetical order. Curation requests are assigned, in double alphabetical order, by name to @jmcmurry @kltm @lrwyatt @lwinfree @mellybelly @rchampieux

When an annotation on a resource is completed, please remember to switch its status to complete.

Remember, if you find any criteria violations, you'll likely need to create a new field license-issues; see the schema below.

The data resource repo is here: https://github.com/kltm/reusabledata-staging/tree/master/data-sources
I think we would prefer a PR workflow at this point. If you have questions about that, feel free to ask somebody who might know. The curation and editing should be able to be done directly through the GitHub interface, but feel free to use any method you're comfortable with.

All documents have been seeded with data from an earlier version of the Monarch data spreadsheet, so they may not be completely up to date and may contain inaccuracies that I introduced during the port. In addition, an older version of the criteria schema contained the fields license-downstream-positive and license-downstream-negative, which attempted to track license language that specifically noted downstream use. These are no longer separate fields, and items that used to be in them have all been moved to the license-commentary field, sometimes with a leading + or -. If you feel that the comments are no longer pertinent, are too verbose, or do not make sense, please feel free to remove them; if we want them back, they are in the document history.

To see all the available annotation slots and some comments about their values, please see the schema here: https://github.com/kltm/reusabledata-staging/blob/master/scripts/source.schema.yaml

Usable abbreviations for many known licenses can be found here: https://spdx.org/licenses/

Need definitions for license enums

In the schema we specify:

  ## The license that is used.
  ## Should try and use SPDX where we can: https://spdx.org/licenses/
  ## or: "unknown", "public domain", "all right reserved", or "custom".
  "license":
    type: str
    required: yes
  ## The type of license that is being used.
  ## If you do not know, enter "TODO".
  ## E.g. "unknown", "copyleft", "permissive", "copyright", "restrictive", etc.?
  "license-type":
    type: str
    required: yes

Let's define the use of these enums. For example, is CC-BY-ND permissive or restrictive?
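
To make the question concrete, here is a minimal sketch of what a check against a fixed set of license-type values might look like. The list of allowed values is copied from the schema comment above and is only a candidate set, not an agreed enum; the function name checkLicenseFields and the record shape (a data-source YAML file already parsed into a plain object) are assumptions made for illustration.

```javascript
// Candidate values copied from the schema comment; NOT an agreed-upon enum.
const LICENSE_TYPES = [
  'unknown', 'copyleft', 'permissive', 'copyright', 'restrictive', 'TODO',
];

// Hypothetical helper: return a list of problems found in one parsed
// data-source record (an empty list means both fields look fine).
function checkLicenseFields(source) {
  const problems = [];

  // Both fields are "type: str, required: yes" in the schema.
  for (const field of ['license', 'license-type']) {
    if (typeof source[field] !== 'string' || source[field] === '') {
      problems.push(`${field} is required and must be a string`);
    }
  }

  // Only check the enum when a license-type string is actually present.
  const type = source['license-type'];
  if (typeof type === 'string' && type !== '' && !LICENSE_TYPES.includes(type)) {
    problems.push(`unrecognised license-type: ${type}`);
  }

  return problems;
}
```

A check like this only enforces spelling. It does not answer which value a license such as CC-BY-ND should get, which still needs a written definition for each value.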

Add acknowledgement footer

Getting some ideas down here:

ReusableData.org was developed as part of the Monarch Initiative and the NCATS Biomedical Data Translator, where the reuse and free redistribution of publicly available data for disease discovery was burdensome. ReusableData.org was created to help others navigate the legal redistribution of public data and to help data providers make it easier for others to reuse their data.

ReusableData.org is funded by the National Center for Advancing Translational Sciences (NCATS), OT3 TR002019, as part of the Biomedical Data Translator project.

We are grateful to the many original sources of our data for allowing their integration.

Add something about licensing/attribution for images.

FlyBase Curation Notes

Clearly stated license for data use:

  • This is not clearly present, and what is presented is not expressed via a standard license. We might want to consider using this as a criterion for "clearly".
  • Another thought: if a license is clearly stated but is restrictive, would it get a "point"? Technically, under the current wording of this dimension, I think it would. Should we be more specific?

Allows use and reuse:

  • Use and reuse are allowed, but there are exceptions and limitations.
  • Question - I think we need to clarify for ourselves and others the relationship between the criteria dimensions. For example, is it one star per dimension? If so, star counts could actually mean very different things based on which stars are actually awarded.

Non-discriminatory

  • Do we need to discriminate between instances when discriminatory reuse is unavoidable versus arbitrary?

Non-revocable

  • No issues with this dimension - however, we might want to clarify how we are regarding this in our criteria versus legal questions of revocable and irrevocable licensing.

Freely and openly available

  • I think it is confusing how we are applying this dimension of the rubric. Are free and open the core requirements here? This comment is more about the language we're using in the description of the dimension.
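
The concern about star counts raised above can be illustrated with a small sketch. It assumes the one-star-per-dimension rule that the question asks about, which is a hypothesis rather than a settled policy. The dimension keys are shortened forms of the headings in these notes, and the two example resources are invented for illustration.

```javascript
// Dimension keys abbreviated from the headings in these notes.
// ASSUMPTION: each dimension passed is worth exactly one star.
const DIMENSIONS = [
  'clearly-stated',
  'use-and-reuse',
  'non-discriminatory',
  'non-revocable',
  'freely-available',
];

// Count the stars for a resource, given the list of dimensions it passes.
function starCount(passed) {
  return DIMENSIONS.filter((d) => passed.includes(d)).length;
}

// Two made-up resources that pass different dimensions.
const resourceA = ['clearly-stated', 'non-revocable', 'freely-available'];
const resourceB = ['use-and-reuse', 'non-discriminatory', 'non-revocable'];

console.log(starCount(resourceA), starCount(resourceB)); // prints "3 3"
```

Both resources end up with three stars even though they share only one passed dimension, so a bare star count would hide which criteria were actually met.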

EPIC for page content

  • make markdown-ification of all "core" data discussions:
    • license-type discussion
    • criteria flowchart
    • bibliography/reading list
  • make md to html insertion tool for our static page gen
  • links into flow chart
  • automatic scoring/grading
  • go over main doc
    • text
    • pix
  • fill out more pages
    • about (see #60)
      • citation policy (also see footer)
      • contributors
    • criteria exp
    • reading list (see #59 )
    • footer (@mellybelly will help here, also see #60 )
  • social/forums
    • Link to Google group (do we actually have this? want this?)
    • slack? (now have reusabledata.slack.com, as well as gitter for reusabledata-staging; which to use when live?)
    • twitter tags that one of us would monitor? #reusabledata?
  • ensure we have all translator data sources. (not needed for first version? maybe a todo for the live site) @mellybelly where is the "full" list?
  • google analytics
  • switch to public repo
  • switch to auto-deploy on merge (using bbop jenkins or travis ?)

some related work

Just parking this here.
This resource: https://neo4j.het.io/browser/
also had the same problem as us: http://www.nature.com/news/legal-confusion-threatens-to-slow-data-science-1.20359?WT.mc_id=TWT_NatureNews
They have a data license table here: https://github.com/dhimmel/integrate/blob/d482033bcaa913a976faf4a6ee08497281c739c3/licenses/README.md

I like how they annotate Nodes/edges:
Source indicates the date when and location where the license information was retrieved. Blank values indicate no licensing information was found. Institution indicates where the resource was created. Funder indicates who funded the project and links to the source of the funding information.

Bootstrap popovers do not function on paged items

In the DataTable, while the Bootstrap 4a popovers function as advertised on the first page of results, they stop functioning on "paged" results.

In all likelihood, Bootstrap does not get to run its init code on those rows before DataTable takes them away from the display. We either need to let Bootstrap go first, or get Bootstrap to re-init each time the table changes page.
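
One way the re-init option might look is sketched below. It relies on the `draw.dt` event that DataTables fires after every redraw (including page changes) and on the jQuery `.popover()` plugin call from Bootstrap. The function name bindPopoverReinit, the table selector, and the `data-toggle="popover"` markup are assumptions; the site's actual markup may differ, and this has not been tried against the Bootstrap alpha in use here.

```javascript
// Hypothetical helper: re-run Bootstrap's popover init whenever DataTables
// redraws the table. `$` is jQuery with the Bootstrap and DataTables plugins
// loaded; `tableSelector` is an assumed selector for the table element.
function bindPopoverReinit($, tableSelector) {
  const initPopovers = function () {
    // Assumes popover triggers are marked with data-toggle="popover".
    $(tableSelector + ' [data-toggle="popover"]').popover();
  };

  // DataTables triggers `draw.dt` after each draw, which includes paging,
  // sorting and filtering, so newly displayed rows get initialised too.
  $(tableSelector).on('draw.dt', initPopovers);

  // Cover the rows that are already on the first page.
  initPopovers();
}
```

In the page script this might be called as `bindPopoverReinit(jQuery, '#sources-table')`, where the table id is made up, ideally before the table is turned into a DataTable so that the first draw is caught as well.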

Add proper reading list for site

We should have a list of relevant reading materials. I will post these here, @kltm let me know what format we want and I can also make a file for them elsewhere.

Request for help with Coriell Institute

I'm having trouble figuring out the Coriell Institute resource, for which I've done a disappointing first pass: https://github.com/kltm/reusabledata-staging/blob/master/data-sources/coriell-institute.yaml

I guess it's really two questions. The first is: is Coriell Institute actually a real and public upstream resource for data, or is it something else that just happens to be in Monarch for some quirky reason?

As this was in the initial spreadsheet and in dipper (https://github.com/monarch-initiative/dipper/blob/4cec8174bc713702cd0eecd06ce83847d23da164/dipper/sources/Coriell.py#L64), it seems like there should be data there, but I have been unable to find any working my way in from the top-level public website so far (see question 2 as well). As well, the dipper class has private keys and passwords to possibly internal SFTP servers. Is this actually some kind of private resource that Monarch has negotiated access to? If so, should we be evaluating it?

The second question is, assuming that this is a legit public resource that we want to grade: I can find no reason to not give it 0 stars as it makes no real mention of data or license besides some spreadsheets I stumbled across; does this feel right?
