
This project forked from cbioportal/datahub


A centralized location for storing curated data ready for inclusion in cBioPortal.



cBioPortal Public Datahub

The datahub is a repository for storing data only. It contains staging files that are pre-validated and can be loaded directly into cBioPortal.

Behind the scenes, git-lfs is used to manage the large files: https://github.com/github/git-lfs

Test Status

Validation status of all studies on the Datahub master branch. Validation runs weekly using the validation code from the cBioPortal master branch. It also checks whether the studies on cbioportal.org and on Datahub are in sync.


How to Download Data

Downloading zip files of individual studies

At cbioportal.org a zipped folder with staging files from each study can be downloaded. These zip files are compressed versions of the study folders in the master branch of this repository.

Example: downloading an individual study with git-lfs

It is also possible to download uncompressed staging files from this repository with git-lfs.

After you have installed git-lfs, configure it not to download all data files right away:

git lfs install --skip-repo --skip-smudge

Clone the git repository and install lfs hooks into it:

git clone https://github.com/cBioPortal/datahub.git
cd datahub
git lfs install --local --skip-smudge

Download the data files for a study folder, for example brca_tcga:

git lfs pull -I public/brca_tcga
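
To fetch several studies in one pass, the include filter of git lfs pull accepts a comma-separated list of paths. The helper below is a minimal sketch and not part of this repository: the function name lfs_include_for_studies is hypothetical, and it assumes every study folder sits directly under public/, as public/brca_tcga does.

```shell
#!/bin/sh
# Hypothetical helper: build a comma-separated include list for `git lfs pull -I`.
# Assumes each study folder sits directly under public/ (as public/brca_tcga does).
lfs_include_for_studies() {
    include=""
    for study in "$@"; do
        # Prefix a comma separator for every entry after the first.
        include="${include:+$include,}public/$study"
    done
    printf '%s\n' "$include"
}

# Usage, run from inside the cloned datahub directory
# (another_study is a placeholder, not a real study name):
#   git lfs pull -I "$(lfs_include_for_studies brca_tcga another_study)"
```

Building the list in a function keeps the pull itself a single command, so git-lfs can batch the downloads rather than being invoked once per study.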

How to Upload Data

Create a new branch from the 'master' branch.

git checkout master
git pull origin master
git checkout -b [name_of_your_new_branch]

For general background on creating and managing branches within GitHub, see: Git Branching and Merging.

Commit changes, and push the branch back to GitHub.

[back to the root directory]
git add .
git commit -m '[notes_for_your_change]'
git push origin [name_of_your_new_branch]

Open a Pull Request on GitHub to the 'master' branch.

For instructions on submitting a pull-request, please see: Using Pull Requests and Sending Pull Requests.
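
The branch and push steps above can be wrapped in two small functions, one to run before editing and one after. This is only a sketch under stated assumptions: the names run, start_study_branch, and publish_study_branch are hypothetical, and DRY_RUN defaults to 1 so that the commands are printed rather than executed until you opt in with DRY_RUN=0.

```shell
#!/bin/sh
# Sketch of the upload steps. With DRY_RUN=1 (the default here) the git
# commands are only printed; set DRY_RUN=0 to actually run them.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Hypothetical helper. Argument: <branch name>
# Creates a fresh branch from an up-to-date master.
start_study_branch() {
    run git checkout master &&
    run git pull origin master &&
    run git checkout -b "$1"
}

# Hypothetical helper. Arguments: <branch name> <commit message>
# Call from the repository root after making your changes.
publish_study_branch() {
    run git add . &&
    run git commit -m "$2" &&
    run git push origin "$1"
}
```

For example, start_study_branch my-new-study prints the three commands that create the branch. After editing, publish_study_branch my-new-study 'Add my new study' prints the add, commit, and push commands. The pull request itself is still opened on GitHub.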

Download a complete MySQL export of the latest database

http://download.cbioportal.org/mysql-snapshots/mysql-snapshots-toc.html

License

The data are available under the ODC Open Database License (ODbL) (summary available here): you are free to share and modify the data so long as you attribute any public use of the database, or works produced from the database; keep the resulting data-sets open; and offer your shared or adapted version of the data-set under the same ODbL license.

TCGA data are available under the Broad Institute GDAC TCGA Analysis Pipeline License. The Cancer Genome Atlas Consortium is pleased to provide the research community with preliminary data prior to publication. Users are requested to carefully consider that these data are preliminary and have yet to be validated. Researchers are warned that the preliminary data have a significant uncertainty, are likely to change, and should be used with caution.

Contributors

alexsigaras, alisman, ao508, averyniceday, dionnezaal, fedde-s, inodb, jim-bo, jjgao, kalletlak, n1zea144, oplantalech, pieterlukasse, ritikakundra, rmadupuri, yichaos, zheins
