
global-localhost / genomicsnotebook


This project forked from microsoft/genomicsnotebook


Jupyter Notebooks on Azure for Genomics Data Analysis

License: MIT License

Language: Jupyter Notebook (100%)


Genomics Data Analysis with Jupyter Notebooks on Azure


Jupyter notebooks are a great tool for data scientists working on genomics data analysis. In this repo, we demonstrate the use of Azure Notebooks for genomics data analysis with GATK, Picard, Bioconductor, and Python libraries.

Here is the list of sample notebooks in this repo:

  1. genomics.ipynb: End-to-end analysis from unmapped BAM ('uBAM') files to a 'structured data table'.
  2. genomicsML.ipynb: Train Machine Learning models with Genomics + Clinical Data
  3. genomics-platinum-genomes.ipynb: Accessing Illumina Platinum Genomes data from Azure Open Datasets* and performing an initial data analysis.
  4. genomics-reference-genomes.ipynb: Accessing reference genomes from Azure Open Datasets*
  5. genomics-clinvar.ipynb: Accessing ClinVar data from Azure Open Datasets*
  6. genomics-giab.ipynb: Accessing Genome in a Bottle data from Azure Open Datasets*
  7. SnpEff.ipynb: Accessing SnpEff databases from Azure Open Datasets*
  8. Bioconductor.ipynb: Pulling Bioconductor Docker image from Microsoft Container Registry
  9. simtotable.ipynb: Simulate NGS data, use Cromwell on Azure or the Microsoft Genomics service for secondary analysis, and convert the resulting gVCF data to a structured data table.
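Several of these notebooks end by turning variant calls into a structured data table. The sketch below is not the notebooks' code. It is a minimal, standard-library illustration, assuming tab-separated VCF/gVCF text with a `#CHROM` header line, of how each record can be flattened into one table row: fixed columns first, then the `KEY=VALUE` pairs from `INFO`, then one column per sample and `FORMAT` key. The function names `vcf_to_rows` and `parse_info`, and the sample name `NA12878` in the example, are hypothetical choices for illustration.

```python
import io
from typing import Dict, Iterable, List

# The first seven fixed VCF columns, kept as-is in every row.
FIXED_COLUMNS = ["CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER"]


def parse_info(info: str) -> Dict[str, str]:
    """Split an INFO field like 'DP=14;END=120' into a dict; bare flags map to 'True'."""
    if info == ".":
        return {}
    result = {}
    for entry in info.split(";"):
        key, sep, value = entry.partition("=")
        result[key] = value if sep else "True"
    return result


def vcf_to_rows(lines: Iterable[str]) -> List[Dict[str, object]]:
    """Hypothetical helper: convert VCF/gVCF data lines into flat row dicts."""
    rows = []
    header = None
    for raw in lines:
        line = raw.rstrip("\n")
        if not line or line.startswith("##"):
            continue  # skip blank lines and meta-information lines
        fields = line.split("\t")
        if line.startswith("#CHROM"):
            header = fields
            header[0] = "CHROM"  # drop the leading '#'
            continue
        if header is None:
            raise ValueError("data line found before #CHROM header")
        record = dict(zip(header, fields))
        row: Dict[str, object] = {col: record[col] for col in FIXED_COLUMNS}
        row["POS"] = int(record["POS"])
        row.update(parse_info(record["INFO"]))
        # Per-sample columns: pair each FORMAT key with that sample's value.
        if "FORMAT" in record:
            keys = record["FORMAT"].split(":")
            for sample in header[9:]:
                for key, value in zip(keys, record[sample].split(":")):
                    row[f"{sample}.{key}"] = value
        rows.append(row)
    return rows


# Small gVCF-like example: one variant record and one reference block.
example = (
    "##fileformat=VCFv4.2\n"
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tNA12878\n"
    "chr1\t100\t.\tA\tG\t50\tPASS\tDP=14\tGT:DP\t0/1:14\n"
    "chr1\t101\t.\tC\t<NON_REF>\t.\t.\tEND=120\tGT:DP\t0/0:10\n"
)
rows = vcf_to_rows(io.StringIO(example))
for row in rows:
    print(row)
```

Because each row is a plain dict, the list can be handed straight to `pandas.DataFrame(rows)` to get a table; columns that appear only in some records (such as `END` for reference blocks) are filled with missing values. Real pipelines typically use a dedicated VCF reader instead of hand-parsing.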

*Technical note: Explore Azure Genomics Data Lake with Azure Storage Explorer

1. Prerequisites

Create and manage Azure Machine Learning workspaces in the Azure portal


For further details on creating an Azure ML workspace, please visit this page.

Run the notebook in your workspace

This section uses the cloud notebook server in your workspace for an install-free and preconfigured experience. Use your own environment if you prefer to control your packages and dependencies.

Follow along with this video or use the detailed steps below to clone and run the tutorial from your workspace.

Watch the video

2. Contributing

This project welcomes contributions and suggestions. Most contributions require you to agree to a Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us the rights to use your contribution. For details, visit https://cla.opensource.microsoft.com.

When you submit a pull request, a CLA bot will automatically determine whether you need to provide a CLA and decorate the PR appropriately (e.g., status check, comment). Simply follow the instructions provided by the bot. You will only need to do this once across all repos using our CLA.

This project has adopted the Microsoft Open Source Code of Conduct. For more information, see the Code of Conduct FAQ or contact [email protected] with any additional questions or comments.

3. References

  1. Jupyter Notebook on Azure
  2. Introduction to Azure Notebooks
  3. GATK
  4. Picard
  5. Azure Machine Learning
  6. Azure Open Datasets
  7. Cromwell on Azure
  8. Bioconductor

Contributors: erdalcosgun, microsoftopensource
