microsoft / genomicsnotebook
Jupyter Notebooks on Azure for Genomics Data Analysis
License: MIT License
2.1) Create an "Azure API for FHIR"[3] instance, named <fhir_server>.
Navigate to https://<fhir_server>.azurehealthcareapis.com/metadata and verify that a "Capability Statement" is retrieved; this confirms that the FHIR server[3] is running.
Set fhir_server in Section 3.1.
Use RBAC[6]: in the <fhir_server> left pane, select "Identity" -> "On" -> "Save".
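The "Capability Statement" check can also be run from the notebook instead of a browser. This is a minimal standard-library sketch, not code from the notebook: metadata_url, is_capability_statement and fhir_server_is_running are hypothetical helper names, and it assumes the /metadata endpoint answers without an access token, as the browser check implies.

```python
import json
import urllib.request


def metadata_url(fhir_server: str) -> str:
    """Build the capability-statement URL for an Azure API for FHIR instance name."""
    return f"https://{fhir_server}.azurehealthcareapis.com/metadata"


def is_capability_statement(resource: dict) -> bool:
    """True if a parsed FHIR JSON resource is a CapabilityStatement."""
    return resource.get("resourceType") == "CapabilityStatement"


def fhir_server_is_running(fhir_server: str, timeout: float = 10.0) -> bool:
    """GET <server>/metadata and check that a CapabilityStatement comes back.

    ASSUMPTION: /metadata is reachable anonymously (no bearer token sent).
    """
    request = urllib.request.Request(
        metadata_url(fhir_server),
        headers={"Accept": "application/fhir+json"},
    )
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return is_capability_statement(json.load(response))
    except (OSError, ValueError):
        # URLError/HTTPError and socket timeouts subclass OSError;
        # a non-JSON body raises json.JSONDecodeError (a ValueError).
        return False
```

Usage: fhir_server_is_running("<fhir_server>") should return True once the instance from step 2.1 is up; False covers both "not created yet" and "not reachable".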
This notebook section follows the "FHIR to Synapse Sync Agent" tutorial provided in Microsoft's "FHIR Analytics Pipelines" GitHub repository[13].
4.1) Deploy the custom Azure template provided by the "FHIR to Synapse Sync Agent" tutorial[13].
The GitHub link is no longer valid. I went to that repo, but it's not clear which document to use for the deployment.
In notebook 1-data-export, step 2.1 (Create an "Azure API for FHIR" instance) should somehow come after step 2.4, because the FHIR server has not been created yet at that point. The user first needs to go to Azure API for FHIR and create the FHIR server; only then can they do the rest. Creating the server is not described in the instructions.

Thank you very much for sharing an informative set of Jupyter Notebooks.
I've been reviewing the Train Machine Learning Models with Genomics + Clinical Data notebook, which uses simulated clinical and phenotypic datasets. However, I couldn't find details on how these datasets were generated.
Could you provide insight into how this data was generated, or point me to any resources or documentation on this matter?
Thank you very much in advance.
In the first FHIR notebook, the mnt path is somewhat confusing. USERNAME here is not my AD username but rather the name of the compute environment, I think. Either way, this path took some digging to figure out and could be clarified in the notebook.
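import subprocess

# Generate 10 synthetic patients with Synthea; fixed seeds make the run reproducible.
# USERNAME is a placeholder for the cluster-specific folder in the mount path.
subprocess.run(["./run_synthea",
    "-s", "42",    # random seed
    "-cs", "99",   # clinician seed
    "-p", "10",    # population size
    "--exporter.baseDirectory=/mnt/batch/tasks/shared/LS_root/mounts/clusters/USERNAME/code",
]);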
This line did not work for me:
for filename in glob(f"/home/azureuser/cloudfiles/data/datastore/synthea/fhir/*.json"):
I had to use the mnt path from the top of the notebook instead.
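A sketch of building the mount path once and globbing the Synthea FHIR output from it, instead of the /home/azureuser/cloudfiles path. Assumptions, not confirmed by the notebook: the folder under /mnt/batch/tasks/shared/LS_root/mounts/clusters is the segment shown as USERNAME, and Synthea writes its FHIR bundles to a fhir subfolder of --exporter.baseDirectory. available_mounts, synthea_code_dir and synthea_fhir_files are hypothetical names; available_mounts lists the candidate folders so you don't have to dig for the right one.

```python
import os
from glob import glob

# Mount root copied from the notebook's --exporter.baseDirectory path.
MOUNT_ROOT = "/mnt/batch/tasks/shared/LS_root/mounts/clusters"


def available_mounts(root: str = MOUNT_ROOT) -> list[str]:
    """List the folder names under the mount root (empty list if it is absent)."""
    if not os.path.isdir(root):
        return []
    return sorted(os.listdir(root))


def synthea_code_dir(name: str, root: str = MOUNT_ROOT) -> str:
    """Path passed as --exporter.baseDirectory; `name` stands in for USERNAME."""
    return os.path.join(root, name, "code")


def synthea_fhir_files(base_dir: str) -> list[str]:
    """Sorted FHIR JSON bundles, ASSUMING Synthea wrote them to <base_dir>/fhir."""
    return sorted(glob(os.path.join(base_dir, "fhir", "*.json")))


if __name__ == "__main__":
    print("Mounted folders:", available_mounts())
    # Replace "USERNAME" with one of the names printed above.
    for filename in synthea_fhir_files(synthea_code_dir("USERNAME")):
        print(filename)
```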
Before running samtools view, this line needs to be added: !samtools index HG002_GRCh38.haplotag.10x.bam
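The missing step can be folded into a helper that builds the index only when it is absent, since region queries with samtools view need one. A minimal sketch, assuming samtools is on PATH and the index uses the default name <bam>.bai written by samtools index (a .csi index or a differently named .bai would not be detected). ensure_bam_index and view_region are hypothetical names; the run parameter exists only so the logic can be tested without samtools.

```python
import os
import subprocess


def ensure_bam_index(bam_path: str, run=subprocess.run) -> bool:
    """Run `samtools index` if <bam>.bai does not exist yet.

    Returns True if an index was built, False if one was already present.
    """
    if os.path.exists(bam_path + ".bai"):
        return False
    run(["samtools", "index", bam_path], check=True)
    return True


def view_region(bam_path: str, region: str, run=subprocess.run) -> str:
    """Index the BAM if needed, then return `samtools view` output for a region."""
    ensure_bam_index(bam_path, run=run)
    result = run(
        ["samtools", "view", bam_path, region],
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout
```

Usage: view_region("HG002_GRCh38.haplotag.10x.bam", "chr20:1-100000"), where the region string is only a placeholder.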
In Notebook FHIR 1_export_data,
5.3) Convert all PacBio VCFs to TSV
This step assumes you already have VCF files in a storage account container. You could download the VCFs directly onto the VM and then copy them to the container, or leave them on the VM. Either way, the notebook should not assume the user already has the data.
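For VCFs downloaded directly onto the VM, a conversion like step 5.3 can run on local files. This is a minimal standard-library sketch, not the notebook's own code: it assumes a plain or gzip/bgzip-compressed VCF, drops the ## meta-information lines, keeps the #CHROM line as the TSV header (without the leading #), and copies records as-is with no INFO or sample-column parsing. open_vcf and vcf_to_tsv are hypothetical names.

```python
import gzip


def open_vcf(path: str):
    """Open a plain or gzip/bgzip-compressed VCF as text."""
    if path.endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path, "r")


def vcf_to_tsv(vcf_path: str, tsv_path: str) -> int:
    """Write the VCF's variant records to a TSV; return the number of records."""
    header = None
    n_records = 0
    with open_vcf(vcf_path) as src, open(tsv_path, "w") as dst:
        for line in src:
            line = line.rstrip("\r\n")
            if not line or line.startswith("##"):
                continue  # skip blank and meta-information lines
            fields = line.split("\t")
            if line.startswith("#"):
                # "#CHROM POS ID ..." becomes the TSV header row.
                header = [fields[0].lstrip("#")] + fields[1:]
                dst.write("\t".join(header) + "\n")
                continue
            if header is None:
                raise ValueError(f"{vcf_path}: record found before #CHROM header")
            dst.write("\t".join(fields) + "\n")
            n_records += 1
    return n_records
```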