microsoft / genomicsnotebook



Jupyter Notebooks on Azure for Genomics Data Analysis

License: MIT License

Jupyter Notebook 98.93% TSQL 1.07%


genomicsnotebook's Issues

3-pharmacogenomics-confidential.ipynb: Need to index BAM file

Before running `samtools view`, the BAM file needs to be indexed. Add this line: `!samtools index HG002_GRCh38.haplotag.10x.bam`
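A minimal sketch of the fix, assuming the notebook shells out to samtools: index the BAM once, and skip the step if an index already exists. `ensure_bam_index` and its `runner` parameter are hypothetical helpers for illustration; the only real interface used is the `samtools index` command, which by default writes `<file>.bam.bai` next to the BAM.

```python
import os
import subprocess
from typing import Callable, List


def ensure_bam_index(bam_path: str, runner: Callable = subprocess.run) -> List[str]:
    """Hypothetical helper: run `samtools index` unless a .bai index already exists.

    Returns the command that was executed, or an empty list if nothing was run.
    """
    bai_path = bam_path + ".bai"  # default index name written by `samtools index`
    if os.path.exists(bai_path):
        return []  # already indexed; nothing to do
    cmd = ["samtools", "index", bam_path]
    # With the default runner, check=True raises CalledProcessError if samtools fails
    runner(cmd, check=True)
    return cmd


# Usage in the notebook, before running `samtools view`:
# ensure_bam_index("HG002_GRCh38.haplotag.10x.bam")
```

Checking for the `.bai` first keeps the cell safe to re-run, since re-indexing a large BAM on every execution is wasted time.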

Description of the simulated clinical and phenotypic datasets referred to in `/sample-notebooks/genomicsML.ipynb`

Thank you very much for sharing an informative set of Jupyter Notebooks.

I've been reviewing the "Train Machine Learning Models with Genomics + Clinical Data" notebook, which uses simulated clinical and phenotypic datasets. However, I couldn't find any details on how these datasets were generated.

Could you provide some insight into how this data was generated, or point me to any resources or documentation on the subject?

Thank you very much in advance.

No link to Download PacBio VCF Files

In the FHIR notebook 1_export_data, under this step:

5.3) Convert all PacBio VCFs to TSV

This step assumes the VCF files are already in a storage account container, but the notebook never says where to download them from. The user could download the VCFs directly to the VM and then either copy them to the container or leave them on the VM. Either way, the notebook should not assume the user already has the data.
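The notebook does not show how the conversion itself is done here. As a rough illustration only, not the notebook's own code, this stdlib-only sketch turns a VCF (optionally gzipped) into a TSV of the eight fixed VCF columns and drops the `##` meta-information lines. The function names are hypothetical, and per-sample genotype columns are deliberately ignored.

```python
import gzip
from typing import Iterable, Iterator, List

# The eight mandatory columns defined by the VCF specification
FIXED_COLUMNS = ["CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER", "INFO"]


def vcf_rows(lines: Iterable[str]) -> Iterator[List[str]]:
    """Yield a header row, then the fixed columns of each VCF data line."""
    yield FIXED_COLUMNS
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip '##' meta-information and the '#CHROM' header line
        yield line.split("\t")[: len(FIXED_COLUMNS)]


def convert_vcf_to_tsv(vcf_path: str, tsv_path: str) -> int:
    """Write vcf_path as TSV to tsv_path; return the number of records written."""
    opener = gzip.open if vcf_path.endswith(".gz") else open
    written = 0
    with opener(vcf_path, "rt") as src, open(tsv_path, "w") as dst:
        for row in vcf_rows(src):
            dst.write("\t".join(row) + "\n")
            written += 1
    return written - 1  # exclude the header row
```

Streaming line by line keeps memory flat even for whole-genome VCFs; a real pipeline would likely use a VCF library to parse INFO and FORMAT fields properly.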

Invalid path for the list of JSON files to add to the FHIR store

This line did not work for me: `for filename in glob(f"/home/azureuser/cloudfiles/data/datastore/synthea/fhir/*.json"):`. I had to use the mnt path from the top of the notebook instead.
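A sketch of a sturdier pattern, assuming (as the original `synthea/fhir/*.json` glob suggests) that Synthea writes its FHIR bundles to a `fhir` subfolder of its export directory. It derives the glob from the same base directory passed as `--exporter.baseDirectory`, so the two paths cannot drift apart, and it stops with an error if nothing matches, because a loop over an empty glob silently does nothing. `fhir_bundle_paths` is a hypothetical helper.

```python
import os
from glob import glob
from typing import List


def fhir_bundle_paths(synthea_base_dir: str) -> List[str]:
    """List the FHIR JSON bundles under <synthea_base_dir>/fhir, sorted by path."""
    pattern = os.path.join(synthea_base_dir, "fhir", "*.json")
    paths = sorted(glob(pattern))
    if not paths:
        # Fail loudly instead of silently looping over nothing
        raise FileNotFoundError(f"no FHIR bundles match {pattern!r}")
    return paths


# Usage (base_dir is whatever was passed to --exporter.baseDirectory):
# for filename in fhir_bundle_paths(base_dir):
#     ...
```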

Creating FHIR Server

In the 1-data-export notebook, step 2.1 (quoted below) should come after step 2.4, because at that point the FHIR server has not been created yet. The user first needs to go to Azure API for FHIR and create the FHIR server; only then can they do the rest. Creating the server is not described in the instructions.

2.1) Create an "Azure API for FHIR"^[3] instance, named <fhir_server>

Navigate to https://<fhir_server>.azurehealthcareapis.com/metadata and verify a "Capability Statement" is retrieved.
That means the FHIR server^[3] is running.
Set fhir_server in Section 3.1
Use RBAC^[6]: <fhir_server> left pane "Identity" -> "On" -> "Save"
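The browser check in step 2.1 can also be scripted. This is a minimal stdlib sketch, assuming the `/metadata` endpoint can be read anonymously, as the step's browser check implies; if your server requires a token even there, the function simply returns False. The helper names are hypothetical.

```python
import json
import urllib.request


def metadata_url(fhir_server: str) -> str:
    """The metadata endpoint named in step 2.1."""
    return f"https://{fhir_server}.azurehealthcareapis.com/metadata"


def is_capability_statement(resource: object) -> bool:
    """True if a parsed FHIR resource is a CapabilityStatement."""
    return isinstance(resource, dict) and resource.get("resourceType") == "CapabilityStatement"


def fhir_server_is_running(fhir_server: str, timeout: float = 10.0) -> bool:
    """Fetch /metadata and check that a CapabilityStatement comes back."""
    request = urllib.request.Request(
        metadata_url(fhir_server), headers={"Accept": "application/fhir+json"}
    )
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return is_capability_statement(json.load(response))
    except (OSError, ValueError):
        # Network/HTTP errors (URLError is an OSError) or a non-JSON body
        return False


# Usage: fhir_server_is_running("<fhir_server>")
```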

Username misleading in mnt path

In the first FHIR notebook, the mnt path is confusing. The USERNAME part is not my AD username but rather, I think, the name of the compute environment. Either way, this path took some digging to figure out and could be clarified in the notebook.

import subprocess

subprocess.run(["./run_synthea",
"-s", "42",
"-cs", "99",
"-p", "10",
f'--exporter.baseDirectory=/mnt/batch/tasks/shared/LS_root/mounts/clusters//code'
]);
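One way to make the path less confusing, sketched under the reporter's reading that this path segment is the compute name rather than the AD username: name the value explicitly and build the path in one place. `synthea_base_dir`, `synthea_command`, and `COMPUTE_NAME` are hypothetical; the mount root is copied from the notebook, and `-s`, `-cs`, and `-p` are Synthea's seed, clinician seed, and population size options.

```python
from typing import List

# Mount layout copied from the notebook's path
MOUNT_ROOT = "/mnt/batch/tasks/shared/LS_root/mounts/clusters"


def synthea_base_dir(compute_name: str) -> str:
    """Hypothetical helper: the code folder for a given compute name (not the AD username)."""
    return f"{MOUNT_ROOT}/{compute_name}/code"


def synthea_command(compute_name: str, seed: int = 42, clinician_seed: int = 99,
                    population: int = 10) -> List[str]:
    """Build the same run_synthea invocation as the notebook, with the path made explicit."""
    return [
        "./run_synthea",
        "-s", str(seed),             # random seed
        "-cs", str(clinician_seed),  # clinician seed
        "-p", str(population),       # population size
        f"--exporter.baseDirectory={synthea_base_dir(compute_name)}",
    ]


# Usage (COMPUTE_NAME is a placeholder for your own compute name):
# import subprocess
# subprocess.run(synthea_command("COMPUTE_NAME"), check=True)
```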

Feedback on FHIR > 1_export_data.ipynb

The mnt path is confusing. The USERNAME part is not my AD username but rather, I think, the name of the compute environment. Either way, this path took some digging to figure out and could be clarified in the notebook.

import subprocess

subprocess.run(["./run_synthea",
"-s", "42",
"-cs", "99",
"-p", "10",
f'--exporter.baseDirectory=/mnt/batch/tasks/shared/LS_root/mounts/clusters/USERNAME/code'
]);

Step 2.1 (quoted below) should come after step 2.4, because at that point the FHIR server has not been created yet. The user first needs to go to Azure API for FHIR and create the FHIR server; only then can they do the rest. Creating the server is not described in the instructions.

2.1) Create an "Azure API for FHIR"[3] instance, named <fhir_server>

Navigate to https://<fhir_server>.azurehealthcareapis.com/metadata and verify a "Capability Statement" is retrieved.
That means the FHIR server[3] is running.
Set fhir_server in Section 3.1
Use RBAC[6]: <fhir_server> left pane "Identity" -> "On" -> "Save"

This line did not work for me: `for filename in glob(f"/home/azureuser/cloudfiles/data/datastore/synthea/fhir/*.json"):`. I had to use the mnt path from the top of the notebook instead.

4. Set up the FHIR->Synapse Sync Agent

This notebook section follows the "FHIR to Synapse Sync Agent" tutorial provided in Microsoft's "FHIR Analytics Pipelines" GitHub repository^[13].

4.1) Deploy the custom Azure template provided by the "FHIR to Synapse Sync Agent" tutorial^[13].

Navigate to the Github repo by clicking this link.

The GitHub link is no longer valid. I went to that repo, but it's not clear which document to use for the deployment.
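Whichever template the tutorial intends, a custom ARM template can be deployed from the Azure CLI with `az deployment group create`. This sketch only builds that command. The resource group, template URI, and parameters file are placeholders, and the correct template is exactly what this issue asks about, so it is not filled in. `az_template_deploy_command` is a hypothetical helper.

```python
from typing import List, Optional


def az_template_deploy_command(resource_group: str, template_uri: str,
                               parameters_file: Optional[str] = None) -> List[str]:
    """Build an `az deployment group create` command for a remote ARM template."""
    cmd = [
        "az", "deployment", "group", "create",
        "--resource-group", resource_group,
        "--template-uri", template_uri,
    ]
    if parameters_file:
        # '@' tells the Azure CLI to read parameters from a local JSON file
        cmd += ["--parameters", f"@{parameters_file}"]
    return cmd


# Usage (all three values are placeholders):
# import subprocess
# subprocess.run(az_template_deploy_command("<resource_group>", "<template_uri>", "params.json"), check=True)
```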

