
paulnagy/dicom2omop


This project creates a controlled vocabulary for the DICOM Part 6 Data Dictionary, with a focus on CS (Code String) values.

License: Apache License 2.0



Dicom2OMOP

This project builds on the published work "Development of Medical Imaging Data Standardization for Imaging-Based Observational Research: OMOP Common Data Model Extension". A copy of the full paper is in the files folder. The project creates a controlled vocabulary for the DICOM Part 6 Data Dictionary, focusing on CS (Code String) values expressed in the OMOP vocabulary format, and harmonizes common data elements (CDEs).

  • Create a library that takes a DICOM tag
    • (e.g., Part 6 0010,0040 Patient Sex)
    • (Link to Part 16 CID 7455 Sex)
    • (Ingest FHIR JSON)
    • (Create "Maps to" relationships from the Source to the Standard vocabulary for OMOP gender)
  • Identify current gaps in SNOMED and LOINC mapping from DICOM
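
The last step of the library goal above (creating "Maps to" relationships for Patient Sex) can be sketched roughly as follows. This is a minimal illustration, not the project's implementation: the names `STANDARD_TARGETS`, `SOURCE_CONCEPT_BASE`, `ConceptRelationship`, and `build_maps_to` are hypothetical, the source concept IDs are made up (using the OMOP convention that IDs above 2,000,000,000 are reserved for local concepts), and the Part 16 and FHIR steps are skipped. The target IDs 8507 (MALE) and 8532 (FEMALE) are the standard OMOP gender concepts.

```python
from dataclasses import dataclass

# (DICOM tag, CS value) -> standard OMOP gender concept_id.
# Patient Sex (0010,0040) allows M, F and O; "O" is left unmapped in this sketch.
STANDARD_TARGETS = {
    ("0010,0040", "M"): 8507,  # OMOP standard concept MALE
    ("0010,0040", "F"): 8532,  # OMOP standard concept FEMALE
}

# Hypothetical starting ID for the local (source) DICOM concepts.
SOURCE_CONCEPT_BASE = 2_000_000_000


@dataclass(frozen=True)
class ConceptRelationship:
    concept_id_1: int      # source (DICOM) concept
    concept_id_2: int      # standard (OMOP) concept
    relationship_id: str   # "Maps to"


def build_maps_to(tag, cs_values):
    """Return one 'Maps to' row per CS value that has a standard target."""
    rows = []
    for offset, value in enumerate(cs_values, start=1):
        target = STANDARD_TARGETS.get((tag, value))
        if target is None:
            continue  # a mapping gap: no standard concept for this value
        rows.append(ConceptRelationship(SOURCE_CONCEPT_BASE + offset, target, "Maps to"))
    return rows


rows = build_maps_to("0010,0040", ["M", "F", "O"])
for row in rows:
    print(row)
```

Values without a standard target (here "O") are skipped rather than forced onto a concept, which is one simple way to surface the mapping gaps mentioned in the second bullet.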


Requirements

Developed and tested on:

  • WSL Ubuntu 22.04.3 LTS (Jammy Jellyfish)
  • openjdk version "11.0.21" 2023-10-17
  • Python 3.10.12

Instructions

0. Ensure the system is up to date and install dependencies

sudo apt update && sudo apt upgrade
sudo apt install bzip2 default-jdk default-jre xsltproc libxml2-utils python3-pip python3.10-venv

On macOS, use the Homebrew package manager instead

brew update
brew upgrade # if needed

Verify the Java installation

java -version

openjdk version "11.0.21" 2023-10-17

macOS may fall back to its own legacy Java, so your system might not be using OpenJDK. If so, install OpenJDK and read the output of brew info for the next steps (e.g., setting the symlink and the path)

brew install openjdk
brew info openjdk

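Create the symlink. The paths below assume an Intel Mac, where Homebrew installs under /usr/local; on Apple Silicon the prefix is /opt/homebrew, so use the exact commands printed by brew info openjdk.

sudo ln -sfn /usr/local/opt/openjdk/libexec/openjdk.jdk /Library/Java/JavaVirtualMachines/openjdk.jdk

Add OpenJDK to the path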

echo 'export PATH="/usr/local/opt/openjdk/bin:$PATH"' >> ~/.zshrc

1. Download the current source and rendering pipeline from dicom.nema.org using curl (wget or another method will also work)

curl https://dicom.nema.org/medical/dicom/current/DocBookDICOM2024a_sourceandrenderingpipeline_20240120075929.tar.bz2 --output sourceandrenderingpipeline.tar.bz2

2. Extract to a directory, remove archive, and navigate to the directory

mkdir sourceandrenderingpipeline 

tar -xvf sourceandrenderingpipeline.tar.bz2 -C sourceandrenderingpipeline

rm sourceandrenderingpipeline.tar.bz2

cd sourceandrenderingpipeline

3. Update absolute paths using the provided bash script

./updateabsolutepaths.sh

If you get the error gsed: command not found (typical on macOS), install GNU sed with Homebrew and run the script again.

brew install gnu-sed

4. Generate the databases for the parts (example for part 16)

./generateolinkdb.sh 16

5. Generate FHIR valuesets

5.1 Navigate to the valuesets subdirectory and download Java package dependencies

cd valuesets
curl https://repo1.maven.org/maven2/javax/json/javax.json-api/1.0/javax.json-api-1.0.jar --output javax.json-api-1.0.jar
curl https://repo1.maven.org/maven2/org/glassfish/javax.json/1.0.4/javax.json-1.0.4.jar --output javax.json-1.0.4.jar

5.2 Create a backup of the bash script used to extract valuesets, then modify the reference to the Java packages in the bash file and run it

cp extractvaluesets.sh{,.old}
sed -i 's|${HOME}/work/pixelmed/imgbook/lib/additional/|./|g' extractvaluesets.sh
./extractvaluesets.sh

On macOS, the BSD sed -i option requires an explicit argument giving the extension for backup files, so use this variant instead.

cp extractvaluesets.sh{,.old}
sed -i.bak 's|${HOME}/work/pixelmed/imgbook/lib/additional/|./|g' extractvaluesets.sh
./extractvaluesets.sh
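
Because the sed -i syntax differs between GNU and BSD sed, the same edit can also be made portably from Python. The following is a rough sketch under that assumption, not part of the project: the function name `rewrite_jar_paths` is hypothetical, and it simply mirrors the backup-then-replace steps above using the same old and new path prefixes.

```python
import shutil
from pathlib import Path

# Same substitution as the sed command above: point the script at the
# jar files downloaded into the current directory.
OLD_PREFIX = "${HOME}/work/pixelmed/imgbook/lib/additional/"
NEW_PREFIX = "./"


def rewrite_jar_paths(script_path, old=OLD_PREFIX, new=NEW_PREFIX):
    """Back up the script as <name>.old, then replace every `old` with `new`.

    Returns the number of replacements made.
    """
    script = Path(script_path)
    backup = script.with_name(script.name + ".old")
    if not backup.exists():  # keep the first backup if the function is re-run
        shutil.copy2(script, backup)
    text = script.read_text()
    count = text.count(old)
    # Writing in place keeps the file's existing permissions (executable bit).
    script.write_text(text.replace(old, new))
    return count
```

Calling `rewrite_jar_paths("extractvaluesets.sh")` from the valuesets directory would then take the place of the cp and sed lines; the script itself is still run with ./extractvaluesets.sh.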

5.3 Count the extracted JSON files and confirm the total matches the expected count shown below

find ./valuesets/fhir/json/ -type f -name "*.json" | wc -l

1341
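
A file count alone does not show whether each file is well-formed. As an optional extra check, the sketch below counts the files and also tries to parse each one. It is an illustrative helper only; the name `count_json_files` is hypothetical, and the directory to pass in is whatever path the find command above uses.

```python
import json
from pathlib import Path


def count_json_files(root):
    """Return (number of parseable .json files, list of paths that failed to parse)."""
    valid = 0
    invalid = []
    for path in sorted(Path(root).rglob("*.json")):
        if not path.is_file():
            continue
        try:
            with path.open(encoding="utf-8") as handle:
                json.load(handle)
            valid += 1
        except (json.JSONDecodeError, UnicodeDecodeError):
            invalid.append(path)
    return valid, invalid
```

For example, `valid, invalid = count_json_files("./valuesets/fhir/json/")` should give a `valid` count equal to the number reported by find and an empty `invalid` list if every extracted file parses.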

6. Set up the Python virtual environment

Note: run this step from the root DICOM2OMOP directory. The cd ../.. below navigates there from DICOM2OMOP/sourceandrenderingpipeline/valuesets.

Installing the packages may take a few minutes.

cd ../..
python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
python DICOM_P16_harvest_json.py
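
To give a sense of what harvesting the FHIR JSON involves, here is a minimal sketch of flattening one FHIR ValueSet resource into rows. It is not the code in DICOM_P16_harvest_json.py: the function name `flatten_valueset` and the output column names are assumptions, and the sample resource is hand-written for illustration. It relies only on the standard FHIR ValueSet layout, where codes are listed under compose.include[].concept[].

```python
def flatten_valueset(valueset):
    """Turn one FHIR ValueSet (a parsed JSON dict) into one row per listed concept."""
    rows = []
    for include in valueset.get("compose", {}).get("include", []):
        system = include.get("system", "")
        for concept in include.get("concept", []):
            rows.append({
                "valueset_id": valueset.get("id", ""),
                "valueset_name": valueset.get("name", ""),
                "system": system,
                "code": concept.get("code", ""),
                "display": concept.get("display", ""),
            })
    return rows


# Hand-written sample for illustration only, not copied from the extracted files.
sample = {
    "resourceType": "ValueSet",
    "id": "dicom-cid-7455-Sex",
    "name": "Sex",
    "compose": {
        "include": [
            {
                "system": "http://dicom.nema.org/resources/ontology/DCM",
                "concept": [
                    {"code": "M", "display": "Male"},
                    {"code": "F", "display": "Female"},
                ],
            }
        ]
    },
}

for row in flatten_valueset(sample):
    print(row)
```

Rows in this shape can be written out with csv.DictWriter or loaded into a DataFrame, depending on what the downstream notebook expects.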


Contributors

jenpark10, paulnagy


dicom2omop's Issues

Duplicates in DICOM vocabulary

I followed the steps in the README to create the fhir_valuesets.csv file from the Part 16 file. When I ran dicom_to_omop_tables.ipynb (with some modifications to import the files into a BigQuery dataset), it populated the concept table (and also the vocabulary and concept_class tables). The import created some duplicate concepts, e.g. Tissue Velocity and Ophthalmic Axial Measurements.

It looks like Ophthalmic Axial Measurements appears in cid-30-DICOMDevice, cid-33-Modality, and cid-29-AcquisitionModality (see part16_fhir_valuesets.csv).

Should these be de-duplicated, or should the concepts carry this additional context (Modality, AcquisitionModality, and DICOMDevice)?
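
One possible middle ground, sketched below, is to keep a single concept per (system, code) pair and record the value sets it came from as context. This is only an illustration of the idea raised in the issue, not a decision by the maintainers: the function names, the row layout, and the sample rows (including the code OAM) are assumptions and are not taken from part16_fhir_valuesets.csv.

```python
from collections import defaultdict


def group_by_concept(rows):
    """Collapse rows to one entry per (system, code), remembering source value sets."""
    grouped = defaultdict(lambda: {"display": "", "valuesets": []})
    for row in rows:
        entry = grouped[(row["system"], row["code"])]
        entry["display"] = entry["display"] or row["display"]
        if row["valueset_id"] not in entry["valuesets"]:
            entry["valuesets"].append(row["valueset_id"])
    return dict(grouped)


def find_duplicates(grouped):
    """Return only the concepts that occur in more than one value set."""
    return {key: entry for key, entry in grouped.items() if len(entry["valuesets"]) > 1}


DCM = "http://dicom.nema.org/resources/ontology/DCM"

# Illustrative rows only; the real CSV columns and codes may differ.
rows = [
    {"valueset_id": "cid-29-AcquisitionModality", "system": DCM, "code": "OAM",
     "display": "Ophthalmic Axial Measurements"},
    {"valueset_id": "cid-30-DICOMDevice", "system": DCM, "code": "OAM",
     "display": "Ophthalmic Axial Measurements"},
    {"valueset_id": "cid-33-Modality", "system": DCM, "code": "OAM",
     "display": "Ophthalmic Axial Measurements"},
    {"valueset_id": "cid-7455-Sex", "system": DCM, "code": "M", "display": "Male"},
]

grouped = group_by_concept(rows)
duplicates = find_duplicates(grouped)
for (system, code), entry in duplicates.items():
    print(code, entry["display"], entry["valuesets"])
```

With this grouping the concept table would hold one row per code, while the value set membership could go into a separate relationship or mapping table if that context is needed.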
