GithubHelp home page GithubHelp logo

longciu's Introduction

Long COVID Iowa-UNICAMP dataset code

Download link: https://doi.org/10.25820/data.007301

Data citation:

@misc{CarmoDiedreS2024LCI,
title = {Long COVID Iowa-UNICAMP [dataset]},
author = {Carmo, Diedre S and Pezzulo, Alejandro A and Villacreses, Raul A and Eisenbeisz, McKenna L and Anderson, Rachel L and Van Dorin, Sarah E and Rittner, Leticia and Lotufo, 
Roberto A and Gerard, Sarah E and Reinhardt, Joseph M and Comellas, Alejandro P},
doi = {10.25820/data.007301},
keywords = {Post COVID-19;COVID-19;Segmentation},
language = {eng},
publisher = {University of Iowa},
subject = {Computed Tomography;Pulmonary Diseases},
year = {2024},
howpublished = {\url{https://doi.org/10.25820/data.007301}}}

Thanks for downloading the Long COVID Iowa-UNICAMP dataset. If you use this data, please cite our dataset paper.

A paper accompanying this dataset is under submission, and this description will be updated to include its reference in the future.

Abstract

The field of supervised automated medical imaging segmentation suffers from the lack of annotated groundtruth data. This problem is even more noticeable when dealing with the segmentation of multiple types of lung findings in computed tomography with uncertain borders, such as opacities and parenchymal consolidation resulting from pneumonia. In this work, we make available the first public dataset of ground glass opacity and consolidation in the lung of Long COVID patients. The Long COVID Iowa-UNICAMP dataset (LongCIU) was built by three independent expert annotators, blindly segmenting the same 90 selected axial slices manually, without using any automated initialization. We make available not only the final consensus segmentation, but also the individual segmentation from each annotator totaling 360 slices. This dataset can be used to train and validate new automated segmentation methods and to study interrater uncertainty in lung opacities' segmentation on computed tomography.

overview
Overview of the dataset creation process.
sample
Sample CT scan slice, accompanying annotations, and final STAPLE consensus.

Directory tree

The "longciu.zip" file provided in the download link should contain the following files:

.
├── longciu_1_tgt.nii.gz
├── longciu_2_tgt.nii.gz
├── longciu_3_tgt.nii.gz
├── longciu_img.nii.gz
├── longciu_medpseg_output.nii.gz
├── longciu_splits.json
├── longciu_STAPLE_tgt.nii.gz
├── README.md
└── staple_pub_stats.json

File tree for the dataset, including images, annotations per annotator and consensus, MEDPSeg output, optional proposed training splits, and STAPLE statistics. .nii.gz formats can be read using libraries such as SimpleITK or NiBabel.

longciu_X_tgt.nii.gz

Results from blind annotations from annotator 1, 2, and 3.

longciu_img.nii.gz

Anonymized CT scan slices used for annotation, in original HU intensities.

CT scan data was originally collected at the University of Iowa Hospitals and Clinics, in an ongoing Long COVID study. Adults with history of COVID-19 infection confirmed by a positive antigen or reverse transcriptase-polymerase chain reaction that remained symptomatic 30 days or more following diagnosis were prospectively enrolled starting in June 2020. Patient data collected included age, sex, CT scan, days between acute COVID-19 diagnosis and CT scan, vaccination status, comorbidities, and other information. Chest CT scans were conducted with Siemens scanners. All scans employed tube current modulation. A standardized protocol for noncontrast chest CT imaging was followed, which entailed acquiring an inspiratory scan at total lung capacity (TLC) and an expiratory scan at residual volume (RV). Reconstruction of images was performed using iterative reconstruction techniques with a section thickness x interval of 1 x 0.5 mm. This study was performed in line with the principles of the Declaration of Helsinki. Study protocols were approved by the institutional review board and were Health Insurance Portability and Protection Act–compliant. Participants were required to sign written informed consent before inclusion, including the possibility of sharing anonymized images.

longciu_medpseg_output.nii.gz

Output from the MEDPSeg[1] model, which was not trained in this dataset.

longciu_splits.json

Optional, proposed data splits.

longciu_STAPLE_tgt.nii.gz

Final consensus segmentation, using STAPLE[2].

README.md

A short description of the dataset.

staple_pub_stats.json

Output statistics from the SimpleITK STAPLE implementation, for reproducibility.

Code

The software repository for this project is: https://github.com/MICLab-Unicamp/LongCIU. Source code reproduces the STAPLE consensus, sample training, evaluation examples, and technical validation. The GitHub repository presents the minimum code necessary to reproduce processing and technical validation involved in making our annotations of GGO and Consolidations in Long COVID patients available, including interrater Dice and Cohen's-Kappa figures included in the paper (in the interrater.ipynb and dice.ipynb notebooks). Use the requirements.txt file to create an python environment suitable to run the code if necessary.

References

[1] Carmo, Diedre S., et al. "MEDPSeg: Hierarchical polymorphic multitask learning for the segmentation of ground-glass opacities, consolidation, and pulmonary structures on computed tomography" arXiv preprint arXiv:2312.02365 (2024).

[2] Warfield, Simon K., Kelly H. Zou, and William M. Wells. "Simultaneous truth and performance level estimation (STAPLE): an algorithm for the validation of image segmentation." IEEE transactions on medical imaging 23.7 (2004): 903-921.

longciu's People

Contributors

dscarmo avatar

Watchers

Roberto Lotufo avatar  avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.