ml-workgroup / covid-19-image-repository

Anonymized dataset of COVID-19 cases with a focus on radiological imaging. This includes images (x-ray / ct) with extensive metadata, such as admission-, ICU-, laboratory-, and patient master-data.

License: Other

Python 100.00%
covid-19 x-ray radiology xray laboratory-data dataset laboratory covid19 covid19-data covid-data

covid-19-image-repository's Introduction

COVID-19 Image Repository

This project aims to create an anonymized data set of COVID-19 cases with a focus on radiological imaging. This includes images with extensive metadata, such as admission-, ICU-, laboratory-, and patient master-data.

This repository contains image data from the Institute for Diagnostic and Interventional Radiology, Hannover Medical School, Hannover, Germany.

Feature Set

| id | label | unit | comment / reference interval (precision) |
|----|-------|------|------------------------------------------|
| 1 | patient_id | | randomly generated patient id |
| 2 | image_id | | randomly generated image id (filename) |
| 3 | sex | | m/w |
| 4 | age | years | currently redacted (see below) |
| 5 | size | cm | currently redacted (see below) |
| 6 | weight | kg | currently redacted (see below) |
| 7 | admission offset | days | days since admission (begin of symptoms) |
| 8 | icu admission offset | days | |
| 9 | death offset | days | days until death or null |
| 10 | modality | | currently all images are chest radiographs |
| 11 | projection | | ap, pa ... |
| 12 | lactate dehydrogenase | U/l | < 248 (5) |
| 13 | c-reactive protein | mg/l | <= 5 (1) |
| 14 | d-dimer | mg/l | 0 - 0.5 (0.1) |
| 15 | coagulation factor XIII | % | 70 - 140 (5) |
| 16 | neutrophils | Tsd/µl (thousand/µl) | 1.5 - 7.7 (0.1) |
| 17 | lymphocytes | Tsd/µl (thousand/µl) | 1.1 - 4.0 (0.1) |
| 18 | pO2 | mmHg | (5) |
| 19 | pCO2 | mmHg | (5) |
| 20 | corona test type | | |

Age, size, and weight are currently redacted. We will publish this data once there are enough patients for meaningful intervals to be chosen according to the concepts of k-anonymity and l-diversity.
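To illustrate the k-anonymity criterion mentioned above, the following is a minimal sketch, not the procedure used by the maintainers. The function name `is_k_anonymous`, the choice of quasi-identifiers (sex and an age interval), and all sample values are assumptions made up for illustration.

```python
from collections import Counter
from typing import Hashable, Iterable


def is_k_anonymous(quasi_identifiers: Iterable[Hashable], k: int) -> bool:
    """Return True if every combination of quasi-identifier values
    is shared by at least k records."""
    counts = Counter(quasi_identifiers)
    return all(n >= k for n in counts.values())


# Made-up records: (sex, age interval). These are NOT values from the dataset.
coarse = [("m", "60-79"), ("m", "60-79"), ("w", "60-79"), ("w", "60-79")]
fine = [("m", "60-64"), ("m", "75-79"), ("w", "60-64"), ("w", "60-64")]

print(is_k_anonymous(coarse, k=2))
print(is_k_anonymous(fine, k=2))
```

With few patients, narrow intervals leave some combinations with a single record, which is why wider intervals (or redaction) are needed until the cohort grows.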

Offsets (admission offset and icu admission offset) are given relative to the time of the exam; see the feature set table for units. For example, an admission offset of -4 and an icu admission offset of 6 mean that the patient was admitted to the hospital four days before the image was taken and was transferred to the ICU six days after it. The admission offset in particular can be noisy; see FAQ #1.
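The sign convention for offsets can be sketched in a few lines of Python. The helper `describe_offset` is hypothetical and not part of this repository; it only assumes that offsets are numbers of days relative to the exam and that a missing value is represented as `None`.

```python
from typing import Optional


def describe_offset(event: str, offset_days: Optional[float]) -> str:
    """Turn an offset (days relative to the exam) into a readable phrase.

    Negative offsets lie before the exam, positive offsets after it.
    """
    if offset_days is None:
        return f"{event}: not recorded"
    days = abs(int(offset_days))
    if offset_days < 0:
        return f"{event}: {days} day(s) before the image was taken"
    if offset_days > 0:
        return f"{event}: {days} day(s) after the image was taken"
    return f"{event}: on the day the image was taken"


# The example from the text: admitted four days before the exam,
# transferred to the ICU six days after it.
print(describe_offset("hospital admission", -4))
print(describe_offset("ICU admission", 6))
```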

All lab values (12 - 19) are given in intervals to protect patient identity. A value below the detection limit is denoted by -inf, above the detection limit by inf.
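As a rough illustration of how an exact lab value could be replaced by an interval of the stated precision, consider the sketch below. The exact binning scheme and the on-disk encoding used in this repository are not documented here, so the function `to_interval`, the half-open interval convention, the tuple representation of out-of-range values, and the detection limits in the example are all assumptions.

```python
import math
from decimal import Decimal
from typing import Optional, Tuple


def to_interval(
    value: float,
    precision: float,
    lower_limit: Optional[float] = None,
    upper_limit: Optional[float] = None,
) -> Tuple[float, float]:
    """Map a non-negative lab value to the interval [low, low + precision).

    Values outside the (hypothetical) detection limits are reported
    with an open end of -inf or inf.
    """
    if lower_limit is not None and value < lower_limit:
        return (-math.inf, lower_limit)
    if upper_limit is not None and value > upper_limit:
        return (upper_limit, math.inf)
    # Decimal avoids floating-point surprises for precisions such as 0.1.
    step = Decimal(str(precision))
    low = (Decimal(str(value)) // step) * step
    return (float(low), float(low + step))


# Lactate dehydrogenase is listed with a precision of 5 U/l.
print(to_interval(263, 5))
# d-dimer is listed with a precision of 0.1 mg/l.
print(to_interval(0.37, 0.1))
# A value below an assumed detection limit of 0.6.
print(to_interval(0.3, 1, lower_limit=0.6))
```

Coarsening values this way trades measurement detail for a lower risk of re-identifying a patient from an unusual exact value.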

Download

We provide the raw, unprocessed, gray value image data as NIfTI files. We chose this format to protect patient identity, as DICOM files are hard to anonymize. However, the files are too large to host on GitHub.

Additionally, we included downscaled versions of the NIfTI images in the png folder.

Space and bandwidth are kindly provided by the Open Telekom Cloud (Deutsche Telekom AG).

  • Version 2.0

  • Version 1.0

    Do not use this version (see erratum 1). The data is provided for reproducibility reasons.

Observations (FAQ)

  1. admission offset > icu admission offset
    Some of the cases suggest that the admission to the clinic was after admission to the ICU. This is an artifact of using the COVID-specific case data to determine the admission date (and therefore the offset). The patient may already have been in the ICU before being diagnosed with COVID.
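Records affected by this artifact can be found with a simple comparison. The sketch below assumes each record is a dictionary with the keys `admission_offset` and `icu_admission_offset` (these names are an assumption about the metadata layout) and that missing values are `None`. The first sample row uses the offsets quoted for patient 09ec853d in the issue tracker; the second row is made up for illustration.

```python
from typing import Any, Dict, List

Record = Dict[str, Any]


def admitted_after_icu(records: List[Record]) -> List[Record]:
    """Return the records whose hospital admission lies after the ICU admission."""
    flagged = []
    for record in records:
        admission = record.get("admission_offset")
        icu = record.get("icu_admission_offset")
        # Skip records where either offset is missing.
        if admission is None or icu is None:
            continue
        if admission > icu:
            flagged.append(record)
    return flagged


records = [
    # Admitted 21 days before the exam, ICU 20 days before: consistent.
    {"patient_id": "09ec853d", "admission_offset": -21, "icu_admission_offset": -20.0},
    # Made-up row: ICU 5 days before the exam, admission 2 days before.
    {"patient_id": "example", "admission_offset": -2, "icu_admission_offset": -5},
]

for record in admitted_after_icu(records):
    print(record["patient_id"])
```

Whether such rows should be dropped or kept depends on the analysis; the comparison only makes the artifact visible.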

Errata

  1. In Version 1.0 there is an issue with the date calculation (issue #6), which might lead to incorrect date offsets. We strongly advise upgrading to a newer version.

License and Attribution

Master-, image- and laboratory-data of this repository are licensed under the Creative Commons Attribution 3.0 Unported (CC BY 3.0).

If you use this data, you must attribute the authors in any publication (DOI: 10.6084/m9.figshare.12275009). You may include the specific release or commit hash for reproducibility.

Contact Information

Please use the ticketing system where applicable. Otherwise, please use the following email address: [email protected]

covid-19-image-repository's People

Contributors

halaser, liob, tbnv999

covid-19-image-repository's Issues

Assume all images are COVID positive?

If the patient is not admitted yet (positive admission offset) should we assume they are COVID positive? Possibly just not testing positive at the moment?

Some images cropped when adding to covid-chestxray-dataset

Hi! While adding these images to @ieee8023's covid-chestxray-dataset, I cropped a few to cut out part of the black margin. I just wanted to share the filenames in case you were interested in doing the same thing.

00d96e05.png
17ad0a56.png
436a6348.png
4c0fcf57.png
4fed5061.png
6f7008af.png
76093afc.png
81089cb4.png
865336ed.png
a3111116.png
a4318ac9.png
b10c49ca.png
bace1e45.png
bb4c4038.png
cb706009.png

Thank you so much for making this resource available!

CR vs DX

For modality what is the difference here?

Radiological findings

While I appreciate your great work providing images in a public repo, not having labels for the images is a problem from my point of view. I have read your guideline for assuming COVID-positive/negative based on the admission offset, but some of the CXRs do not clearly show radiological findings of pneumonia.

For example, the CXR with patient_id=d3fb252e and image_id=88859dc1 has an admission offset of 0 with most of the other metrics not reported; while it is assumed COVID-19 positive, I think the CXR does not show specific findings related to pneumonia. It would be great if a radiologist's interpretation could be provided for the images.

date offset calculation

There is an internal issue with the date offset calculation. We are working on a fix and will update shortly.

admission_offset

I don't understand this field. Is it the days since symptoms/admission?

For example patient 09ec853d with an admission_offset of -21 and an icu_admission_offset -20.0 ? So from this image they went to the ICU in 20 days and then were admitted to the hospital a day after?
