"Once you register to download the CheXpert dataset, you will receive a link to the download over email. Note that you may not share the link to download the dataset with others."
Volunteers are already collecting data from the original sources. It is better to wait for that effort than to go through the papers ourselves; we can get far more data that way.
Many individual datasets carry no risk of duplicate images. But with many researchers independently assembling datasets, duplicates across datasets eventually become likely.
We can either:
Only work with fundamental datasets, not any aggregators
Attempt to de-duplicate images
Let's try to stick to the first option: any dataset we use should disclose its sources. Eventually we will likely hit a dataset where we can't drill down to the originals. At that point we will need to assess how to de-duplicate and what compute resources that will require.
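As a first pass at assessing de-duplication, a minimal sketch using only the standard library: exact byte-level duplicates can be found cheaply by hashing each file. The function names here are illustrative, not from any existing codebase.

```python
import hashlib


def file_digest(path, chunk_size=1 << 20):
    """SHA-256 of a file's bytes, read in chunks so large images fit in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_exact_duplicates(paths):
    """Group paths whose bytes are identical; return only groups with duplicates."""
    seen = {}
    for p in paths:
        seen.setdefault(file_digest(p), []).append(p)
    return [group for group in seen.values() if len(group) > 1]
```

Note this only catches byte-identical copies. Images that were re-encoded, resized, or converted between formats would need perceptual hashing, which is where the real compute-cost question comes in.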
The submodule should probably contain folders for both converted and unconverted images. It should also hold the code that converted the JPEGs to DICOM.
This is because the DICOM format carries many metadata attributes that other formats lack, and those attributes have to be filled in somehow during conversion. The assumptions behind the chosen values should be documented.
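One way to document those assumptions is to keep them in the conversion code itself as a single table that can also be rendered into a README. A sketch: the attribute names below are real DICOM tags, but the default values are purely illustrative assumptions, not taken from any actual conversion.

```python
# Attributes a plain JPEG cannot supply; every default here is an
# assumption made at conversion time and must be documented.
CONVERSION_ASSUMPTIONS = {
    "Modality": "CR",                            # assumed: computed radiography
    "PhotometricInterpretation": "MONOCHROME2",  # assumed: 0 = black, typical for X-rays
    "BitsAllocated": 8,                          # assumed: 8-bit grayscale JPEG input
    "RescaleSlope": 1,                           # assumed: no pixel-value rescaling
    "RescaleIntercept": 0,
}


def describe_assumptions(assumptions):
    """Render the assumption table as text for a README in the submodule."""
    return "\n".join(f"{tag}: {value}" for tag, value in sorted(assumptions.items()))
```

Keeping this table next to the conversion script means anyone inspecting a converted image can see which attribute values were measured and which were assumed.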