semacandemir / bimcv-covid-19

This project forked from bimcv-csusp/bimcv-covid-19

Valencia Region Image Bank (BIMCV) that combines data from the PadChest dataset with future datasets based on COVID-19 pathology to provide the open scientific community with data of clinical-scientific value that helps early detection of COVID-19

License: MIT License

Jupyter Notebook 34.62% Python 0.35% HTML 65.03%

bimcv-covid-19's Introduction

BIMCV COVID19 iterations 1 + 2 Dataset is ready

Thank you for your interest in the BIMCV-COVID19 iterations dataset.

Please read distribution rights at LICENSE.md.

Description

BIMCV-COVID19+ is a large dataset of chest X-ray images (CXR; CR and DX modalities) and computed tomography (CT) imaging of COVID-19 patients, together with their radiographic findings, pathologies, polymerase chain reaction (PCR) results, immunoglobulin G (IgG) and immunoglobulin M (IgM) diagnostic antibody tests, and radiographic reports from the Medical Imaging Databank in the Valencian Region Medical Image Bank (BIMCV). The findings are mapped onto standard Unified Medical Language System (UMLS) terminology and cover a wide spectrum of thoracic entities, in contrast with the much smaller number of entities annotated in previous datasets. Images are stored in high resolution, and entities are localized with anatomical labels in Medical Imaging Data Structure (MIDS) format. In addition, 23 images were annotated by a team of expert radiologists to include semantic segmentation of radiographic findings. Extensive further information is provided, including the patient’s demographic information, the type of projection and the acquisition parameters for the imaging study, among others. These iterations of the database include 7377 CR, 9463 DX and 6687 CT studies.

Data Sources

This directory contains an anonymized dataset of thorax X-rays (Rx) from COVID-19 patients, prepared by the same authors as the PadChest dataset and described in the preprint arXiv:2006.01174.

BIMCV-COVID19+ 1st+2nd iteration

The Padchest-pneumonia dataset, here

The content of the BIMCV COVID-19 GitHub space is subject to daily updates. Note: please do not claim diagnostic performance for a model without a clinical study! This is not a Kaggle competition dataset.

Following common strategies and initiatives that have emerged from the international scientific community, a series of actions is being carried out within the Valencia Region Image Bank (BIMCV), which combines data from the PadChest dataset with future datasets based on COVID-19 pathology to provide the open scientific community with data of clinical and scientific value that helps early detection of COVID-19.

The team working on this project is made up of FISABIO, Miguel Hernandez University, University of Alicante and staff from Hospital San Juan de Alicante, with the collaboration of MedBravo, GE and CIPF.

Our thanks go to the many teams that are providing relevant information and helping to improve procedures. Among them we highlight PRHLT, ITI, IFIC-CSIC, HGV, BSC, the Transbionet network and BioinformaticsAnd_AI.

Figure: example of a COVID-19 chest X-ray (Rx), from the medRxiv preprint ChestRX-COVID.

Goal

Collect and publish chest X-ray images from hospitals affiliated with the BIMCV, with all identifying data removed, for the purpose of training Deep Learning (DL) models. Such training is meant to enable early detection of COVID-19 infection and pneumonia from a simple chest X-ray. To achieve this, the images will be structured into subgroups of PadChest images with differential diagnoses related to the radiological semiology of COVID-19, allowing the development of the first Artificial Intelligence models to better predict and understand the infection.

Immediate actions

Our research group “Unidad Mixta de Imagen Biomédica FISABIO-CIPF” is working towards launching these models on FISABIO’s BIMCV (Openmind) platform, which shares computational resources with CIPF. These shared resources are made available through the TransBioNet network to face these challenges. While waiting for official authorization from the competent authorities to extract a new COVID-19 X-ray data set, the following tasks are being carried out, using PadChest images as a basis:

  • Reorganization of the PadChest data set in relation to the course of COVID-19 pathology.
  • Extraction and organization of data into subgroups from PadChest (known in AI as data curation), starting with pneumonia, infiltrates and controls.
  • Effective and well-balanced partitioning (see figure 1).
  • Preprocessing. The images will be stored both on the cluster and on Kaggle, and distributed into three groups: training (Tr) 60%, validation (Val) 20% and test (Te) 20%.
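
The 60% / 20% / 20% distribution in the last item can be sketched as follows. This is a minimal illustration in Python and not the project's actual pipeline: the record layout (image_id, patient_id, subgroup), the function name split_by_patient and the decision to split at patient level within each subgroup are all assumptions made for the example. Splitting by patient rather than by image is one common way to keep images of the same person out of both training and test data.

```python
import random
from collections import defaultdict


def split_by_patient(records, fractions=(0.6, 0.2, 0.2), seed=0):
    """Map each image to 'Tr', 'Val' or 'Te', keeping a patient in one partition.

    records: list of (image_id, patient_id, subgroup) tuples.
    This layout is hypothetical; adapt it to the real metadata files.
    """
    # A patient may have several images; use the subgroup of the first one seen.
    subgroup_of = {}
    for _image_id, patient_id, subgroup in records:
        subgroup_of.setdefault(patient_id, subgroup)

    # Collect patients per subgroup so every subgroup is split with the same ratios.
    patients_by_group = defaultdict(list)
    for patient_id, subgroup in subgroup_of.items():
        patients_by_group[subgroup].append(patient_id)

    rng = random.Random(seed)  # fixed seed so the split is reproducible
    partition_of = {}
    for subgroup in sorted(patients_by_group):
        patients = sorted(patients_by_group[subgroup])
        rng.shuffle(patients)
        n_train = round(fractions[0] * len(patients))
        n_val = round(fractions[1] * len(patients))
        for i, patient_id in enumerate(patients):
            if i < n_train:
                partition_of[patient_id] = "Tr"
            elif i < n_train + n_val:
                partition_of[patient_id] = "Val"
            else:
                partition_of[patient_id] = "Te"  # remainder goes to test

    # Every image inherits the partition of its patient.
    return {image_id: partition_of[patient_id] for image_id, patient_id, _ in records}


# Toy input with made-up identifiers, only to show the call.
example = [
    ("img_a", "patient_1", "pneumonia"),
    ("img_b", "patient_1", "pneumonia"),
    ("img_c", "patient_2", "control"),
]
print(split_by_patient(example))
```

With very small subgroups the rounding can leave a partition empty, so the proportions only approach 60/20/20 once each subgroup has a reasonable number of patients.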

Various models will be trained, and those that achieve the best accuracy will be released as open source code so that, when the new BIMCV-COVID-19 data set is acquired, they can be used for transfer learning in new computational models.
