abachaa / existing-medical-qa-datasets

Multimodal Question Answering in the Medical Domain: A summary of Existing Datasets and Systems

medical-qa-datasets qa question-answering nlp bionlp medical-qa medical-informatics vqa computer-vision datasets

Existing Medical QA & VQA Datasets

*** Two Main Tasks: Medical Question Answering (QA) & Visual Question Answering (VQA) ***

I) Medical QA Datasets:

  1. Corpus for Evidence Based Medicine Summarization (Mollá, 2010): https://sourceforge.net/projects/ebmsumcorpus
  2. CLEF QA4MRE Alzheimer’s task (Peñas et al., 2012).
  3. BioASQ datasets (2012-2020): http://bioasq.org/participate/challenges
  4. TREC LiveQA-Med (Ben Abacha et al., 2017): https://github.com/abachaa/LiveQA_MedicalTask_TREC2017
  5. MEDIQA-2019 datasets on NLI, RQE, and QA (Ben Abacha et al., 2019): https://github.com/abachaa/MEDIQA2019
  6. MEDIQA-AnS dataset of question-driven summaries of answers (Savery et al., 2020): https://osf.io/fyg46/ Paper: https://www.nature.com/articles/s41597-020-00667-z
  7. MedQuAD Collection of 47k QA pairs (Ben Abacha and Demner-Fushman, 2019): https://github.com/abachaa/MedQuAD
  8. Medication QA Collection (Ben Abacha et al., 2019): https://github.com/abachaa/Medication_QA_MedInfo2019
  9. Consumer Health Question Summarization (Ben Abacha and Demner-Fushman, 2019): https://github.com/abachaa/MeQSum
  10. emrQA: QA on Electronic Medical Records (Pampari et al., 2018). Scripts to generate emrQA from i2b2 data: https://github.com/panushri25/emrQA
  11. EPIC-QA dataset on COVID-19 (Goodwin et al., 2020): https://bionlp.nlm.nih.gov/epic_qa/
  12. BiQA Corpus (Lamurias et al., 2020): https://github.com/lasigeBioTM/BiQA Paper: https://ieeexplore.ieee.org/document/9184044
  13. HealthQA Dataset (Zhu et al., 2019): https://github.com/mingzhu0527/HAR Paper: https://dmkd.cs.vt.edu/papers/WWW19.pdf
  14. MASH-QA Dataset on Multiple Answer Spans Healthcare Question Answering, with 35k QA pairs (Zhu et al., 2020): https://github.com/mingzhu0527/MASHQA Paper: https://www.aclweb.org/anthology/2020.findings-emnlp.342.pdf
  15. MedMCQA: A Large-scale Multi-Subject Multi-Choice Dataset for Medical domain Question Answering. (Pal et al., CHIL, PMLR 2022): https://github.com/medmcqa/medmcqa Paper: https://proceedings.mlr.press/v174/pal22a.html

II) Medical VQA Datasets (Radiology):

  1. VQA-RAD (Lau et al., 2018): https://osf.io/89kps
  2. VQA-Med 2018 (Hasan et al., 2018): https://www.aicrowd.com/challenges/imageclef-2018-vqa-med
  3. VQA-Med 2019 (Ben Abacha et al., 2019): https://github.com/abachaa/VQA-Med-2019
  4. VQA-Med 2020 (Ben Abacha et al., 2020): https://github.com/abachaa/VQA-Med-2020

III) Online QA Systems:

-- I searched and tested several systems (e.g. AskHERMES, MiPACQ, SimQ). This list includes only the systems that are still maintained.

  1. CHiQA (Consumer Health Question Answering System): chiqa.nlm.nih.gov
  2. Neural Covidex: covidex.ai

IV) Medical Datasets Relevant to Question Answering:

  1. i2b2 shared tasks (2006-2016): www.i2b2.org/NLP
  2. n2c2 NLP clinical challenges (2018-2019): https://n2c2.dbmi.hms.harvard.edu https://dbmi.hms.harvard.edu/programs/national-nlp-clinical-challenges-n2c2
  3. TREC Medical Records Track (2012-2013).
  4. TREC Clinical Decision Support Track (2014-2016): http://www.trec-cds.org
  5. TREC Precision Medicine Track (2017-2019): http://www.trec-cds.org
  6. CLEF eHealth (2013-2020): https://clefehealth.imag.fr
  7. COVID dataset (CORD-19): https://www.kaggle.com/allen-institute-for-ai/CORD-19-research-challenge

V) Medical Datasets Relevant to VQA:

  1. ImageCLEF Medical Automatic Image Annotation (2008-2009): https://www.imageclef.org/2008/medaat and https://www.imageclef.org/2009/medanno
  2. ImageCLEF Medical User-oriented Image Retrieval Task (2011): https://www.imageclef.org/2011/medicaluseroriented
  3. ImageCLEF Medical Retrieval Task (2008-2012): https://www.imageclef.org/2012/medical
  4. ImageCLEF AMIA: Medical task (2013): https://www.imageclef.org/2013/medical
  5. ImageCLEFmed: Medical classification (2015): https://www.imageclef.org/2015/medical
  6. ImageCLEF Medical Clustering (2015): https://www.imageclef.org/2015/clustering
  7. ImageCLEFmed (2016): https://www.imageclef.org/2016/medical
  8. ImageCLEFcaption (2017-2020): https://www.imageclef.org/2017/caption
  9. ImageCLEFmedical tasks (2019-2020): https://www.imageclef.org/2019/medical and https://www.imageclef.org/2020/medical
  10. MIMIC-CXR Database (2019): https://physionet.org/content/mimic-cxr/2.0.0/

Contact

-  Asma Ben Abacha (abenabacha at microsoft dot com)
