GithubHelp home page GithubHelp logo

iorikyo79 / ehrnoteqa Goto Github PK

View Code? Open in Web Editor NEW

This project forked from ji-youn-kim/ehrnoteqa

0.0 0.0 0.0 785 KB

Official code for "EHRNoteQA: A Patient-Specific Question Answering Benchmark for Evaluating Large Language Models in Clinical Settings"

License: MIT License

Shell 10.23% Python 89.77%

ehrnoteqa's Introduction

EHRNoteQA: A Patient-Specific Question Answering Benchmark for Evaluating Large Language Models in Clinical Settings

Overview

This study introduces EHRNoteQA, a novel patient-specific question answering benchmark tailored for evaluating Large Language Models (LLMs) in clinical environments. Based on MIMIC-IV Electronic Health Record (EHR), a team of three medical professionals has curated the dataset comprising 962 unique questions, each linked to a specific patient's EHR clinical notes. What makes EHRNoteQA distinct from existing EHR-based benchmarks is as follows: Firstly, it is the first dataset to adopt a multi-choice question answering format, a design choice that effectively evaluates LLMs with reliable scores in the context of automatic evaluation, compared to other formats. Secondly, it requires an analysis of multiple clinical notes to answer a single question, reflecting the complex nature of real-world clinical decision-making where clinicians review extensive records of patient histories. Our comprehensive evaluation on various large language models showed that their scores on EHRNoteQA correlate more closely with their performance in addressing real-world medical questions evaluated by clinicians than their scores from other LLM benchmarks. This underscores the significance of EHRNoteQA in evaluating LLMs for medical applications and highlights its crucial role in facilitating the integration of LLMs into healthcare systems. The dataset will be made available to the public under PhysioNet credential access, promoting further research in this vital field.

Requirements

# Create the conda environment
conda create -y -n EHRNoteQA python=3.10.4

# Activate the environment
source activate EHRNoteQA

# Install required packages
conda install -y pytorch==1.11.0 torchvision==0.12.0 torchaudio==0.11.0 cudatoolkit=11.3 -c pytorch

pip install pandas==1.4.3 \
            openai==1.12.0 \
            transformers==4.31.0 \
            accelerate==0.21.0 \
            tqdm==4.65.0 \
            fire==0.5.0

Dataset

The EHRNoteQA dataset is accessible on Physionet with your Physionet credentials.
To get started, download the following:

  1. The EHRNoteQA dataset.
  2. The discharge.csv.gz file from MIMIC-IV-Note v2.2.


Overview of EHRNoteQA Data Generation Pipeline

Preparing the Data

You may preprocess the inputs as below. (located in scripts/preprocess.sh)

python ../src/preprocessing/preprocess.py \
--ehrnoteqa [Folder path containing 'EHRNoteQA.jsonl' file] \
--mimiciv [Folder path containing 'discharge.csv.gz' file from MIMIC-IV database] \
--output [Folder path to save the preprocessed EHRNoteQA dataset]

This script will link MIMIC-IV notes with the EHRNoteQA dataset and preprocess the MIMIC-IV notes by removing unnecessary whitespace and adding relevant metadata.

Generating Model Outputs

If you are generating outputs using GPT, set the environment variables as below.
You must use HIPAA-compliant platforms such as Azure.

export AZURE_OPENAI_ENDPOINT=[Your Endpoint]
export AZURE_OPENAI_KEY=[Your Key]
export AZURE_API_VERSION=[Your API Version]

You may generate model outputs of the EHRNoteQA dataset as below. (located in scripts/generate.sh)

# Using Local Model
CUDA_VISIBLE_DEVICES=0 python ../src/generation/generate.py \
--ckpt_dir [Folder path to target model for evaluation] \
--model_name Llama-2-7b-chat-hf \
--input_path [Folder path to the processed EHRNoteQA data] \
--file_name EHRNoteQA_processed.jsonl \
--save_path [Folder path to save target model generated output]

# Using GPT
python ../src/generation/generate.py \
--ckpt_dir "" \
--model_name gpt-4-1106-preview \
--input_path [Folder path to the processed EHRNoteQA data] \
--file_name EHRNoteQA_processed.jsonl \
--save_path [Folder path to save target model generated output]

Adjust the model and file names according to the target evaluation model and the processed EHRNoteQA file.

Model Evaluation

Set the environment variables as previously described if not already done.
You must use HIPAA-compliant platforms such as Azure.

export AZURE_OPENAI_ENDPOINT=[Your Endpoint]
export AZURE_OPENAI_KEY=[Your Key]
export AZURE_API_VERSION=[Your API Version]

Use the following script to evaluate the model outputs generated from the EHRNoteQA dataset. (located in scripts/evaluate.sh)

python ../src/evaluation/evaluate.py \
--gpt_type gpt-4-1106-preview \
--model_name Llama-2-7b-chat-hf \
--input_path [Folder path to target model generated output] \
--file_name ours_Llama-2-7b-chat-hf_EHRNoteQA_processed.csv \
--save_path [Folder path to save GPT evaluated result of target model output]

Adjust the names for the GPT model used for evaluation (gpt type), the model whose output to be evaluated (model name), and the generated model output file (file name).

Citation

@misc{kweon2024ehrnoteqa,
      title={EHRNoteQA: A Patient-Specific Question Answering Benchmark for Evaluating Large Language Models in Clinical Settings}, 
      author={Sunjun Kweon and Jiyoun Kim and Heeyoung Kwak and Dongchul Cha and Hangyul Yoon and Kwanghyun Kim and Seunghyun Won and Edward Choi},
      year={2024},
      eprint={2402.16040},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}

Code References

https://github.com/lm-sys/FastChat

ehrnoteqa's People

Contributors

ji-youn-kim avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.