KoCommonGEN v2: A Benchmark for Navigating Korean Commonsense Reasoning Challenges in Large Language Models


🌠 KoCommonGEN v2

KoCommonGEN v2: A Benchmark for Navigating Korean Commonsense Reasoning Challenges in Large Language Models [ACL 2024-Findings]

Jaehyung Seo, Jaewook Lee, Chanjun Park, SeongTae Hong, Seungjun Lee and Heuiseok Lim

🏫 NLP & AI Lab, Korea University


🔥 News

  • September 27, 2023: Provided data support for the Open Ko-LLM Leaderboard
  • August 7, 2024: Dataset Release
  • August 10, 2024: Experimental Results for the New Models Added
  • August 14, 2024: Presented a research paper at ACL 2024

📊 Dataset

The KoCommonGEN v2 dataset is available on Hugging Face.

You can easily access and use these datasets for your research and experiments.

🛠️ Installation

This repository partially adopts the evaluation methods of version 0.3.0 of EleutherAI/lm-evaluation-harness to evaluate KoCommonGEN v2.

# python_requires >=3.9
$ git clone https://github.com/J-Seo/KoCommonGEN-V2.git
$ cd KoCommonGEN-V2
$ pip install -r requirements.txt

🚀 Usage

At most 5 few-shot examples are currently uploaded. To run with a larger --num_fewshot value, add more examples yourself.

$ sh test.sh
## test.sh
python3 main.py \
--model hf-causal-experimental \
--model_args pretrained="nlpai-lab/KULLM3" \
--task ko_commongen_v2 \
--device cuda:1 \
--num_fewshot 2 \
--batch_size 1 \
--output nlpai-lab/KULLM3 &

You can also use sequence-to-sequence models.

## test.sh
python3 main.py \
--model hf-seq2seq \
--model_args pretrained="google/flan-t5-xxl" \
--task ko_commongen_v2 \
--device cuda:1 \
--num_fewshot 2 \
--batch_size 1 \
--output google/flan-t5-xxl &
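
The two invocations above differ only in the model type and checkpoint, so the argument list can be built programmatically. The following is a minimal sketch, not part of the repository: build_command and MAX_FEWSHOT are hypothetical names, the flags are copied from test.sh as shown, and the per-run output naming in the loop is an assumption made so that runs do not overwrite each other.

```python
import shlex

MAX_FEWSHOT = 5  # at most 5 few-shot examples are currently uploaded


def build_command(model, pretrained, num_fewshot, output=None,
                  device="cuda:1", batch_size=1):
    """Return the main.py argument list for one run (hypothetical helper)."""
    if not 0 <= num_fewshot <= MAX_FEWSHOT:
        raise ValueError(f"num_fewshot must be between 0 and {MAX_FEWSHOT}")
    return [
        "python3", "main.py",
        "--model", model,
        "--model_args", f"pretrained={pretrained}",
        "--task", "ko_commongen_v2",
        "--device", device,
        "--num_fewshot", str(num_fewshot),
        "--batch_size", str(batch_size),
        # test.sh reuses the checkpoint name as the output path.
        "--output", output if output is not None else pretrained,
    ]


if __name__ == "__main__":
    # Print one command per few-shot setting. The "_<k>shot" suffix is an
    # assumed naming scheme, not something the repository prescribes.
    for k in range(MAX_FEWSHOT + 1):
        cmd = build_command("hf-causal-experimental", "nlpai-lab/KULLM3", k,
                            output=f"nlpai-lab/KULLM3_{k}shot")
        print(shlex.join(cmd))
```

To execute a command instead of printing it, the returned list can be passed to subprocess.run(). Swapping in "hf-seq2seq" and "google/flan-t5-xxl" gives the sequence-to-sequence variant.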

👥 Human Evaluation

We recruited 22 native Korean-speaking volunteers as human evaluators and paid them $0.8 per question.

Model   #    Average Score   Cohen's kappa   Krippendorff's alpha
Human   22   0.8395          0.7693          0.7706
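
For readers unfamiliar with the agreement statistics in this table, here is a minimal sketch of Cohen's kappa for two raters. It illustrates the standard formula, kappa = (p_o - p_e) / (1 - p_e), and is not the script used to produce the numbers above. How agreement was aggregated across the 22 evaluators is not described here, and the annotator labels below are made up.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters labelling the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must label the same non-empty set of items")
    n = len(rater_a)
    # Observed agreement: fraction of items with identical labels.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement from each rater's marginal label frequencies.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    p_e = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if p_e == 1.0:
        return 1.0  # both raters used one identical label throughout
    return (p_o - p_e) / (1 - p_e)


# Made-up answers from two hypothetical annotators on eight questions.
annotator_1 = [1, 2, 3, 4, 1, 2, 3, 4]
annotator_2 = [1, 2, 3, 4, 1, 2, 4, 3]
print(round(cohens_kappa(annotator_1, annotator_2), 4))  # 0.6667
```

In this toy case the annotators agree on 6 of 8 items (p_o = 0.75) while chance agreement is 0.25, which gives a kappa of about 0.67.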

🤖 Models (August 10, 2024)

2-shot evaluation results for the newly released models:

Model                        Size    Acc_norm   Stderr
GPT-4 (June 13, 2023)        -       0.7450     -
Mistral-Nemo-Instruct        12B     0.6612     0.0163
Mistral-Nemo-Base            12B     0.6340     0.0166
Meta-Llama-3.1-8B            8B      0.6246     0.0166
QWEN2-7B base                7B      0.6187     0.0167
EXAONE-3.0-7.8B-Instruct     7.8B    0.6088     0.0168
MLP-KTLim-Bllossom-8B        8B      0.6057     0.0168
Meta-Llama-3.1-8B-Instruct   8B      0.6057     0.0168
KULLM3                       10.8B   0.6033     0.0168
QWEN2-7B inst                7B      0.5832     0.0170
Gemma-2-9b-it                9B      0.5714     0.0170
Aya-23-8B                    8B      0.5159     0.0172
Allganize-Alpha-Instruct     8B      0.4970     0.0172
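
The Stderr column can be turned into an approximate confidence interval when comparing models. The sketch below assumes a normal approximation (Acc_norm ± 1.96 × Stderr for roughly 95% coverage). The helper names are hypothetical, and the three rows are copied from the table above.

```python
def confidence_interval(acc, stderr, z=1.96):
    """Normal-approximation interval: acc +/- z * stderr (z=1.96 for ~95%)."""
    margin = z * stderr
    return acc - margin, acc + margin


def intervals_overlap(a, b):
    """True if two (low, high) intervals share at least one point."""
    return a[0] <= b[1] and b[0] <= a[1]


# (Acc_norm, Stderr) pairs copied from the results table.
results = {
    "Mistral-Nemo-Instruct": (0.6612, 0.0163),
    "Meta-Llama-3.1-8B": (0.6246, 0.0166),
    "Allganize-Alpha-Instruct": (0.4970, 0.0172),
}

intervals = {name: confidence_interval(*scores) for name, scores in results.items()}
for name, (low, high) in intervals.items():
    print(f"{name}: [{low:.4f}, {high:.4f}]")

top = intervals["Mistral-Nemo-Instruct"]
for name in ("Meta-Llama-3.1-8B", "Allganize-Alpha-Instruct"):
    print(name, "overlaps with Mistral-Nemo-Instruct:",
          intervals_overlap(top, intervals[name]))
```

Under this approximation the Mistral-Nemo-Instruct interval is about [0.6293, 0.6931], which overlaps with Meta-Llama-3.1-8B ([0.5921, 0.6571]) but not with Allganize-Alpha-Instruct ([0.4633, 0.5307]). Overlap is only a rough heuristic, not a significance test.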

As mentioned in the paper, it is possible to evaluate various models.
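
The Acc_norm column reports length-normalized accuracy. In lm-evaluation-harness, multiple-choice tasks usually compute it by dividing each choice's log-likelihood by the choice's character length before taking the argmax. Whether the ko_commongen_v2 task follows exactly this recipe is an assumption here. The sketch below uses a hypothetical pick_choice helper and made-up scores to show how normalization can change the selected answer.

```python
def pick_choice(loglikelihoods, choices, normalize=True):
    """Return the index of the best-scoring choice.

    With normalize=True, each log-likelihood is divided by the character
    length of its choice, so longer answers are not penalized for length alone.
    """
    if normalize:
        scores = [ll / len(choice) for ll, choice in zip(loglikelihoods, choices)]
    else:
        scores = list(loglikelihoods)
    return max(range(len(scores)), key=scores.__getitem__)


# Made-up log-likelihoods for two candidate answers.
choices = ["yes", "a much longer answer"]
loglikelihoods = [-6.0, -10.0]

print(pick_choice(loglikelihoods, choices, normalize=False))  # 0
print(pick_choice(loglikelihoods, choices, normalize=True))   # 1
```

Without normalization the short answer wins (-6.0 > -10.0). With normalization the per-character scores are -2.0 and -0.5, so the longer answer is selected. Accuracy is then the fraction of questions where the selected index matches the gold label.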

🇰🇷🇺🇸🇯🇵🇨🇳🇪🇸 Code-switching

The multilingual dataset consists of 99 samples for numerical commonsense reasoning, created using machine translation.

The dataset can be found at the following path: lm_eval/datasets/ko_commongen_v2/shuffled_$LANG$_1.0.jsonl.
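
A minimal sketch for reading one of these files follows. It assumes the file is standard JSON Lines (one JSON object per line) and makes no assumption about field names. The helper names are hypothetical, and "en" is a placeholder language code, so check the directory for the actual file names.

```python
import json
from pathlib import Path

DATA_DIR = Path("lm_eval/datasets/ko_commongen_v2")


def code_switching_path(lang, data_dir=DATA_DIR):
    """Fill the $LANG$ placeholder in shuffled_$LANG$_1.0.jsonl."""
    return Path(data_dir) / f"shuffled_{lang}_1.0.jsonl"


def read_jsonl(path):
    """Parse one JSON object per non-empty line."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


if __name__ == "__main__":
    # "en" is a hypothetical language code used only for illustration.
    path = code_switching_path("en")
    if path.exists():
        samples = read_jsonl(path)
        print(len(samples), "samples loaded from", path)
    else:
        print("No file at", path)
```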

You can also access the code-switching dataset on Hugging Face: nlpai-lab/ko_commongen_v2_code_switching

(The code-switching data relies on machine translation, which may result in some inaccuracies.)

If you intend to use it for evaluation, you should modify the prompt and file path in lm_eval/tasks/ko_commongen_v2.py.

📖 Citation

@inproceedings{seo2024Kocommongenv2,
    title = "KoCommonGEN v2: A Benchmark for Navigating Korean Commonsense Reasoning Challenges in Large Language Models",
    author = "Jaehyung Seo and Jaewook Lee and Chanjun Park and SeongTae Hong and Seungjun Lee and Heuiseok Lim",
    booktitle = "Findings of the Association for Computational Linguistics: ACL 2024",
    month = aug,
    year = "2024",
    address = "Bangkok, Thailand",
    publisher = "Association for Computational Linguistics",
    url = "TBD",
    doi = "TBD",
    pages = "TBD"}

🚨 Warning!

This dataset contains some instances of toxic speech.

🙏 Acknowledgement

We sincerely appreciate the dedication of Chanjun Park, Sanghoon Kim and Sunghun Kim (Sung Kim) from Upstage AI in managing one of the benchmark datasets for the Open Ko-LLM Leaderboard.
