ycchen-tw / neurips_llm_efficiency_challenge

This project forked from llm-efficiency-challenge/neurips_llm_efficiency_challenge

NeurIPS Large Language Model Efficiency Challenge: 1 LLM + 1GPU + 1Day

Python 58.87% Dockerfile 0.21% Jupyter Notebook 2.60% C++ 16.16% Cuda 22.05% C 0.10%

neurips_llm_efficiency_challenge's Introduction

Important

The latest training code and Readme are in the "training_code_v2" directory.

NeurIPS LLM Efficiency Challenge Solution

This repository presents the training and inference code developed by team ycchen.

We participated in the RTX4090 track.

We are graduate students at National Taiwan University.

Method Introduction

We made three submissions:

  1. 4090_submissions_1.zip: Pure Qwen-14B quantized model, without fine-tuning.

  2. 4090_submissions_2.zip: QLoRA instruction tuning using Open Assistant and LIMA.

  3. 4090_submissions_3.zip: QLoRA instruction tuning using data from Open Assistant, LIMA, ARC, and a C4-based Humpback dataset (on checking, we found that our training code did not actually load the Humpback results 😂).

Results:

  1. 4090_submissions_1.zip: Score 0.6458

  2. 4090_submissions_2.zip: Score 0.5845

  3. 4090_submissions_3.zip: Score 0.5954

Interestingly, our best-performing model was Qwen-14B with quantization only and no fine-tuning. This was an unexpected outcome 😢.

Our respectable ranking was nonetheless likely due to two factors: we used GPTQ for higher-quality quantization, and we carefully tuned the generation settings (minimum and maximum token length, and stop criteria) so that the model produced outputs that met the challenge requirements.
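The kind of generation settings mentioned above can be illustrated with a minimal sketch. This is not the team's actual code: the names `GenerationSettings`, `clamp_new_tokens`, and `truncate_at_stop`, as well as the default values, are hypothetical and chosen only to show how a length bound and stop criteria might be applied.

```python
from dataclasses import dataclass, field


@dataclass
class GenerationSettings:
    # Hypothetical values for illustration; not the team's actual settings.
    min_new_tokens: int = 1
    max_new_tokens: int = 64
    stop_sequences: list[str] = field(default_factory=lambda: ["\n\n"])


def clamp_new_tokens(requested: int, settings: GenerationSettings) -> int:
    """Keep a requested generation length within [min_new_tokens, max_new_tokens]."""
    return max(settings.min_new_tokens, min(requested, settings.max_new_tokens))


def truncate_at_stop(text: str, stop_sequences: list[str]) -> str:
    """Cut generated text at the earliest occurrence of any stop sequence."""
    cut = len(text)
    for stop in stop_sequences:
        if not stop:
            continue  # an empty stop string would match at index 0
        idx = text.find(stop)
        if idx != -1:
            cut = min(cut, idx)
    return text[:cut]
```

In practice such limits would usually be passed to the model's generation call rather than applied afterwards; the post-processing form is used here only to keep the sketch self-contained.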

Training Code

Given these results, our training code essentially consists of the quantization code for Qwen-14B.

The training code is available in the folder named 'training_code'. It utilizes auto-gptq to quantize Qwen-14B and uploads it to the Hugging Face Hub. To start, replace "YOUR_TOKEN" and "YOUR_USERNAME/YOUR_REPO" in the Dockerfile with your personal token and repository details. Execute the following commands:

docker build -f ./Dockerfile -t qwen_quant .

docker run --gpus "device=0" --rm -ti qwen_quant

The program can complete in approximately 2 hours using a single RTX 3090 GPU.
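As a rough sketch of what a quantize-and-upload step with auto-gptq can look like (this is not the code shipped in 'training_code'): the function name `quantize_and_upload`, the model ID `Qwen/Qwen-14B`, the 4-bit and group-size settings, the output directory, and the one-sentence calibration text are all assumptions made for illustration. A real run would need a GPU and a proper calibration set.

```python
def quantize_and_upload(
    base_model: str = "Qwen/Qwen-14B",        # assumed Hugging Face model ID
    repo_id: str = "YOUR_USERNAME/YOUR_REPO",
    hf_token: str = "YOUR_TOKEN",
    save_dir: str = "qwen-14b-gptq",          # hypothetical local output folder
    bits: int = 4,                            # assumed; not stated in this README
    group_size: int = 128,                    # assumed; not stated in this README
) -> None:
    """Quantize a causal LM with GPTQ and push the result to the Hugging Face Hub."""
    # Imported lazily so this file can be loaded without the GPU stack installed.
    from auto_gptq import AutoGPTQForCausalLM, BaseQuantizeConfig
    from transformers import AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(base_model, trust_remote_code=True)
    quantize_config = BaseQuantizeConfig(bits=bits, group_size=group_size, desc_act=False)
    model = AutoGPTQForCausalLM.from_pretrained(
        base_model, quantize_config, trust_remote_code=True
    )

    # Placeholder calibration data; a real run would use many representative samples.
    examples = [tokenizer("GPTQ calibrates layer by layer on sample text.")]
    model.quantize(examples)

    # Save locally, then upload the quantized weights and tokenizer files.
    model.save_quantized(save_dir, use_safetensors=True)
    tokenizer.save_pretrained(save_dir)
    model.push_to_hub(repo_id, save_dir=save_dir, use_auth_token=hf_token)
```

Calling `quantize_and_upload()` with your own token and repository mirrors the placeholders that the Dockerfile asks you to replace.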

Upon request, we also provide the training code for submissions 2 and 3, located in the folders 'training_code_submission_2' and 'training_code_submission_3' respectively. The 'gen_dataset.ipynb' notebook organizes the training dataset, and 'qlora.ipynb' trains the model.

Inference Code

After quantizing Qwen, replace MODEL_PATH in main.py of 4090_submissions_1.zip with "YOUR_USERNAME/YOUR_REPO" (originally "ycchen/yc-test1"). LORA_PATH can be ignored, because the rest of the program does not use it.

Then build and run the Docker image using the Dockerfile in 4090_submissions_1.zip.

Data Format

The submissions for the 4090 challenge are contained within the '4090_submissions' folder, which includes the following files:

  • 4090_submissions/

    • 4090_submissions_1.zip

    • 4090_submissions_2.zip

    • 4090_submissions_3.zip

neurips_llm_efficiency_challenge's People

Contributors

aniketmaurya, carmocca, drisspg, mreso, msaroufim, perlitz, pietrolesci, rasbt, riaz, shushengyuan, weiweiy, xindi-dumbledore, ycchen-tw
