
License: Apache License 2.0



Hierarchical Prompting Taxonomy

A Universal Evaluation Framework for Large Language Models
Table of Contents
  1. News
  2. Introduction
  3. Demo
  4. Installation
  5. Usage
  6. Datasets and Models
  7. Benchmark Results
  8. References
  9. Contributing
  10. Cite Us

News

[06-18-24] HPT is published! Check out the paper here.

↑ Back to Top ↑

Introduction

Hierarchical Prompting Taxonomy (HPT) is a universal evaluation framework for large language models (LLMs). It evaluates LLM performance across a variety of tasks and datasets, assigning an HP-Score for each dataset relative to different models. HPT employs the Hierarchical Prompt Framework (HPF), which supports a wide range of tasks, including question answering, reasoning, translation, and summarization, and provides a set of pre-defined prompting strategies tailored to each task based on its complexity. Refer to the paper at: https://arxiv.org/abs/2406.12644


Features of HPT

  • Universal Evaluation Framework: HPT can support a wide range of datasets and LLMs.
  • Hierarchical Prompt Framework: HPF is the set of prompting strategies, tailored to each task based on its complexity, that HPT employs. HPF is available in two modes, manual and adaptive; adaptive HPF uses an LLM (the prompt-selector) to choose the best prompting strategy for a given task.
  • HP-Score: HPT assigns an HP-Score for each dataset relative to different agents (including LLMs and humans). The HP-Score measures an agent's capability to perform the tasks in a dataset; a lower HP-Score indicates better performance on the dataset.
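As an illustration of the "lower is better" property, suppose each dataset example is tagged with the hierarchy level (1 = simplest prompt, 5 = most complex) at which an agent first answers correctly; a mean-level score then behaves like the HP-Score described above. This is a toy sketch, not the paper's exact formula, and the level assignments are assumed inputs:

```python
def hp_score(levels):
    """Toy HP-Score: mean hierarchy level at which each example was solved.

    levels -- list of ints, the level (1..5) at which the agent first
    produced a correct answer; a lower mean means a stronger agent.
    (Illustrative only; see the paper for the exact scoring rule.)
    """
    if not levels:
        raise ValueError("need at least one example")
    return sum(levels) / len(levels)
```

Under this sketch, an agent that solves most examples with a simple role prompt scores near 1, while one that needs Least-to-Most prompting throughout scores near 4.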

↑ Back to Top ↑

Demo

Refer to the examples directory for using the framework on different datasets and models.

↑ Back to Top ↑

Installation

Cloning the Repository

To clone the repository, run the following command:

git clone https://github.com/devichand579/HPT.git

↑ Back to Top ↑

Usage

Linux

To get started on a Linux setup, follow these setup commands:

  1. Activate your conda environment:

    conda activate hpt
  2. Navigate to the main codebase

    cd HPT/hierarchical_prompt
  3. Install the dependencies

    pip install -r requirements.txt
  4. Add your Hugging Face token

    • Create a .env file in the conda environment
    HF_TOKEN = "your HF Token"
  5. To run both frameworks, use the following command structure

    bash run.sh method model dataset [--thres num]
    • method

      • man
      • auto
    • model

      • llama3
      • phi3
      • gemma
      • mistral
    • dataset

      • boolq
      • csqa
      • iwslt
      • samsum
    • If the dataset is iwslt or samsum, add '--thres num'

    • num

      • 0.15
      • 0.20
      • or higher thresholds beyond those used in our experiments.
    • Example commands:

      bash run.sh man llama3 iwslt --thres 0.15
      bash run.sh auto phi3 boolq 
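Step 4's .env file holds a simple KEY = "value" entry. A minimal stdlib-only loader for that format might look like the following (the project itself may rely on a library such as python-dotenv; this is only a sketch of what the step accomplishes):

```python
import os

def load_env(path=".env"):
    # Parse simple KEY = "value" lines from a .env file and export
    # them into the process environment, skipping blanks and # comments.
    with open(path) as f:
        for line in f:
            line = line.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            os.environ[key.strip()] = value.strip().strip('"')

# After load_env(), os.environ["HF_TOKEN"] is visible to the
# Hugging Face libraries that download the models in step 5.
```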

↑ Back to Top ↑

Datasets and Models

HPT currently supports the following datasets, models, and prompt engineering methods employed by HPF. You are welcome to add more.

Datasets

  • Question-answering datasets:
    • BoolQ
  • Reasoning datasets:
    • CommonsenseQA
  • Translation datasets:
    • IWSLT-2017 en-fr
  • Summarization datasets:
    • SamSum

Models

  • Language models:
    • Llama 3 8B
    • Mistral 7B
    • Phi 3 3.8B
    • Gemma 7B

Prompt Engineering

  • Role Prompting [1]
  • Zero-shot Chain-of-Thought Prompting [2]
  • Three-shot Chain-of-Thought Prompting [3]
  • Least-to-Most Prompting [4]
  • Generated Knowledge Prompting [5]
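The manual HPF walks these five strategies from simplest to most complex until one yields a correct answer, and the level reached feeds the HP-Score. A schematic of that control flow, with `run_strategy` and `is_correct` as hypothetical stand-ins for the framework's model-call and evaluation code:

```python
# Strategies ordered by increasing complexity, per the list above.
STRATEGIES = [
    "role_prompting",        # [1]
    "zero_shot_cot",         # [2]
    "three_shot_cot",        # [3]
    "least_to_most",         # [4]
    "generated_knowledge",   # [5]
]

def hpf_level(example, run_strategy, is_correct):
    # Try each strategy in order; return the 1-based level that first
    # succeeds, or one past the last level if every strategy fails
    # (acting as a penalty level).
    for level, name in enumerate(STRATEGIES, start=1):
        answer = run_strategy(name, example)
        if is_correct(answer, example):
            return level
    return len(STRATEGIES) + 1
```

The adaptive mode replaces this linear walk with a prompt-selector LLM that picks a strategy directly.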

↑ Back to Top ↑

Benchmark Results

The benchmark results for different datasets and models are available in the leaderboard.

↑ Back to Top ↑

References

  1. Kong, A., Zhao, S., Chen, H., Li, Q., Qin, Y., Sun, R., & Zhou, X. (2023). Better Zero-Shot Reasoning with Role-Play Prompting. ArXiv, abs/2308.07702.
  2. Kojima, T., Gu, S.S., Reid, M., Matsuo, Y., & Iwasawa, Y. (2022). Large Language Models are Zero-Shot Reasoners. ArXiv, abs/2205.11916.
  3. Wei, J., Wang, X., Schuurmans, D., Bosma, M., Chi, E.H., Xia, F., Le, Q., & Zhou, D. (2022). Chain of Thought Prompting Elicits Reasoning in Large Language Models. ArXiv, abs/2201.11903.
  4. Zhou, D., Scharli, N., Hou, L., Wei, J., Scales, N., Wang, X., Schuurmans, D., Bousquet, O., Le, Q., & Chi, E.H. (2022). Least-to-Most Prompting Enables Complex Reasoning in Large Language Models. ArXiv, abs/2205.10625.
  5. Liu, J., Liu, A., Lu, X., Welleck, S., West, P., Le Bras, R., Choi, Y., & Hajishirzi, H. (2021). Generated Knowledge Prompting for Commonsense Reasoning. Annual Meeting of the Association for Computational Linguistics.

↑ Back to Top ↑

Contributing

This project aims to build open-source evaluation frameworks for assessing LLMs and other agents. Contributions and suggestions are welcome; please see the details on how to contribute.

If you are new to GitHub, here is a detailed guide on getting involved with development on GitHub.

↑ Back to Top ↑

Cite Us

If you find our work useful, please cite us !

@misc{budagam2024hierarchical,
      title={Hierarchical Prompting Taxonomy: A Universal Evaluation Framework for Large Language Models}, 
      author={Devichand Budagam and Sankalp KJ and Ashutosh Kumar and Vinija Jain and Aman Chadha},
      year={2024},
      eprint={2406.12644},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}

↑ Back to Top ↑

Contributors

ashu1069, devichand579

