ZeroSCROLLS

This repository contains code to run inference on the ZeroSCROLLS benchmark.

Setup

  • Install torch
  • Install transformers 4.30.2
  • pip install -r requirements.txt
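
For example, assuming a pip-based environment, the setup might look like this (the exact torch install command depends on your platform; see pytorch.org):

pip install torch
pip install transformers==4.30.2
pip install -r requirements.txt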

Load the data

from datasets import load_dataset

gov_report = load_dataset("tau/zero_scrolls", "gov_report", split="test")
"""
Options are: ["gov_report", "summ_screen_fd", "qmsum", "squality", "qasper", "narrative_qa", "quality", "musique", "space_digest", "book_sum_sort"]
There is also a small number of examples (~20 per task) in a "validation" split, meant for eyeballing purposes
"""

Inference with Huggingface models

python experiments/hf/run_hf_model.py --model-name=google/flan-t5-small

Supported models:

  • google/flan-t5-small
  • google/flan-t5-base
  • google/flan-t5-large
  • google/flan-t5-xl
  • google/flan-t5-xxl
  • google/flan-ul2
  • bigscience/T0pp

To add new models:

  • Add them to model_to_max_input_tokens in experiments/hf/run_hf_model.py
  • Make sure to load them with the appropriate architecture (i.e. modify the model initialization from T5ForConditionalGeneration in the same file, if needed)
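
A hypothetical sketch of those two changes (the new model name and the token limits below are placeholders; the actual dictionary and model initialization live in experiments/hf/run_hf_model.py):

# 1) register the new model and its maximum input length
model_to_max_input_tokens = {
    "google/flan-t5-small": 8192,      # existing entry (value illustrative)
    "my-org/my-seq2seq-model": 16384,  # new model (placeholder)
}

# 2) if the new model is not a T5 variant, replace the T5ForConditionalGeneration
#    initialization with a suitable class, e.g.:
from transformers import AutoModelForSeq2SeqLM

model = AutoModelForSeq2SeqLM.from_pretrained("my-org/my-seq2seq-model")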

Inference with APIs

To run with models used in the paper*:

# if you want to use openai models
export OPENAI_API_KEY=<insert token here> 
export OPENAI_ORG=<insert org here>

# if you want to use anthropic models
export ANTHROPIC_API_KEY=<insert token here>

# if you want to limit the number of examples to run per task
export MAX_EXAMPLES=10

python experiments/api/run_api_model.py --model_name=gpt-3.5-turbo --limit_to_n_examples=$MAX_EXAMPLES

*These models and APIs tend to be updated; see the paper for the versions used in the baselines.

Models supported:

  • text-davinci-003
  • gpt-3.5-turbo
  • gpt-4
  • claude-v1

When adding a new API, note: if you use a prompt that includes opening XML tags (e.g. "... Assistant: <answer>"), make sure to post-process the generations so that only the prefix before the closing XML tag generated by the model is retained before submitting.
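
A minimal sketch of that post-processing step (the tag name and generated text are illustrative):

closing_tag = "</answer>"
generation = "The treaty was signed in 1848.</answer> Human: next question ..."
# keep only the prefix before the closing tag the model generated
prediction = generation.split(closing_tag)[0].strip()
print(prediction)  # The treaty was signed in 1848.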

Prepare submission

To create a CSV file in the correct format for a leaderboard submission, we recommend using our conversion script, prepare_submission.py.

Its inputs:

For each task, the predictions should be in a JSON file that is a mapping from an ID to a textual prediction:

{
    "example_id1": "prediction1",
    "example_id2": "prediction2",
    ...
}
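
For example, one task's predictions could be written in this format as follows (the IDs, prediction texts, and file name are placeholders):

import json

predictions = {
    "example_id1": "prediction1",
    "example_id2": "prediction2",
}
with open("gov_report_preds.json", "w") as f:
    json.dump(predictions, f, ensure_ascii=False, indent=2)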

Please set:

  • {dataset_name}_PREDS_FILE to be the path to a JSON file in the format above containing your predictions for {dataset_name}.
  • OUTPUT_DIR to be the path where you want the submission file to be saved.

Run:

python submission/prepare_submission.py \
--gov_report_file GOV_REPORT_PREDS_FILE \
--summ_screen_file SUMM_SCREEN_FD_PREDS_FILE \
--qmsum_file QMSUM_PREDS_FILE \
--squality_file SQUALITY_PREDS_FILE \
--qasper_file QASPER_PREDS_FILE \
--narrative_qa_file NARRATIVE_QA_PREDS_FILE \
--quality_file QUALITY_PREDS_FILE \
--musique_file MUSIQUE_PREDS_FILE \
--space_digest_file SPACE_DIGEST_PREDS_FILE \
--book_sum_sort_file BOOK_SUM_SORT_PREDS_FILE \
--output_dir OUTPUT_DIR

Verify your submission file

Run:

python submission/verify_submission.py \
--all_predictions SUBMISSION_FILE \
--output_dir OUTPUT_DIR

A valid submission file will result in the following line being printed:

The verification was successful.

Please fix any errors before making your submission.

Leaderboard

The live leaderboard is here.

Citation

@inproceedings{shaham-etal-2023-zeroscrolls,
    title = "{Z}ero{SCROLLS}: A Zero-Shot Benchmark for Long Text Understanding",
    author = "Shaham, Uri  and
      Ivgi, Maor  and
      Efrat, Avia  and
      Berant, Jonathan  and
      Levy, Omer",
    editor = "Bouamor, Houda  and
      Pino, Juan  and
      Bali, Kalika",
    booktitle = "Findings of the Association for Computational Linguistics: EMNLP 2023",
    month = dec,
    year = "2023",
    address = "Singapore",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2023.findings-emnlp.536",
    doi = "10.18653/v1/2023.findings-emnlp.536",
    pages = "7977--7989"
}

If you find the ZeroSCROLLS data useful, please make sure to also cite the original dataset papers: [bibtex]
