
SLED

The official repository for Efficient Long-Text Understanding with Short-Text Models (Ivgi et al., 2022).

SLED models use pretrained, short-range encoder-decoder models and apply them to long-text inputs by splitting the input into multiple overlapping chunks, encoding each chunk independently, and performing fusion-in-decoder.
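The chunking scheme can be illustrated with a short sketch (an illustration only, not the library's internal implementation; the chunk and overlap sizes are placeholders):

def split_into_chunks(token_ids, chunk_size=256, overlap=128):
    # Slide a window of chunk_size tokens over the input, stepping by
    # chunk_size - overlap so that consecutive chunks share `overlap` tokens.
    step = chunk_size - overlap
    chunks = []
    for start in range(0, max(len(token_ids) - overlap, 1), step):
        chunks.append(token_ids[start:start + chunk_size])
    return chunks

# e.g. a 600-token input yields four overlapping windows covering the whole sequence
chunks = split_into_chunks(list(range(600)))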

Data

The data for this paper is hosted on the dataset hub here. It is based on the SCROLLS dataset (paper), the SQuAD 1.1 dataset (paper) and the HotpotQA dataset (paper). It doesn't contain any unpublished data, but includes the configuration needed for the paper.

Usage example:

from datasets import load_dataset
qasper = load_dataset("tau/sled", "qasper")  # load the Qasper configuration of the tau/sled dataset
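The returned object behaves like any other datasets dataset, so its splits and fields can be inspected directly (the split name below is an assumption; printing the dataset lists the actual ones):

print(qasper)              # shows the available splits and their columns
print(qasper["train"][0])  # first example of the train split, if one exists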

Installation

Make sure to install PyTorch according to your machine's spec. See installation options here.

Installing SLED is easy with pip.

pip install py-sled

Some backbone models require additional dependencies. If you wish to work with T5, for example, you can install them with

pip install py-sled[t5]

If you wish to run the examples, install the required dependencies with

pip install py-sled[examples]

If you wish to continue developing this repository, install the full development requirements with

pip install py-sled[dev]

Usage

SLED integrates seamlessly with HuggingFace's Transformers AutoClasses.

A minimal usage example:

import sled  # ** required so that SLED is properly registered by the AutoClasses **
from transformers import AutoTokenizer, AutoModel
tokenizer = AutoTokenizer.from_pretrained('tau/bart-base-sled')
model = AutoModel.from_pretrained('tau/bart-base-sled')
inputs = tokenizer("Hello, my dog is cute", return_tensors="pt")
outputs = model(**inputs)
last_hidden_states = outputs.last_hidden_state

Important: You need to import sled before using the AutoClass (e.g. AutoModel.from_pretrained('tau/bart-base-sled')) for it to work.

A minimal working example can be found here.
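For conditional generation, the same pattern applies with the seq2seq AutoClass (a sketch only; the input text and generation settings are placeholders, not recommendations from the paper):

import sled  # registers SLED with the AutoClasses
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
tokenizer = AutoTokenizer.from_pretrained('tau/bart-base-sled')
model = AutoModelForSeq2SeqLM.from_pretrained('tau/bart-base-sled')
inputs = tokenizer("Some long document to summarize ...", return_tensors="pt")  # placeholder input
generated_ids = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.batch_decode(generated_ids, skip_special_tokens=True))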

To work with SCROLLS-like data, as used in the paper, see here.

Custom datasets

For SLED to prepend the prefix input (e.g. a question) to every chunk, it requires an additional input tensor, prefix_length. If you are using a custom dataset, refer to run.py for the correct way to preprocess the data.
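A minimal sketch of such preprocessing (the field names and shapes here are assumptions for illustration; run.py remains the authoritative reference):

import torch
import sled  # registers SLED with the AutoClasses
from transformers import AutoTokenizer
tokenizer = AutoTokenizer.from_pretrained('tau/bart-base-sled')
question = "What dataset was used?"    # the prefix to prepend to every chunk
document = "A very long document ..."  # placeholder for the long input
# Tokenize the full input and pass the prefix length as an extra tensor
prefix_ids = tokenizer(question, add_special_tokens=False).input_ids
inputs = tokenizer(question + " " + document, return_tensors="pt")
inputs["prefix_length"] = torch.tensor([len(prefix_ids)])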

Note: Currently, HF's Seq2SeqTrainer doesn't pass the prefix_length tensor in the prediction loop, so you should use the CustomSeq2SeqTrainer or something similar until it is fixed.

Backbone models

There are multiple model cards available on the HuggingFace Hub, including tau/bart-base-sled used above.

If you wish to use a custom model that is available as a model card (public or private) on the hub, or to use different parameters for SLED, you can create a json config file like the one below and change underlying_config to your custom model card.

{
  "model_type": "tau/sled",
  "underlying_config": "facebook/bart-base",
  "context_size": 256,
  "window_fraction": 0.5,
  "prepend_prefix": true,
  "encode_prefix": true,
  "sliding_method": "dynamic"
}

You can then load it as shown below:

import sled
from transformers import AutoModelForSeq2SeqLM
custom_sled_model = AutoModelForSeq2SeqLM.from_pretrained(<your custom json config>)

Citation

If you use this repository, please cite as below:

@inproceedings{Ivgi2022EfficientLU,
  title={Efficient Long-Text Understanding with Short-Text Models},
  author={Maor Ivgi and Uri Shaham and Jonathan Berant},
  year={2022}
}

Disclaimer

This repository is still under active development and may contain some unintended behavior. Please open an issue if any unexpected behavior occurs, and we will promptly try to fix it.
