nevakrien / lmppl_code

This project is forked from asahi417/lmppl


Calculate perplexity on a text with pre-trained language models. Supports MLM (e.g. DeBERTa), recurrent LM (e.g. GPT-3), and encoder-decoder LM (e.g. Flan-T5).

License: MIT License

Python 100.00%

lmppl_code's Introduction

Language Model Perplexity Library (lmppl_code)

This library provides easy ways to compute perplexity scores using various transformer-based language models. It supports both standalone language models and encoder-decoder language models.

Installation

Install the required dependencies:

```bash
pip install transformers torch
```

Usage

Standalone Language Models

```python
import transformers
import lmppl_code as lmppl

# Load a model and tokenizer
tokenizer = transformers.AutoTokenizer.from_pretrained('gpt2')
model = transformers.AutoModelForCausalLM.from_pretrained('gpt2')

# Instantiate the scorer with the pre-loaded model and tokenizer
scorer = lmppl.LM(model='gpt2', tokenizer=tokenizer, model_obj=model)

text = [
    'sentiment classification: I dropped my laptop on my knee, and someone stole my coffee. I am happy.',
    'sentiment classification: I dropped my laptop on my knee, and someone stole my coffee. I am sad.'
]

# Get the lexical count
count = lmppl.get_lex_count(text, 'c')

# Get perplexity
ppl = scorer.get_perplexity(text, count)
print(ppl)
```
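For intuition, perplexity is the exponential of the average negative log-likelihood a model assigns to the tokens of a text (equivalently, the inverse geometric mean of the token probabilities). A minimal, model-free sketch with made-up token probabilities, independent of this library's API:

```python
import math

# Hypothetical next-token probabilities a model might assign to a 4-token text
token_probs = [0.25, 0.10, 0.50, 0.05]

# Average negative log-likelihood over the tokens
avg_nll = -sum(math.log(p) for p in token_probs) / len(token_probs)

# Perplexity: exp of the average negative log-likelihood
ppl = math.exp(avg_nll)
print(ppl)  # ≈ 6.32; lower means the model found the text less surprising
```

This equals the inverse geometric mean of the probabilities, `(0.25 * 0.10 * 0.50 * 0.05) ** (-1/4)`, which is what the scorer computes over real model probabilities.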

Encoder-Decoder Language Models

```python
import transformers
import lmppl_code as lmppl

tokenizer = transformers.AutoTokenizer.from_pretrained('google/flan-t5-small')
model = transformers.AutoModelForSeq2SeqLM.from_pretrained('google/flan-t5-small')

scorer = lmppl.EncoderDecoderLM(tokenizer=tokenizer, model_obj=model)

inputs = [
    'sentiment classification: I dropped my laptop on my knee, and someone stole my coffee.',
    'sentiment classification: I dropped my laptop on my knee, and someone stole my coffee.'
]
outputs = [
    'I am happy.',
    'I am sad.'
]

count = lmppl.get_lex_count(outputs, 'c')

# Get perplexity
ppl = scorer.get_perplexity(input_texts=inputs, output_texts=outputs, lex_count=count)
print(ppl)
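A common use of these conditional perplexities is zero-shot classification: score each candidate output against the same input and pick the one the model finds least surprising. A sketch of that selection step, using hypothetical perplexity values in place of a real `scorer.get_perplexity` call:

```python
# Candidate outputs and hypothetical perplexities a scorer might return for them
outputs = ['I am happy.', 'I am sad.']
ppl = [6.2, 3.9]  # made-up numbers for illustration

# Pick the candidate the model considers most likely (lowest perplexity)
best = outputs[min(range(len(ppl)), key=ppl.__getitem__)]
print(best)  # -> 'I am sad.'
```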

Contributing

[Explain how others can contribute to your project]

License

[Your License Here]

Please make sure to replace placeholders (like [Your License Here]) with actual content. Modify the text according to your needs.

lmppl_code's People

Contributors

  • asahi417
  • nevakrien
