
Foundation Model Evaluations Library

FMEval is a library to evaluate Large Language Models (LLMs) and select the best LLM for your use case. The library can help evaluate LLMs for the following tasks:

  • Open-ended generation - the production of natural language as a response to general prompts that do not have a pre-defined structure.
  • Text summarization - summarizing the most important parts of a text, shortening a text while preserving its meaning.
  • Question Answering - the generation of a relevant and accurate response to a question.
  • Classification - assigning a category, such as a label or score, to text based on its content.

The library contains the following:

  • Implementation of popular metrics (eval algorithms) such as Accuracy, Toxicity, Semantic Robustness and Prompt Stereotyping for evaluating LLMs across different tasks.
  • Implementation of the ModelRunner interface. ModelRunner encapsulates the logic for invoking LLMs, exposing a predict method that greatly simplifies interactions with LLMs within eval algorithm code. Users can extend the interface for their own LLMs. We have built-in support for Amazon SageMaker JumpStart endpoints, Amazon SageMaker endpoints, and Amazon Bedrock models.
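The ModelRunner contract can be illustrated with a minimal standalone sketch. This is an illustration of the pattern only, not fmeval's actual base class; the class and method names below are assumptions, and the real interface ships with the library:

```python
from abc import ABC, abstractmethod
from typing import Optional, Tuple


class ModelRunner(ABC):
    """Illustrative stand-in for fmeval's ModelRunner interface (names assumed)."""

    @abstractmethod
    def predict(self, prompt: str) -> Tuple[Optional[str], Optional[float]]:
        """Invoke the model and return (output_text, log_probability)."""


class EchoModelRunner(ModelRunner):
    """Toy runner that 'invokes' a model by echoing the prompt back."""

    def predict(self, prompt: str) -> Tuple[Optional[str], Optional[float]]:
        # A real runner would call a SageMaker or Bedrock endpoint here.
        return f"echo: {prompt}", None


runner = EchoModelRunner()
output, log_prob = runner.predict("Hello")
```

Eval algorithms only need the predict method, so any hosted model can be plugged in by subclassing and implementing that one call.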

Installation

To install the package from PyPI, run:

pip install fmeval

Usage

You can see examples of running evaluations on your LLMs with built-in or custom datasets in the examples folder.

The main steps for using fmeval are:

  1. Create a ModelRunner that can invoke your LLM. We have built-in support for Amazon SageMaker JumpStart endpoints, Amazon SageMaker endpoints, and Amazon Bedrock models. You can also extend the ModelRunner interface for LLMs hosted anywhere else.
  2. Use any of the supported eval_algorithms:
# Import paths below are assumptions; check the fmeval docs for your version.
from fmeval.eval import get_eval_algorithm
from fmeval.eval_algorithms.toxicity import ToxicityConfig

eval_algo = get_eval_algorithm("toxicity", ToxicityConfig())
eval_output = eval_algo.evaluate(model=model_runner)

Note: You can update the default eval config parameters for your specific use case.

Using a custom dataset for an evaluation

The eval algorithms compute their scores on built-in datasets by default. To use a custom dataset instead:

  1. Create a DataConfig for your custom dataset:
# Import path below is an assumption; check the fmeval docs for your version.
from fmeval.data_loaders.data_config import DataConfig

config = DataConfig(
    dataset_name="custom_dataset",
    dataset_uri="./custom_dataset.jsonl",
    dataset_mime_type="application/jsonlines",
    model_input_location="question",
    target_output_location="answer",
)
  2. Use the eval algorithm with the custom dataset:
eval_algo = get_eval_algorithm("toxicity", ToxicityConfig())
eval_output = eval_algo.evaluate(model=model_runner, dataset_config=config)
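The custom dataset referenced above is a JSON Lines file with one JSON record per line, where the keys must match the model_input_location and target_output_location fields of the DataConfig. A minimal sketch of producing such a file with the standard library (the records here are made up for illustration):

```python
import json

# Each line is one JSON object; keys match the DataConfig above.
records = [
    {"question": "What is the capital of France?", "answer": "Paris"},
    {"question": "How many legs does a spider have?", "answer": "Eight"},
]

with open("custom_dataset.jsonl", "w") as f:
    for record in records:
        f.write(json.dumps(record) + "\n")

# Read it back to verify the format round-trips.
with open("custom_dataset.jsonl") as f:
    loaded = [json.loads(line) for line in f]
```

Note that the file extension is .jsonl and the MIME type in DataConfig is application/jsonlines, not application/json.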

Refer to the code documentation and examples for further details on using the eval algorithms.

Development

Setup

Once a virtual environment is set up with Python 3.10, run the following command to install all dependencies:

./devtool all

Adding python dependencies

We use Poetry to manage Python dependencies in this project. To add a new dependency, update the pyproject.toml file and run the poetry update command to refresh the poetry.lock file (which is checked in).

Apart from adding dependencies, everything else should be managed with devtool commands.

Adding your own Eval Algorithm

Details TBA

Security

See CONTRIBUTING for more information.

License

This project is licensed under the Apache-2.0 License.

