
License: Apache License 2.0

lm-api

Command-line utilities for querying large language models

This repo makes it easy to run a batch of queries against a large language model (LM) from the command line and get the completions back, nicely formatted in a single document. It also exposes a basic Python API.

Typical workflow:

  1. Create a .csv/.xlsx/etc. file with one model query per row
  2. Run lm-api with -i /path/to/my/queries.csv, and use -kc to specify the column name with the queries
  3. Get completions compiled into a single markdown file!

Queries are expected to be in a pandas-compatible format, and results are written to a text file with markdown formatting for easy viewing/sharing.

An example output file is provided in data/lm-api-output.
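For illustration, a minimal query file can be built with pandas (the prompts below are placeholders; `query` is the default column name lm-api looks for):

```python
import pandas as pd

# Build a minimal query file; "query" is lm-api's default key column.
# The prompts are placeholders for illustration only.
df = pd.DataFrame(
    {
        "query": [
            "Explain beam search in two sentences.",
            "List three uses of text embeddings.",
        ]
    }
)
df.to_csv("queries.csv", index=False)

# Round-trip to confirm the file is pandas-readable, as lm-api expects.
loaded = pd.read_csv("queries.csv")
print(loaded.shape)  # → (2, 1)
```

The resulting `queries.csv` can then be passed to `lm-api -i queries.csv`.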


Installation

Directly install via pip+git:

# create a virtual environment (optional): pyenv virtualenv 3.8.5 lm-api
pip install git+https://github.com/pszemraj/lm-api.git

Install from source

Alternatively, clone the repository and install it in editable mode:

git clone https://github.com/pszemraj/lm-api.git
cd lm-api
# create a virtual environment (optional): pyenv virtualenv 3.8.5 lm-api
pip install -e .

A quick test can be run with the src/lm_api/test_goose_api.py script.

On API Keys

You will need an API key for each provider you want to query. Two providers are currently supported, OpenAI and Goose AI; their keys are read from the OPENAI and GOOSE environment variables, respectively:

export OPENAI=api_key11111114234234etc
# or
export GOOSE=api_key11111114234234etc

Alternatively, pass the key as an argument when calling lm-api with the -k switch.
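A script that wraps lm-api can check for the same environment variables before running anything (a sketch; the placeholder value is obviously not a real key):

```python
import os

# lm-api reads OPENAI or GOOSE from the environment; set a placeholder
# here only if neither is already present (illustration, not a real key).
if not (os.environ.get("OPENAI") or os.environ.get("GOOSE")):
    os.environ["OPENAI"] = "placeholder-key"

key = os.environ.get("OPENAI") or os.environ.get("GOOSE")
assert key, "set OPENAI or GOOSE, or pass the key via lm-api -k"
```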

Usage

Command-line scripts are located in src/lm_api/ and are installed as CLI commands that can be run from anywhere. Currently, lm-api is the only command (more to come).

⚠️NOTE: your API key must be set in the environment variables or passed as an argument to lm-api with the -k flag to run any queries⚠️

Example

lm-api -i data/test_queries.xlsx -o ./my-test-folder

This will run the queries in data/test_queries.xlsx and write the results to a .md file in my-test-folder/ in your current working directory.
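The same invocation can be assembled programmatically, e.g. before handing it to `subprocess` (a sketch: `build_lm_api_cmd` is a hypothetical helper, and actually running the command assumes the lm-api CLI is on PATH):

```python
import shlex

def build_lm_api_cmd(input_file: str, output_dir: str,
                     key_column: str = "query") -> list[str]:
    """Build the lm-api argument list for a batch run (hypothetical helper)."""
    return ["lm-api", "-i", input_file, "-o", output_dir, "-kc", key_column]

cmd = build_lm_api_cmd("data/test_queries.xlsx", "./my-test-folder")
print(shlex.join(cmd))
# → lm-api -i data/test_queries.xlsx -o ./my-test-folder -kc query
# e.g. subprocess.run(cmd, check=True) would then execute it
```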

Details

There are many options for the script, which can be viewed with the -h flag (e.g., lm-api -h).

usage: lm-api [-h] [-i INPUT_FILE] [-o OUTPUT_DIR] [-provider PROVIDER_ID] [-k KEY] [-p PREFIX] [-s SUFFIX] [-simple]
              [-kc KEY_COLUMN] [-m MODEL_ID] [-n N_TOKENS] [-t TEMPERATURE] [-f2 FREQUENCY_PENALTY]
              [-p2 PRESENCE_PENALTY] [-v]

Input File Format

The input file should be in a pandas-compatible format (e.g., .csv, .xlsx, etc.). The default column name for the queries is query, which can be changed with the -kc flag.

An example input file is provided in data/test_queries.xlsx.
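One way such a pandas-compatible loader might dispatch on file extension (an illustrative sketch; `load_queries` is hypothetical, not lm-api's actual function):

```python
import pandas as pd
from pathlib import Path

def load_queries(path: str, key_column: str = "query") -> list[str]:
    # Hypothetical loader: pick a pandas reader by extension,
    # then pull the key column as a list of strings.
    p = Path(path)
    df = pd.read_excel(p) if p.suffix in {".xlsx", ".xls"} else pd.read_csv(p)
    return df[key_column].astype(str).tolist()

# Demo on a temporary CSV rather than the repo's example .xlsx file.
pd.DataFrame({"query": ["hello", "world"]}).to_csv("demo_queries.csv", index=False)
print(load_queries("demo_queries.csv"))  # → ['hello', 'world']
```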


TODO / Roadmap

Note: this is a work in progress; the following is a running list of open items and will likely change.

  • adjust the --prefix and --suffix flags to a "prompt engine" switch that can augment/update the prompt with a variety of options (e.g., --prompt-engine=prefix or --prompt-engine=prefix+suffix)
  • add a simple CLI command that does not require a query file
  • add support for other providers (e.g., textsynth)
  • validate performance as package / adjust as needed (i.e., import lm_api should work and have full functionality w.r.t. CLI)
  • setup tests

We are compiling/discussing a list of potential features in the discussions section, so please feel free to add your thoughts there!


Project generated with PyScaffold


