
Pattern Based Question and Answer

Description

Pattern Based Question and Answer (PBQA) is a Python library that provides tools for querying LLMs and managing text embeddings. It combines guided generation with multi-shot prompting to improve response quality and ensure consistency. By enforcing valid responses, PBQA makes it easy to combine the flexibility of LLMs with the reliability and control of symbolic approaches.

Installation

PBQA requires Python 3.9 or higher, and can be installed via pip:

pip install PBQA

Additionally, PBQA requires a running instance of llama.cpp to interact with LLMs. For instructions on installation, see the llama.cpp repository.

Usage

llama.cpp

For instructions on hosting a model with llama.cpp, see the llama.cpp server documentation. Optionally, caching can be enabled to speed up generation.
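Before connecting PBQA, it can help to confirm that the llama.cpp server is reachable. The sketch below assumes the server exposes the `/health` endpoint of llama.cpp's HTTP server, which answers with `{"status": "ok"}` once the model is loaded. The helpers `health_url` and `server_is_up` are hypothetical and not part of PBQA.

```python
import json
import urllib.error
import urllib.request


def health_url(host: str, port: int) -> str:
    """Build the URL of the llama.cpp server's health endpoint (assumed to be /health)."""
    return f"http://{host}:{port}/health"


def server_is_up(host: str = "127.0.0.1", port: int = 8080, timeout: float = 2.0) -> bool:
    """Return True if the server answers /health with status "ok".

    Connection errors, timeouts, HTTP error codes (e.g. 503 while the model
    is still loading) and unparseable bodies all count as "not up".
    """
    try:
        with urllib.request.urlopen(health_url(host, port), timeout=timeout) as resp:
            if resp.status != 200:
                return False
            body = json.loads(resp.read().decode("utf-8"))
    except (urllib.error.URLError, OSError, ValueError):
        return False
    return isinstance(body, dict) and body.get("status") == "ok"
```

Calling `server_is_up(port=8080)` before `llm.connect_model(...)` turns a missing server into a clear early failure instead of an error on the first query.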

Python

PBQA provides a simple API for querying LLMs.

from PBQA import DB, LLM
from time import strftime

# First, we set up a database at a specified path
db = DB(path="examples/db")
# Then, we load a pattern file into the database
db.load_pattern("examples/weather.yaml")

# Next, we connect to the LLM server
llm = LLM(db=db, host="127.0.0.1")
# And connect to the model
llm.connect_model(
    model="llama",
    port=8080,
    stop=["<|eot_id|>", "<|start_header_id|>"],
    temperature=0,
)

# Finally, we query the LLM and receive a response based on the specified pattern
# Optionally, external data can be provided to the LLM, which it can use in its response
weather_query = llm.ask(
    "Could I see the stars tonight?",
    "weather",
    "llama",
    external={"now": strftime("%Y-%m-%d %H:%M")},
)
print(weather_query)

With the weather.yaml pattern file and Llama 3 served at 127.0.0.1:8080, the response should look something like this:

{
    "latitude": 51.51,
    "longitude": 0.13,
    "time": "2024-06-18 01:00"
}
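Because every component is constrained by the pattern, the result can be handled like ordinary structured data. The sketch below assumes `ask()` returns a plain dict shaped like the example above (as the sample output suggests); the `WeatherQuery` dataclass and `parse_weather_query` function are hypothetical helpers, not part of PBQA.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class WeatherQuery:
    latitude: float
    longitude: float
    time: datetime


def parse_weather_query(response: dict) -> WeatherQuery:
    """Convert a weather-pattern response into typed values.

    Raises KeyError if a component is missing and ValueError if a value
    does not have the expected format.
    """
    return WeatherQuery(
        latitude=float(response["latitude"]),
        longitude=float(response["longitude"]),
        # Same format string as the "now" value passed via external= above
        time=datetime.strptime(response["time"], "%Y-%m-%d %H:%M"),
    )
```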

For more information, see the examples directory.

Pattern Files

Pattern files are used to guide the LLM in generating responses. They are written in YAML and consist of three parts: the system prompt, component metadata, and examples.

# The system prompt is the main instruction given to the LLM telling it what to do
system_prompt: Your job is to translate the user's input into a weather query. Reply with the json for the weather query and nothing else.
now:  # Each component of the response needs its own key; an empty "component:" entry is the minimum
  external: true  # Optionally, specify whether the component requires external data
latitude:
  grammar: |  # Optionally, constrain the component with a GBNF grammar
    root         ::= coordinate
    coordinate   ::= integer "." integer
    integer      ::= digit | digit digit
    digit        ::= "0" | "1" | "2" | "3" | "4" | "5" | "6" | "7" | "8" | "9"
longitude:
  grammar: ...
time:
  grammar: ...
examples:  # Lastly, examples can be provided for multi-shot prompting
- input: What will the weather be like tonight
  now: 2019-09-30 10:36
  latitude: 51.51
  longitude: 0.13
  time: 2019-09-30 20:00
- input: Could I see the stars tonight?
  ...
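To make the three parts concrete, the sketch below shows one plausible way a pattern (here a dict, as a YAML loader would produce it) could be split into its system prompt, component metadata, and examples, and how the examples could become user/assistant turns for multi-shot prompting. This is an illustration under assumptions, not PBQA's actual implementation: the function names, the chat-message format, and serializing each example's answer as JSON are all hypothetical choices.

```python
import json
from typing import Optional

RESERVED_KEYS = {"system_prompt", "examples"}


def split_pattern(pattern: dict) -> tuple[str, dict, list]:
    """Separate a pattern into (system prompt, component metadata, examples)."""
    # A bare "component:" key loads as None, so treat it as empty metadata
    components = {k: (v or {}) for k, v in pattern.items() if k not in RESERVED_KEYS}
    return pattern["system_prompt"], components, pattern.get("examples", [])


def build_messages(pattern: dict, user_input: str, external: Optional[dict] = None) -> list[dict]:
    """Turn a pattern plus a new input into a chat-style multi-shot prompt."""
    system_prompt, components, examples = split_pattern(pattern)
    external = external or {}
    # Components flagged external are supplied by the caller, not generated
    external_keys = [k for k, meta in components.items() if meta.get("external")]
    answer_keys = [k for k in components if k not in external_keys]

    def user_turn(values: dict, text: str) -> dict:
        context = {k: values[k] for k in external_keys if k in values}
        content = text if not context else f"{text}\n{json.dumps(context)}"
        return {"role": "user", "content": content}

    messages = [{"role": "system", "content": system_prompt}]
    for ex in examples:
        # Each example becomes one user turn and one ideal assistant answer
        messages.append(user_turn(ex, ex["input"]))
        answer = {k: ex[k] for k in answer_keys if k in ex}
        messages.append({"role": "assistant", "content": json.dumps(answer)})
    messages.append(user_turn(external, user_input))
    return messages
```

Keeping the system prompt and examples first, and the new input last, also means every query for a pattern starts with the same text, which is what makes the caching described below effective.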

For more examples, see the pattern files in the examples directory. Information on the GBNF grammar format can be found in the grammars documentation of the llama.cpp repository.
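The latitude grammar in the pattern file accepts one or two digits, a dot, and one or two more digits, so values such as 51.51 or 5.1 are valid while -33.87 or 151.21 are not; a grammar for southern or western coordinates would need to allow a leading minus sign. The sketch below mirrors that rule with a Python regular expression, purely to make the grammar's language easy to test. llama.cpp enforces the GBNF grammar itself during sampling, and `LATITUDE_RE` and `matches_latitude_grammar` are hypothetical names.

```python
import re

# Regex equivalent of the GBNF rules:
#   coordinate ::= integer "." integer
#   integer    ::= digit | digit digit
LATITUDE_RE = re.compile(r"[0-9]{1,2}\.[0-9]{1,2}")


def matches_latitude_grammar(text: str) -> bool:
    """Return True if the whole string is accepted by the latitude grammar."""
    return LATITUDE_RE.fullmatch(text) is not None
```

Checking the example values in a pattern file against such a regex can catch examples the grammar could never produce.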

Cache

Unless overridden, queries using the same pattern will use the same system prompt and base examples, allowing a large part of the prompt to be cached and speeding up generation. Caching can be disabled by setting use_cache=False in the ask() method.
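The speed-up comes from prefix reuse: when a new prompt starts with the same tokens as the one already in the cache, only the part after the shared prefix has to be processed. The sketch below illustrates this with whitespace-separated words standing in for tokens; real tokenization and llama.cpp's cache bookkeeping differ, and `shared_prefix_len` is a hypothetical helper.

```python
def shared_prefix_len(cached: list[str], new: list[str]) -> int:
    """Number of leading tokens the new prompt shares with the cached one."""
    n = 0
    for a, b in zip(cached, new):
        if a != b:
            break
        n += 1
    return n


# The system prompt and base examples are identical across queries;
# only the final user input changes.
base = "SYSTEM translate to weather query EXAMPLE tonight -> 20:00".split()
prompt_a = base + "USER Could I see the stars tonight?".split()
prompt_b = base + "USER Will it rain tomorrow?".split()

reused = shared_prefix_len(prompt_a, prompt_b)
print(f"reused {reused} of {len(prompt_b)} tokens, processed {len(prompt_b) - reused}")
# prints "reused 10 of 14 tokens, processed 4"
```

The longer the shared system prompt and examples are relative to the user input, the larger the fraction of each query that can be skipped.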

PBQA allocates a slot/process for each pattern-model pair in the llama.cpp server. Set -np to the number of unique combinations of patterns and models you want to enable caching for. Slots are allocated in the order they are requested, and if the number of available slots is exceeded, the last slot is reused for any excess pattern-model pairs.
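The allocation rule above (slots handed out in request order, with the last slot shared once they run out) can be modeled in a few lines. This is a sketch under assumptions rather than PBQA's internal code: `SlotAllocator` and `slot_for` are hypothetical names, `n_slots` stands in for the server's -np value, and slots are numbered from 0 here.

```python
class SlotAllocator:
    """Assign cache slots to (pattern, model) pairs in the order they are requested."""

    def __init__(self, n_slots: int):
        if n_slots < 1:
            raise ValueError("need at least one slot")
        self.n_slots = n_slots  # corresponds to the server's -np setting
        self.assigned: dict[tuple[str, str], int] = {}
        self.next_free = 0

    def slot_for(self, pattern: str, model: str) -> int:
        key = (pattern, model)
        if key not in self.assigned:
            # Hand out slots 0, 1, 2, ... in order; once they are used up,
            # every further pair is placed in the last slot.
            self.assigned[key] = min(self.next_free, self.n_slots - 1)
            self.next_free += 1
        return self.assigned[key]


# Usage with two slots ("timer" and "chat" are hypothetical pattern names)
alloc = SlotAllocator(n_slots=2)
alloc.slot_for("weather", "llama")  # slot 0
alloc.slot_for("timer", "llama")    # slot 1
alloc.slot_for("chat", "llama")     # slots exhausted, shares slot 1
```

Pairs that share the last slot evict each other's cached prefix, which is why -np should cover every pattern-model pair you want to keep fast.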

You can manually assign a cache slot to a specific pattern-model pair using the link method. Optionally, a specific cache slot can be provided, up to the number of available processes. The cache slot used for a query can also be overridden by passing the cache_slot parameter to the llm.ask() method.

from PBQA import DB, LLM


db = DB(path="examples/db")
db.load_pattern("examples/weather.yaml")

llm = LLM(db=db, host="127.0.0.1")
llm.connect_model(
    model="llama",
    port=8080,
    stop=["<|eot_id|>", "<|start_header_id|>"],
    temperature=0,
)
llm.link(pattern="weather", model="llama")

Once a pattern-model pair is linked, the "model" parameter of the ask() method may be omitted. The query will then use the model from the most recent link() call for that pattern.
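The defaults described above can be summarized as a small lookup: the latest link for a pattern decides the default model, and an explicit cache_slot always wins. The sketch below is a hypothetical model of that behavior, not PBQA's code; `LinkTable` and `resolve` are invented names, and unlike the real link() it requires a slot number so that automatic allocation can be left out.

```python
from typing import Optional


class LinkTable:
    """Hypothetical model of how link() and ask() defaults could interact."""

    def __init__(self) -> None:
        self.default_model: dict[str, str] = {}      # pattern -> model of the latest link
        self.slots: dict[tuple[str, str], int] = {}  # (pattern, model) -> cache slot

    def link(self, pattern: str, model: str, cache_slot: int) -> None:
        # A later link for the same pattern replaces its default model
        self.default_model[pattern] = model
        self.slots[(pattern, model)] = cache_slot

    def resolve(self, pattern: str, model: Optional[str] = None,
                cache_slot: Optional[int] = None) -> tuple[str, int]:
        """Return the (model, cache slot) a query for this pattern would use."""
        if model is None:
            if pattern not in self.default_model:
                raise ValueError(f"no model linked for pattern {pattern!r}")
            model = self.default_model[pattern]
        if cache_slot is None:
            if (pattern, model) not in self.slots:
                raise ValueError(f"{pattern!r} is not linked to {model!r}")
            cache_slot = self.slots[(pattern, model)]
        return model, cache_slot
```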

Roadmap

Future features in no particular order with no particular timeline:

  • Preset grammars for common data types
  • Parallel query execution
  • Combining multi-shot prompting with message history
  • Multimodal support
  • Further speed improvements (possibly batching)
  • Support for more LLM backends

Contributing

Contributions are welcome! If you have any suggestions or would like to contribute, please open an issue or a pull request.

Support

If you want to support the development of PBQA, consider buying me a coffee. Any support is greatly appreciated!

License and Acknowledgements

This project is licensed under the terms of the MIT License. For more details, see the LICENSE file.

Qdrant is a vector database that provides an API for managing and querying text embeddings. PBQA uses Qdrant to store and retrieve text embeddings.

llama.cpp is a C++ library that provides an easy-to-use interface for running LLMs on a wide variety of hardware. It includes support for Apple silicon, x86 architectures, and NVIDIA GPUs, as well as custom CUDA kernels for running LLMs on AMD GPUs via HIP. PBQA uses llama.cpp to interact with LLMs.

PBQA was developed by Bart Haagsma as part of a different project. If you have any questions or suggestions, please feel free to contact me at [email protected].