If I have
llm-chain-llama-sys = "0.12.3"
in Cargo.toml, the program runs fine, but
llm-chain-llama-sys = { version = "0.12.3", features = ["cuda"] }
causes it to segfault.
I tracked down where it happens by adding
.arg("-DCMAKE_BUILD_TYPE=Debug")
at llm-chain-llama-sys/build.rs:84
to tell CMake to build llama.cpp with debug symbols. Then, stepping through the program in gdb, I found that the segfault (valgrind says it tries to "Jump to the invalid address stated on the next line 0x0: ???") happens at what I believe is the first FFI call the program makes.
My Rust code calls llm_chain_llama::Executor::new_with_options(options), which eventually reaches llm-chain-llama/src/context.rs:42, an unsafe block that calls the FFI function llama_context_default_params, defined starting at llama.cpp/llama.cpp:864.
When gdb enters llama_context_default_params, running bt shows a correct backtrace leading back into the Rust program. But after stepping over the struct initialization, bt shows the program trying to return to 0x00000000. I assume this is because the stack frame is getting corrupted. The C++ function llama_context_default_params just returns a struct by value, so the struct size is probably what's wrong.
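To illustrate why a by-value return with disagreeing sizes trashes the stack, here is a minimal sketch (the field names and the array lengths are assumptions for illustration, not the real llama_context_params layout): the caller reserves space for the struct size *it* was compiled with, while the callee writes the struct size *it* was compiled with, clobbering whatever lies beyond the reserved slot.

```rust
// Sketch: the "same" struct as the non-CUDA and CUDA builds would see it.
// Fields and array lengths are hypothetical, for illustration only.
#[repr(C)]
struct ParamsNoCuda {
    seed: u32,
    n_ctx: i32,
    tensor_split: [f32; 1], // LLAMA_MAX_DEVICES == 1 without CUDA (assumed)
}

#[repr(C)]
struct ParamsCuda {
    seed: u32,
    n_ctx: i32,
    tensor_split: [f32; 16], // a larger LLAMA_MAX_DEVICES with CUDA (assumed)
}

fn main() {
    // If the Rust caller reserves size_of::<ParamsNoCuda>() bytes for the
    // return value but the C++ callee writes size_of::<ParamsCuda>() bytes
    // into it, the extra bytes overwrite adjacent stack memory -- often the
    // saved return address, which would explain the return to 0x0.
    println!("no-cuda size = {}", std::mem::size_of::<ParamsNoCuda>());
    println!("cuda size    = {}", std::mem::size_of::<ParamsCuda>());
    assert!(std::mem::size_of::<ParamsCuda>() > std::mem::size_of::<ParamsNoCuda>());
}
```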
I think I found the problem. Before the struct initialization at llama.cpp:864 I added the line
printf("HELLO FROM CPP ASDFASDF struct size is %zu\n", sizeof(llama_context_params));
and in my Rust code added
eprintln!("HELLO FROM RUST struct size = {}", std::mem::size_of::<llm_chain_llama_sys::llama_context_params>());
let executor = llm_chain_llama::Executor::new_with_options(options)?;
eprintln!("GOT PAST FUNCTION");
If I don't enable features = ["cuda"], they print the same size of 48:
HELLO FROM RUST struct size = 48
HELLO FROM CPP ASDFASDF struct size is 48
GOT PAST FUNCTION
... program runs fine
but if I do have features = ["cuda"]:
HELLO FROM RUST struct size = 48
HELLO FROM CPP ASDFASDF struct size is 112
Segmentation fault (core dumped)
which I think is the problem.
The struct llama_context_params is defined at llama.h:74, and I think the problem is that it has a member float tensor_split[LLAMA_MAX_DEVICES].
When cuda is enabled, build.rs:88 passes the build flag -DLLAMA_CUBLAS=ON, and at llama.h:5 some preprocessor #ifdefs change the value of LLAMA_MAX_DEVICES. That changes the size of the struct, so the compiled C++ and the generated Rust bindings disagree about its layout, the stack gets corrupted on return, and the program segfaults. I think it has something to do with how bindgen handles the preprocessor definitions — judging by the sizes above, the bindings were generated without the CUDA define.
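If that diagnosis is right, a fix would be to make bindgen parse llama.h with the same preprocessor definitions the CMake build uses. Here is a sketch of what that could look like in the -sys crate's build.rs, assuming it generates bindings with the bindgen crate; the define name GGML_USE_CUBLAS is an assumption and would need to be checked against what -DLLAMA_CUBLAS=ON actually causes llama.h to see:

```rust
// build.rs sketch (hypothetical): forward the CUDA define to bindgen so the
// generated Rust bindings compute the same LLAMA_MAX_DEVICES (and therefore
// the same struct sizes) as the compiled llama.cpp.
use std::path::PathBuf;

fn main() {
    let mut builder = bindgen::Builder::default().header("llama.cpp/llama.h");

    // Cargo exposes enabled features to build scripts via CARGO_FEATURE_<NAME>.
    if std::env::var("CARGO_FEATURE_CUDA").is_ok() {
        // Must match the macro that the CUDA build defines before llama.h
        // evaluates LLAMA_MAX_DEVICES (name assumed; verify in llama.h).
        builder = builder.clang_arg("-DGGML_USE_CUBLAS");
    }

    let bindings = builder.generate().expect("failed to generate bindings");
    let out = PathBuf::from(std::env::var("OUT_DIR").unwrap()).join("bindings.rs");
    bindings.write_to_file(out).expect("failed to write bindings");
}
```

With the define forwarded, the two sizes printed by the probes above should agree again.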