sobelio / llm-chain


`llm-chain` is a powerful Rust crate for building chains in large language models, allowing you to summarise text and complete complex tasks.

Home Page: https://llm-chain.xyz

License: MIT License

Rust 97.42% C 0.01% JavaScript 1.43% CSS 0.24% MDX 0.11% C++ 0.80%
chatgpt langchain llama llm openai rust text-summary

llm-chain's People

Contributors

alianse777, alw3ys, andychenbruce, anthonymichaeltdm, danbev, danforbes, dependabot[bot], dmj16, drager, firefragment, github-actions[bot], hlhr202, jmuk, johnthecoolingfan, joshka, juzov, katopz, kyle-mccarthy, kylooh, lef-f, mantono, noirgif, pablo1785, poorrican, ruqqq, shinglyu, spirosmakris, ssoudan, timopheym, williamhogman


llm-chain's Issues

Problem with "Tutorial: Getting Started using the LLAMA driver"

The Problem

The tutorial linked in this project's readme is either incorrect or out of date; following it results in a bad build.

Description

I followed the tutorial to get started using the llama driver, and all went well until it asked me to run this command in "Step 4":
python convert.py ./models/alpaca-native

The instructions say that running this in the project root directory should produce a ggml-model-f32.bin file in ./models/alpaca-native; however, it outputs a file with a .gguf extension instead: ggml-model-f32.gguf.
The next instruction asks me to run this command:
./main -m models/alpaca-native/ggml-model-f32.bin -n 128 -p "I love Rust because"
Because there is no ggml-model-f32.bin, this command will not run.

Failed Solutions

I tried replacing the .bin extension with .gguf and running the command from the tutorial like this instead:
./main -m models/alpaca-native/ggml-model-f32.gguf -n 128 -p "I love Rust because"
The program runs, and after a minute or two it has produced this output and is still running:
I love Rust because ÂÄÄÄÄ

While I will yell "AAAA! I love Rust" from time to time, this isn't really what I was hoping for from the program.
What can I do to complete the tutorial successfully?

Change executors to require a known Result type

The executor trait should require execute to return Result<T, ExecutorRunError>.

This means core llm-chain must provide that error type to its users.

One of the error subtypes we should provide is something along the lines of
ExecutorInternalFailure<Box<dyn Error>>, but this is open for suggestions.
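
A minimal sketch of what that could look like (everything beyond the names ExecutorRunError and ExecutorInternalFailure mentioned above is an assumption, not a decided design):

use std::error::Error;

// Sketch of the proposed known error type for executors.
#[derive(Debug)]
pub enum ExecutorRunError {
    /// The driver/provider failed for reasons internal to the executor.
    ExecutorInternalFailure(Box<dyn Error + Send + Sync>),
    /// The prompt could not be prepared for this executor (illustrative variant).
    InvalidPrompt(String),
}

// The trait would then require something shaped like this (signature is an assumption):
// async fn execute(&self, step: &Step, parameters: &Parameters) -> Result<Output, ExecutorRunError>;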

Panic when MessagePromptTemplate contains unbounded parameters.

Hi,

I'm trying to use your framework but having some issues with the templated prompt.
Basically, the issue is that the LM generates code that sometimes contains '{}' or '{something}', and these get interpreted as unbound named parameters during format() in template::apply_formatting().

I'm not sure what the best way to address it is. I don't see a use case for having placeholders in what the 'Assistant' returns (though I might be missing one). I was thinking of something like an enum with two variants, one being a template and the other not; a sketch of that idea follows below. What do you think?
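
For concreteness, here is a minimal self-contained sketch of that enum idea (type and variant names are placeholders, not existing llm-chain types):

use std::collections::HashMap;

// A message body is either a template (placeholders are substituted) or
// literal text (passed through untouched, e.g. what the Assistant produced).
enum MessageBody {
    Template(String),
    Literal(String),
}

fn render(body: &MessageBody, params: &HashMap<String, String>) -> String {
    match body {
        MessageBody::Template(t) => {
            // stand-in for the real template engine
            let mut out = t.clone();
            for (k, v) in params {
                out = out.replace(&format!("{{{}}}", k), v);
            }
            out
        }
        // Never interpreted as a template, so '{}' or '{something}' is safe here.
        MessageBody::Literal(s) => s.clone(),
    }
}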

Thanks,
Sebastien


Here is a pair of tests to reproduce the issue (only the second one fails).

In the context of llm-chain-openai/src/chatgpt/prompt.rs:

#[cfg(test)]
mod tests {

    #[test]
    fn test_chat_prompt_template() {
        use super::*;
        use async_openai::types::Role;
        let system_msg = MessagePromptTemplate::new(
            Role::System,
            "You are an assistant that speaks like Shakespeare.".into(),
        );
        let user_msg = MessagePromptTemplate::new(Role::User, "tell me a joke".into());

        let chat_template = ChatPromptTemplate::new(vec![system_msg, user_msg]);
        let messages = chat_template.format(&Parameters::new());
        assert_eq!(messages.len(), 2);
        assert_eq!(messages[0].role, Role::System);
        assert_eq!(
            messages[0].content,
            "You are an assistant that speaks like Shakespeare."
        );
        assert_eq!(messages[1].role, Role::User);
        assert_eq!(messages[1].content, "tell me a joke");
    }

    #[test]
    fn test_chat_prompt_template_with_named_parameters() {
        use super::*;
        use async_openai::types::Role;
        let system_msg = MessagePromptTemplate::new(
            Role::System,
            "You are an assistant that speaks like Shakespeare.".into(),
        );
        let user_msg = MessagePromptTemplate::new(Role::User, "tell me a joke".into());
        let assistant_msg = MessagePromptTemplate::new(
            Role::Assistant,
            "here is one, I'm sure, will crack you {up}".into(),
        );

        let chat_template = ChatPromptTemplate::new(vec![system_msg, user_msg, assistant_msg]);
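        // `{up}` in the assistant message above has no matching entry in
        // Parameters::new(), so the format() call below panics; this is the
        // behaviour the issue reports.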
        let messages = chat_template.format(&Parameters::new());
        assert_eq!(messages.len(), 3);
        assert_eq!(messages[0].role, Role::System);
        assert_eq!(
            messages[0].content,
            "You are an assistant that speaks like Shakespeare."
        );
        assert_eq!(messages[1].role, Role::User);
        assert_eq!(messages[1].content, "tell me a joke");
        assert_eq!(messages[2].role, Role::Assistant);
        assert_eq!(
            messages[2].content,
            "here is one, I'm sure, will crack you {up}"
        );
    }
}

Access intermediary step results

It would be useful to have the ability to access intermediary step results, for example, in a sequential chain, being able to access {{text}} from an earlier step.

    let chain: Chain = Chain::new(vec![
        // First step: make a personalized birthday email
        Step::for_prompt_template(
            prompt!("You are a bot for making personalized greetings", "Make personalized birthday e-mail to the whole company for {{name}} who has their birthday on {{date}}. Include their name")
        ),

        // Second step: summarize the email into a tweet. Importantly, the text parameter becomes the result of the previous prompt.
        Step::for_prompt_template(
            prompt!( "You are an assistant for managing social media accounts for a company", "Summarize this email into a tweet to be sent by the company, use emoji if you can. \n--\n{{text}}")
        )
    ]);

Maybe it's already possible and I am missing something; feel free to close if that's the case.

Index Loading not working if dumped from different process

Running the example works fine if you generate, dump, and load the index in the same process. However, if you generate and dump the index, you cannot reload it in a new process without adding the documents again; running a query on a loaded index leads to missing-document errors.

Do you have to call add_documents again after loading? Since I believe the add_documents method generates the embeddings itself, doesn't this lead to redundant OpenAI calls, where you have to regenerate the embeddings a second time on load?

Executor returned by llm_chain_openai::chatgpt::Executor::for_client() does not implement llm_chain::traits::Executor ([E0277])

In my use case, I need an OpenAI executor that uses an organization ID read from an environment variable. Looking at the source code for the llm_chain_openai crate (as the documentation is somewhat lacking), it seems that llm_chain_openai::chatgpt::Executor::for_client() would let me do that by letting me specify the async_openai client to use.

However, while this works

use async_openai::Client;
use lazy_static::lazy_static;
use llm_chain_openai::chatgpt::{Executor, Model, PerInvocation};

lazy_static! {
    pub static ref OPENAI_EXECUTOR: Executor = Executor::for_client(
        {
            let org_id = std::env::var("OPENAI_ORG_ID").unwrap_or_else(|_| "".to_string());
            if org_id.is_empty() {
                Client::new()
            } else {
                Client::new().with_org_id(org_id)
            }
        },
        Some(PerInvocation::new().for_model(Model::ChatGPT3_5Turbo)),
    );
}

it doesn't give me an Executor that implements the llm_chain::traits::Executor trait, meaning I can't actually use it for anything useful.

Is this intended behavior? If so, why? And if not, what can I/we do about it?

Here is the specific error: rustc: the trait bound `OPENAI_EXECUTOR: llm_chain::traits::Executor` is not satisfied; the trait `llm_chain::traits::Executor` is implemented for `llm_chain_openai::chatgpt::Executor` [E0277]
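
A guess based on the E0277 message rather than on the crate itself: lazy_static! wraps the static in a generated proxy type (also named OPENAI_EXECUTOR) that only Derefs to chatgpt::Executor, so trait bounds are checked against the proxy type. An explicit dereference may be enough:

// Sketch, assuming the lazy_static proxy type is the culprit: deref the static
// to get at the underlying chatgpt::Executor before handing it to generic code.
let exec: &llm_chain_openai::chatgpt::Executor = &*OPENAI_EXECUTOR;
// then pass `exec` (or `&*OPENAI_EXECUTOR`) wherever an `llm_chain::traits::Executor` is expected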

Problem with map reduce tutorial

I'm only adding some options to the executor, and otherwise using a custom model:

    let opts = options!(
        Model: ModelRef::from_path(model_path),
        ModelType: "llama",
        MaxContextSize:  2048_usize,
        NThreads: 12_usize,
        Temperature: 0.7
    );
    let exec = executor!(llama, opts.clone())?;

I'm getting

thread 'main' panicked at 'Cannot block the current thread from within a runtime. This happens because a function attempted to block the current thread while the thread is being used to drive asynchronous tasks.',
stack backtrace:
   0: rust_begin_unwind
             at /rustc/8ede3aae28fe6e4d52b38157d7bfe0d3bceef225/library/std/src/panicking.rs:593:5
   1: core::panicking::panic_fmt
             at /rustc/8ede3aae28fe6e4d52b38157d7bfe0d3bceef225/library/core/src/panicking.rs:67:14
   2: core::panicking::panic_display
             at /rustc/8ede3aae28fe6e4d52b38157d7bfe0d3bceef225/library/core/src/panicking.rs:150:5
   3: core::panicking::panic_str
             at /rustc/8ede3aae28fe6e4d52b38157d7bfe0d3bceef225/library/core/src/panicking.rs:134:5
   4: core::option::expect_failed
             at /rustc/8ede3aae28fe6e4d52b38157d7bfe0d3bceef225/library/core/src/option.rs:1932:5
   5: <llm_chain_llama::executor::LLamaTokenizer as llm_chain::tokens::Tokenizer>::tokenize_str
   6: <llm_chain_llama::executor::Executor as llm_chain::traits::Executor>::tokens_used
   7: llm_chain::tokens::ExecutorTokenCountExt::split_to_fit
   8: <core::iter::adapters::map::Map<I,F> as core::iter::traits::iterator::Iterator>::try_fold
   9: <alloc::vec::Vec<T> as alloc::vec::spec_from_iter::SpecFromIter<T,I>>::from_iter
  10: core::iter::adapters::try_process
  11: llm_chain::chains::map_reduce::Chain::chunk_documents
  12: <core::pin::Pin<P> as core::future::future::Future>::poll
  13: tokio::runtime::scheduler::current_thread::Context::enter
  14: tokio::runtime::context::scoped::Scoped<T>::set
  15: tokio::runtime::context::set_scheduler
  16: tokio::runtime::scheduler::current_thread::CoreGuard::block_on
  17: tokio::runtime::context::runtime::enter_runtime
  18: tokio::runtime::runtime::Runtime::block_on

I was wondering if a Rust wizard could give me some insight into what's wrong.

Cannot Compile

I see somebody else ran into this same issue on the llm-chain-template repo (sobelio/llm-chain-template#1). I thought it might be useful to post it here, since it seems to be an issue with the llm-chain-openai crate itself.

I'm getting the following error when trying to build the basic example from the documentation (https://docs.llm-chain.xyz/docs/getting-started-tutorial/generating-your-first-llm-output):

Updating crates.io index
   Compiling llm-chain-openai v0.12.2
error[E0308]: mismatched types
   --> /Users/jessiewilkins/.cargo/registry/src/index.crates.io-6f17d22bba15001f/llm-chain-openai-0.12.2/src/chatgpt/executor.rs:113:60
    |
113 |         let tokens_used = num_tokens_from_messages(&model, &messages)
    |                           ------------------------         ^^^^^^^^^ expected `&[ChatCompletionRequestMessage]`, found `&Vec<ChatCompletionRequestMessage>`
    |                           |
    |                           arguments to this function are incorrect
    |
    = note: expected reference `&[async_openai::types::types::ChatCompletionRequestMessage]`
               found reference `&Vec<async_openai::types::ChatCompletionRequestMessage>`
note: function defined here
   --> /Users/jessiewilkins/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tiktoken-rs-0.4.5/src/api.rs:358:12
    |
358 |     pub fn num_tokens_from_messages(
    |            ^^^^^^^^^^^^^^^^^^^^^^^^

For more information about this error, try `rustc --explain E0308`.
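
One thing worth checking (a guess from the doubled async_openai::types::types path in the compiler note, not a confirmed diagnosis): the error looks like two different versions of async-openai ending up in the build, one pulled in by llm-chain-openai 0.12.2 and one by tiktoken-rs 0.4.5, so their message types don't unify. Something like the following should show whether that is the case:

# Lists which crates pull in async-openai and which versions are present.
cargo tree -i async-openai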

Wrong struct size on llm-chain-llama-sys when compiling with cuda support, causes segfault on FFI call

If I add
llm-chain-llama-sys = "0.12.3"
to Cargo.toml it runs fine, but if I have
llm-chain-llama-sys = { version = "0.12.3", features = ["cuda"] }
the program segfaults.

I tracked down where it happens by adding .arg("-DCMAKE_BUILD_TYPE=Debug") to llm-chain-llama-sys/build.rs:84 to tell CMake to add debug symbols to llama.cpp. Then, stepping through the program in gdb, I found that the segfault (valgrind says it tries to jump to the invalid address 0x0: ???) occurs at what I believe to be the first FFI call the program makes.

My Rust code calls llm_chain_llama::Executor::new_with_options(options), which eventually reaches llm-chain-llama/src/context.rs:42, an unsafe block calling the FFI function llama_context_default_params, which starts at llama.cpp/llama.cpp:864.

When gdb enters llama_context_default_params, running bt shows a correct backtrace leading back to the Rust program. After stepping over the struct initialization, bt shows that the Rust program tries to return to 0x00000000. I assume it's because the stack frame is getting corrupted. The C++ function llama_context_default_params just returns a struct, so the struct size is probably wrong.

I think I found the problem. If, before the struct initialization on llama.cpp:864, I add the line

printf("HELLO FROM CPP ASDFASDF struct size is %ld\n", sizeof(llama_context_params));

and then in Rust add the lines

eprintln!("HELLO FROM RUST struct size = {}", std::mem::size_of::<llm_chain__llama_sys::llama_context_params>()));
let executor = llm_chain::llama::Executor::new_with_options(options)?;
eprintln!("GOT PAST FUNCTION");

If I don't enable features = ["cuda"], they print the same size, 48:

HELLO FROM RUST struct size = 48
HELLO FROM CPP ASDFASDF struct size is 48
GOT PAST FUNCTION
... program runs fine

but if I do enable features = ["cuda"]:

HELLO FROM RUST struct size = 48
HELLO FROM CPP ASDFASDF struct size is 112
Segmentation fault (core dumped)

which I think is the problem.

The struct llama_context_params is defined in llama.h:74, and I think the problem is that it has a member float tensor_split[LLAMA_MAX_DEVICES].

When cuda is enabled, build.rs:88 passes the build flag -DLLAMA_CUBLAS=ON. At llama.h:5, preprocessor ifdefs then change the value of LLAMA_MAX_DEVICES, which changes the size of the struct and breaks the C++-to-Rust bindings, corrupting the stack and causing the segfault. I think it has something to do with how bindgen handles the preprocessor definitions.
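
If that diagnosis is right, here is a hedged sketch of the kind of fix the crate's build script might need (this is not llm-chain-llama-sys's actual build.rs, and GGML_USE_CUBLAS is an assumption about which macro llama.h keys LLAMA_MAX_DEVICES off; check llama.h for the exact name):

// build.rs sketch: make sure bindgen sees the same preprocessor state as the
// CMake build, so llama_context_params (and its tensor_split array) has the
// same size on both sides of the FFI boundary.
use std::{env, path::PathBuf};

fn main() {
    let cuda_enabled = env::var("CARGO_FEATURE_CUDA").is_ok();

    let mut builder = bindgen::Builder::default().header("llama.cpp/llama.h");
    if cuda_enabled {
        // Mirror -DLLAMA_CUBLAS=ON: define the macro that enlarges LLAMA_MAX_DEVICES.
        builder = builder.clang_arg("-DGGML_USE_CUBLAS");
    }

    let bindings = builder.generate().expect("failed to generate bindings");
    let out = PathBuf::from(env::var("OUT_DIR").unwrap());
    bindings
        .write_to_file(out.join("bindings.rs"))
        .expect("failed to write bindings");
}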

Generalise tool error handling

Tool handling needs to support scenarios such as the tool not being installed, and similar failure modes.

Creating a tool that we are unable to run should produce an error; a sketch of what that could look like follows below.
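
A minimal sketch of a generalised tool error (the names are placeholders, not an existing llm-chain type):

use std::fmt;

// Sketch: errors a tool can surface instead of silently failing.
#[derive(Debug)]
pub enum ToolError {
    /// The underlying binary or service is not installed / not reachable.
    NotInstalled { tool: String },
    /// The tool ran but reported a failure.
    ExecutionFailed { tool: String, message: String },
}

impl fmt::Display for ToolError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            ToolError::NotInstalled { tool } => write!(f, "tool `{tool}` is not installed"),
            ToolError::ExecutionFailed { tool, message } => {
                write!(f, "tool `{tool}` failed: {message}")
            }
        }
    }
}

impl std::error::Error for ToolError {}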

PromptTemplates should return `Result` when formatted

Currently there is no way for a template to communicate an error when formatting fails.

Therefore we need to:

  • Introduce a TemplateError that occurs when templating fails
  • Have the users of templates do their best to handle these errors where possible (see the sketch below)
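
A sketch of the shape this could take (TemplateError comes from the bullet above; the variant names and the formatting signature are assumptions):

// Sketch: formatting returns a Result instead of panicking.
#[derive(Debug)]
pub enum TemplateError {
    /// A named placeholder had no matching parameter.
    MissingParameter(String),
    /// The template text itself could not be parsed.
    MalformedTemplate(String),
}

// The template's formatting entry point would then look something like:
// fn format(&self, parameters: &Parameters) -> Result<String, TemplateError>;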

Prompt formatting, parameters, and dynformat

Flagging that certain "messy" prompts, where lots of unformatted context is injected into a user prompt, produce an error during prompt formatting. The dynformat crate errors in the sequential chain example and in a simple (no chain) setup, as shown below.
After seeing the error below, if I add parameters!("name") and rerun, the same prompt no longer produces an error.

I notice the issue when the dynamic context contains a word wrapped in {}. I believe this is intended behavior, so perhaps calling apps should sanitise context injections, or maybe this could be handled in the library.

thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value: ExecutorError(PromptTemplateError(PromptTemplateError(LegacyTemplateError("missing argument: name"))))'
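
A possible calling-app-side workaround along the lines suggested above (an assumption, not a documented llm-chain feature; since it is unclear whether the template engine supports escaping braces, this simply strips them from injected context before formatting):

// Remove brace characters from dynamically injected context so the template
// engine doesn't treat `{word}` as a named parameter.
fn sanitize_context(context: &str) -> String {
    context.replace('{', "").replace('}', "")
}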

LLaMA utf-8 problems

Running llama.cpp directly seems to always return valid UTF-8, but llm-chain-llama gets invalid UTF-8 and panics while unwrapping the CStr-to-String conversion about 90% of the time I talk to it in Chinese. I replaced that with from_utf8_lossy, which replaces invalid bytes with a replacement-character symbol. I noticed that the replacement characters always come in groups of three, which is how many bytes most Chinese characters are encoded as.

For example:
���小平在经���体制改���方面���的取得了���大成功,他������加工业化和开放政���,这些policy有助于打造现代**。在1978年,���小平实行的改���包���:���出自主经���道路、建立特色社会主义市场经���制度和���收西方科技等。这些变化������地改变了**经���的形状,导���全球���的经������长和实现了人民日常生活水平上的提高。

I suspect it is because many CStrs are taken from the FFI and each is converted individually as a StreamSegment rather than all together, and sometimes a segment boundary falls inside the bytes of a character, so part of the character is in the previous segment and part in the next. When each segment is converted to a string, the character that got split ends up invalid in both segments, but it would be valid if the bytes from the CStrs were combined. This doesn't affect ASCII, since those characters are only one byte and are therefore valid no matter where the string is cut.

I can think of two ways to fix this: have Output carry Vec<u8> instead of String and make the user glue the bytes together, or have the Executor keep some state that stores the last few bytes if they are invalid and prepends them to the front of the next chunk. The second way is a little more complicated but would save users more hassle and wouldn't affect the rest of the library; a sketch follows below.
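
A sketch of that second option (assuming the Executor can hold a small byte buffer between chunks; names are illustrative):

// Emit only complete UTF-8 and carry an incomplete trailing sequence over to
// the next chunk instead of lossily replacing it.
struct Utf8Chunker {
    pending: Vec<u8>,
}

impl Utf8Chunker {
    fn new() -> Self {
        Self { pending: Vec::new() }
    }

    /// Feed raw bytes from the FFI; returns the decodable prefix and keeps an
    /// incomplete trailing character for the next call.
    fn push(&mut self, chunk: &[u8]) -> String {
        self.pending.extend_from_slice(chunk);
        match std::str::from_utf8(&self.pending) {
            // The whole buffer is valid UTF-8: emit it and start fresh.
            Ok(_) => {
                let out = String::from_utf8_lossy(&self.pending).into_owned();
                self.pending.clear();
                out
            }
            // The buffer ends inside a multi-byte character: emit the valid
            // prefix and keep the partial character for the next chunk.
            Err(e) if e.error_len().is_none() => {
                let valid = e.valid_up_to();
                let out = String::from_utf8_lossy(&self.pending[..valid]).into_owned();
                self.pending.drain(..valid);
                out
            }
            // Genuinely invalid bytes somewhere: fall back to lossy conversion.
            Err(_) => {
                let out = String::from_utf8_lossy(&self.pending).into_owned();
                self.pending.clear();
                out
            }
        }
    }
}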

LLAMA model paths are mishandled before being sent to c++

This works

cargo run --example alpaca -- /workspace/llama.cpp/models/gpt4-x-alpaca-13b-native-ggml-model-q4.bin
Finished dev [unoptimized + debuginfo] target(s) in 0.08s
Running `/workspace/llm-chain/target/debug/examples/alpaca /workspace/llama.cpp/models/gpt4-x-alpaca-13b-native-ggml-model-q4.bin`
llama.cpp: loading model from /workspace/llama.cpp/models/gpt4-x-alpaca-13b-native-ggml-model-q4_.bin

But this does not work

cargo run --example alpaca -- /workspace/llama.cpp/models/gpt4-x-alpaca-13b-native-ggml-model-q4_0.bin
Finished dev [unoptimized + debuginfo] target(s) in 0.08s
Running `/workspace/llm-chain/target/debug/examples/alpaca /workspace/llama.cpp/models/gpt4-x-alpaca-13b-native-ggml-model-q4_0.bin`
error loading model: failed to open /workspace/llama.cpp/models/gpt4-x-alpaca-13b-native-ggml-model-q4_0.binq: No such file or directory

Both files exist, and in the second case a q is appended to the path.
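
A guess at the cause rather than a confirmed diagnosis: the stray trailing character looks like the C++ side reading past the end of a buffer that is not NUL-terminated. If the path is handed to llama.cpp as a raw pointer, routing it through CString guarantees the terminator (the loader call below is a hypothetical stand-in for whatever llm-chain-llama actually invokes over FFI):

use std::ffi::CString;

// CString::new appends the NUL terminator and rejects interior NULs, so the
// C++ side cannot read past the end of the path string.
fn model_path_for_ffi(path: &str) -> Result<CString, std::ffi::NulError> {
    CString::new(path)
}

// let c_path = model_path_for_ffi("/workspace/llama.cpp/models/model.bin")?;
// unsafe { llama_load_model(c_path.as_ptr(), params) }; // hypothetical call shape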

llm_chain_local: FieldRequiredError on example

Running the simple example from llm-chain-local with the suggested command gives me the following error message:

Error: FieldRequiredError("model_type")

I also tried using OptionsBuilder manually and got the following error message:

Error: FieldRequiredError("Default field missing")

How can I properly use a LLaMa model with this library?
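
A guess based purely on the error text and on the options! snippet shown in the map-reduce issue above: the model_type field the error names appears to correspond to the ModelType option, so setting it explicitly (together with the model path) may get past the FieldRequiredError. The executor construction is left as a comment; follow whatever form the llm-chain-local example uses.

// Sketch: supply the fields the error complains about.
let opts = options!(
    Model: ModelRef::from_path("./path/to/your/model.bin"), // hypothetical path
    ModelType: "llama"
);
// let exec = /* build the llm-chain-local executor from `opts`, as in its example */;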

Add module for opinionated text-summaries

We need an opinionated module for summarising text without the user knowing too much about LLMs.

  1. Default prompts for summaries (allowing the user to change them); a sketch of possible defaults follows below
    a. Map prompt used for each subdocument
    b. Reduce prompt used for combining the per-document summaries
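
A sketch of what the opinionated defaults could look like (the constant names and prompt wording are assumptions, not an existing API):

// Map prompt: applied to each sub-document independently.
const DEFAULT_MAP_PROMPT: &str =
    "Summarize the following text in a few sentences:\n\n{{text}}";

// Reduce prompt: combines the per-document summaries into one.
const DEFAULT_REDUCE_PROMPT: &str =
    "Combine the following partial summaries into one concise summary:\n\n{{text}}";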

Concretise `Step`

Now that the new unified prompt representation is here, the need for Step to be abstract (i.e. a trait) is reduced; instead, we could have a concrete Step type which contains the per-invocation model config and a prompt (in the common representation), as sketched below.

This would reduce the repetition in the executors; instead, we would only specify the relationship between the common prompt representation and the Executor-specific one.

Likewise, this could be a step towards improving the per-Step and per-Executor model configuration situation, which is non-ideal right now.
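
A sketch of the concrete type described above (the placeholder Options and Prompt types stand in for llm-chain's real configuration and prompt types; field and constructor names are assumptions):

// Placeholder types standing in for the real per-invocation config and the
// unified prompt representation.
#[derive(Default)]
pub struct Options { /* temperature, model choice, ... */ }
pub struct Prompt(pub String);

// Step stops being a trait and becomes plain data.
pub struct Step {
    options: Options, // per-invocation model configuration
    prompt: Prompt,   // prompt in the executor-agnostic representation
}

impl Step {
    pub fn for_prompt(prompt: Prompt) -> Self {
        Self { options: Options::default(), prompt }
    }
}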

Minimum supported Rust version?

I tried Rust 1.64.0 and the compilation failed because of unstable features. Then I upgraded to 1.71.0 and it worked. I wanted to check which version other people are using so I can add a minimum-supported-version section to the README and documentation.

Prompt builder UI

We need a (web) UI for building prompts and outputting their JSON equivalents

Generalise message parsing functionality

We need to expand the tooling functionality to implement generalised parsing of messages.

We need to be able to parse:

  1. The received string is valid YAML
  2. The received string contains one or more blocks that are valid YAML
    a. return all blocks that match the correct format (a sketch of this follows below)
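
A sketch of that behaviour (not existing llm-chain API; it uses serde_yaml and treats only YAML mappings as "valid YAML", since any plain string technically parses as a YAML scalar):

use serde_yaml::Value;

// Try the whole message as structured YAML first, then fall back to
// extracting fenced blocks and parsing each one.
fn parse_yaml_blocks(msg: &str) -> Vec<Value> {
    // Case 1: the whole message is a YAML mapping.
    if let Ok(v @ Value::Mapping(_)) = serde_yaml::from_str::<Value>(msg) {
        return vec![v];
    }
    // Case 2: collect every fenced block whose body parses as a YAML mapping.
    msg.split("```")
        .skip(1)    // skip the text before the first fence
        .step_by(2) // fenced bodies sit at odd positions
        .filter_map(|block| {
            let body = block.trim_start_matches("yaml").trim();
            match serde_yaml::from_str::<Value>(body) {
                Ok(v @ Value::Mapping(_)) => Some(v),
                _ => None,
            }
        })
        .collect()
}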
