10c8 / llama_cpp-rs

This project forked from edgenai/llama_cpp-rs

High-level, optionally asynchronous Rust bindings to llama.cpp

License: Apache License 2.0

C 0.39% Rust 97.87% Nix 1.74%

llama_cpp-rs

Safe, high-level Rust bindings to the C++ project of the same name, meant to be as user-friendly as possible. Run GGUF-based large language models directly on your CPU in fifteen lines of code, no ML experience required!

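use std::io::{self, Write};

use llama_cpp::standard_sampler::StandardSampler;
use llama_cpp::{LlamaModel, LlamaParams, SessionParams};

// Create a model from anything that implements `AsRef<Path>`: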
let model = LlamaModel::load_from_file("path_to_model.gguf", LlamaParams::default()).expect("Could not load model");

// A `LlamaModel` holds the weights shared across many _sessions_; while your model may be
// several gigabytes large, a session is typically a few dozen to a hundred megabytes!
let mut ctx = model.create_session(SessionParams::default()).expect("Failed to create session");

// You can feed anything that implements `AsRef<[u8]>` into the model's context.
ctx.advance_context("This is the story of a man named Stanley.").unwrap();

// LLMs are typically used to predict the next word in a sequence. Let's generate some tokens!
let max_tokens = 1024;
let mut decoded_tokens = 0;

// `ctx.start_completing_with` creates a worker thread that generates tokens. When the completion
// handle is dropped, tokens stop generating!
let completions = ctx.start_completing_with(StandardSampler::default(), max_tokens).into_strings();

for completion in completions {
    print!("{completion}");
    // Flush so each piece appears immediately instead of waiting for a newline.
    let _ = io::stdout().flush();

    decoded_tokens += 1;

    // Stop once `max_tokens` pieces have been printed.
    if decoded_tokens >= max_tokens {
        break;
    }
}

This repository hosts the high-level bindings (crates/llama_cpp) as well as automatically generated bindings to llama.cpp's low-level C API (crates/llama_cpp_sys). Contributions are welcome--just keep the UX clean!

Building

llama.cpp is very computationally heavy, so standard debug builds (plain cargo build or cargo run) suffer greatly from the lack of optimisations. Unless you really need to debug, build and run with Cargo's --release flag.
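
As a rough illustration of the advice above, an application can check at compile time whether it was built without optimisations and warn the user. This is a minimal sketch, not part of these bindings: `build_profile` and `slow_build_warning` are hypothetical helper names, and the check assumes the default Cargo profiles, where `debug_assertions` is on for debug builds and off for `--release` builds.

```rust
/// Returns "debug" when compiled with debug assertions (the default for
/// plain `cargo build`/`cargo run`) and "release" otherwise.
/// Assumption: the default Cargo profiles are in use; a customised profile
/// can enable debug assertions in an optimised build.
fn build_profile() -> &'static str {
    if cfg!(debug_assertions) {
        "debug"
    } else {
        "release"
    }
}

/// Returns a warning for unoptimised builds, or `None` for release builds.
fn slow_build_warning() -> Option<String> {
    if build_profile() == "debug" {
        Some(String::from(
            "running an unoptimised build; inference will be slow, consider `cargo run --release`",
        ))
    } else {
        None
    }
}
```

Calling `slow_build_warning()` once at startup and printing the message to stderr is usually enough to catch an accidental debug run.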

Cargo Features

Several of llama.cpp's backends are supported through features:

  • cuda - Enables the CUDA backend. Compiling with this feature requires the CUDA Toolkit.
  • vulkan - Enables the Vulkan backend. Compiling with this feature requires the Vulkan SDK.
  • metal - Enables the Metal backend. macOS only.
  • hipblas - Enables the hipBLAS/ROCm backend. Compiling with this feature requires ROCm.
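
The sketch below shows one way an application could report which of the backend features listed above it was compiled with. It is an illustration under assumptions, not an API of these bindings: `enabled_backends` is a hypothetical helper, and because `cfg!(feature = "...")` only sees the features of the crate being compiled, it assumes your own Cargo.toml declares pass-through features with the same names (for example `cuda = ["llama_cpp/cuda"]`).

```rust
/// Lists the backend features this crate was compiled with.
/// Assumption: the surrounding crate declares features named `cuda`,
/// `vulkan`, `metal` and `hipblas` that forward to the bindings' features.
fn enabled_backends() -> Vec<&'static str> {
    let mut backends = Vec::new();
    if cfg!(feature = "cuda") {
        backends.push("cuda");
    }
    if cfg!(feature = "vulkan") {
        backends.push("vulkan");
    }
    if cfg!(feature = "metal") {
        backends.push("metal");
    }
    if cfg!(feature = "hipblas") {
        backends.push("hipblas");
    }
    backends
}
```

An empty list means no backend feature was enabled for this build.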

Experimental

These bindings can also predict how much memory a context will occupy. This feature is highly experimental, since llama.cpp itself does not provide it. The returned values may be highly inaccurate, although an attempt is made to never return values lower than the real size.
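
To give a feel for what such a prediction involves, here is a back-of-the-envelope sketch of the largest part of a context, the key/value cache. It is not the estimator these bindings use: the function names and parameters are hypothetical, and the formula is the common textbook approximation (one key tensor and one value tensor per layer), ignoring compute buffers and other overhead. Rounding up to an alignment boundary mirrors the goal of never reporting less than the real size.

```rust
/// Approximate key/value cache size in bytes for one context.
/// Assumption: one K and one V tensor per layer, each holding
/// `n_ctx * n_kv_heads * head_dim` elements of `bytes_per_element` bytes.
fn estimate_kv_cache_bytes(
    n_layers: u64,
    n_ctx: u64,
    n_kv_heads: u64,
    head_dim: u64,
    bytes_per_element: u64,
) -> u64 {
    // The factor 2 accounts for keys and values.
    2 * n_layers * n_ctx * n_kv_heads * head_dim * bytes_per_element
}

/// Rounds `bytes` up to the next multiple of `align`, so the estimate
/// errs on the side of being too large. `align` must be non-zero.
fn round_up(bytes: u64, align: u64) -> u64 {
    (bytes + align - 1) / align * align
}
```

For a model with 32 layers, 32 key/value heads of dimension 128, a 4096-token context and 16-bit cache entries, `estimate_kv_cache_bytes(32, 4096, 32, 128, 2)` gives 2147483648 bytes (2 GiB).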

License

MIT or Apache-2.0, at your option (the "Rust" license). See LICENSE-MIT and LICENSE-APACHE.
