
LLaMA-rs

Do the LLaMA thing, but now in Rust 🦀🚀🦙

A llama riding a crab, AI-generated

Image by @darthdeus, using Stable Diffusion

License: MIT

Gif showcasing language generation using llama-rs

LLaMA-rs is a Rust port of the llama.cpp project. It allows running inference for Facebook's LLaMA model on a CPU with good performance, using full-precision, f16, or 4-bit quantized versions of the model.

Just like its C++ counterpart, it is powered by the ggml tensor library, achieving the same performance as the original code.

Getting started

Make sure you have a Rust toolchain set up.

  1. Get a copy of the model's weights¹
  2. Clone the repository
  3. Build (cargo build --release)
  4. Run with cargo run --release -- <ARGS>

NOTE: For best results, make sure to build and run in release mode. Debug builds are going to be very slow.
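The steps above can be sketched as a shell session. The clone URL is a placeholder (substitute the actual repository location), and the weights path is whatever local path you obtained in step 1:

```shell
# Steps 1–2: obtain the weights separately, then clone the repository
git clone <repository-url> llama-rs
cd llama-rs

# Step 3: build in release mode (debug builds are very slow)
cargo build --release

# Step 4: run inference against your local copy of the weights
cargo run --release -- -m /path/to/ggml-model-q4_0.bin -p "Hello"
```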

For example, you can try the following prompt:

cargo run --release -- -m /data/Llama/LLaMA/7B/ggml-model-q4_0.bin -p "Tell me how cool the Rust programming language is:"

Some additional things to try:

  • Use --help to see a list of available options.

  • If you have the alpaca-lora weights, try --repl mode: cargo run --release -- -m <path>/ggml-alpaca-7b-q4.bin -f examples/alpaca_prompt.txt --repl.

    Gif showcasing alpaca repl mode

  • Prompt files can be precomputed with the --cache-prompt flag and reloaded with --restore-prompt, saving processing time for lengthy prompts.

    Gif showcasing prompt caching
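Assuming both flags take a file path for the cached state (the exact argument shapes are not documented here, so treat this as a hypothetical sketch), prompt caching could look like:

```shell
# First run: process the long prompt once and save the resulting state
cargo run --release -- -m /path/to/ggml-model-q4_0.bin -f long_prompt.txt --cache-prompt prompt.cache

# Later runs: restore the saved state instead of re-processing the prompt
cargo run --release -- -m /path/to/ggml-model-q4_0.bin --restore-prompt prompt.cache -p "continue from here"
```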

Q&A

  • Q: Why did you do this?

  • A: It was not my choice. Ferris appeared to me in my dreams and asked me to rewrite this in the name of the Holy crab.

  • Q: Seriously now

  • A: Come on! I don't want to get into a flame war. You know how it goes, something something memory something something cargo is nice, don't make me say it, everybody knows this already.

  • Q: I insist.

  • A: Sheesh! Okaaay. After seeing the huge potential of llama.cpp, the first thing I did was to see how hard it would be to turn it into a library to embed in my projects. I started digging into the code and realized the heavy lifting is done by ggml (a C library, easy to bind to Rust), while the whole project was just around 2k lines of C++ code (not so easy to bind). After a couple of (failed) attempts to build an HTTP server into the tool, I realized I'd be much more productive if I just ported the code to Rust, where I'm more comfortable.

  • Q: Is this the real reason?

  • A: Haha. Of course not. I just like collecting imaginary internet points, in the form of little stars, that people seem to give to me whenever I embark on pointless quests for rewriting X thing, but in Rust.

Known issues / To-dos

Contributions welcome! Here are a few pressing issues:

  • The quantization code has not been ported (yet). You can still use the quantized models with llama.cpp.
  • No crates.io release. The name llama-rs is reserved and I plan to do this soon-ish.
  • Any improvements from the original C++ code. (See rustformers#15)
  • Debug builds are currently broken.
  • The code needs to be "library"-fied. It is nice as a showcase binary, but the real potential for this tool is to allow embedding in other services.
  • The code only sets the right CFLAGS on Linux. The build.rs script in ggml_raw needs to be fixed; until then, inference will be very slow on every other OS.
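To make the "library"-fication goal concrete, here is a minimal sketch of what an embeddable API could look like. None of these names exist in llama-rs today; Model, InferenceParams, and the callback-based infer are all hypothetical placeholders illustrating the shape of the API, with the actual ggml-backed inference stubbed out:

```rust
/// Options controlling a single inference run (hypothetical).
pub struct InferenceParams {
    pub n_predict: usize,
    pub temperature: f32,
}

impl Default for InferenceParams {
    fn default() -> Self {
        Self { n_predict: 128, temperature: 0.8 }
    }
}

/// A loaded model (hypothetical stand-in for the real ggml-backed type).
pub struct Model {
    path: String,
}

impl Model {
    /// Load weights from disk. A real implementation would parse the
    /// ggml binary format here instead of just recording the path.
    pub fn load(path: &str) -> Result<Self, String> {
        Ok(Self { path: path.to_string() })
    }

    /// Feed a prompt and stream generated tokens through a callback,
    /// so a host service (HTTP server, chat bot, ...) can consume them
    /// incrementally. The body is a placeholder that echoes the prompt.
    pub fn infer(&self, prompt: &str, params: &InferenceParams, mut on_token: impl FnMut(&str)) {
        let _ = (&self.path, params.temperature); // a real port would use these
        for word in prompt.split_whitespace().take(params.n_predict) {
            on_token(word);
        }
    }
}

fn main() {
    let model = Model::load("/path/to/ggml-model-q4_0.bin").unwrap();
    let mut out = String::new();
    model.infer("Hello from an embedded service", &InferenceParams::default(), |t| {
        out.push_str(t);
        out.push(' ');
    });
    println!("{}", out.trim());
}
```

The callback-based design avoids committing to any particular async runtime or channel type, which keeps the core library usable from both CLI tools and servers.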

Footnotes

  1. The only legal source to get the weights at the time of writing is this repository. The choice of words also may or may not hint at the existence of other kinds of sources. ↩

Contributors

setzer22, philpax, odysa, bcho, darthdeus, mwbryant, trizko
