l0g1kl1f3 / llamasharp

This project is forked from scisharp/llamasharp

C#/.NET binding of llama.cpp, including LLaMa/GPT model inference and quantization, ASP.NET core integration and UI.

License: MIT License

JavaScript 4.95% C# 75.04% CSS 0.29% HTML 3.61% Metal 16.10%

llamasharp's Introduction

LLamaSharp - .NET Binding for llama.cpp

The C#/.NET binding of llama.cpp. It provides APIs to run inference on LLaMa models and deploy them in a local environment. It works on Windows, Linux and macOS without requiring you to compile llama.cpp yourself. Its performance is close to that of llama.cpp.

Furthermore, it provides integrations with other projects such as BotSharp to provide higher-level applications and UI.

Documentation

Installation

First, search for LLamaSharp in the NuGet package manager and install it.

PM> Install-Package LLamaSharp

Then, search for and install one of the following backends:

LLamaSharp.Backend.Cpu  # cpu for windows, linux and mac
LLamaSharp.Backend.Cuda11  # cuda11 for windows and linux
LLamaSharp.Backend.Cuda12  # cuda12 for windows and linux
LLamaSharp.Backend.MacMetal  # metal for mac
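
As a rough illustration of the choice above, a project could map its target platform to one of these package names. `BackendPicker` and its parameters are hypothetical names for this sketch and are not part of LLamaSharp. The sketch also assumes that Metal is preferred on Mac and that the installed CUDA major version is already known.

```csharp
using System;
using System.Runtime.InteropServices;

// Print the install command for the current machine, assuming no CUDA toolkit is present.
bool isMac = RuntimeInformation.IsOSPlatform(OSPlatform.OSX);
Console.WriteLine($"Install-Package {BackendPicker.Pick(isMac, cudaMajorVersion: null)}");

// Hypothetical helper (not part of LLamaSharp): maps a platform to a backend package name.
static class BackendPicker
{
    public static string Pick(bool isMac, int? cudaMajorVersion) =>
        (isMac, cudaMajorVersion) switch
        {
            (true, _)   => "LLamaSharp.Backend.MacMetal", // assumption: prefer Metal on Mac
            (false, 11) => "LLamaSharp.Backend.Cuda11",
            (false, 12) => "LLamaSharp.Backend.Cuda12",
            _           => "LLamaSharp.Backend.Cpu",      // fallback that works everywhere
        };
}
```

The CPU backend is used as the fallback because it is the only package listed for all three operating systems.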

If you would like to use it with Microsoft semantic-kernel, please search for and install the following package:

LLamaSharp.semantic-kernel

The table below maps each backend version to the matching LLamaSharp version, the verified model resources, and the llama.cpp commit id. If you're not sure which model works with a version, please try our sample model.

| LLamaSharp.Backend | LLamaSharp | Verified Model Resources | llama.cpp commit id |
| - | - | - | - |
| - | v0.2.0 | This version is not recommended to use. | - |
| - | v0.2.1 | WizardLM, Vicuna (filenames with "old") | - |
| v0.2.2 | v0.2.2, v0.2.3 | WizardLM, Vicuna (filenames without "old") | 63d2046 |
| v0.3.0, v0.3.1 | v0.3.0, v0.4.0 | LLamaSharpSamples v0.3.0, WizardLM | 7e4ea5b |
| v0.4.1-preview (cpu only) | v0.4.1-preview | Open llama 3b, Open Buddy | aacdbd4 |
| v0.4.2-preview (cpu,cuda11) | v0.4.2-preview | Llama2 7b GGML | 3323112 |
| v0.5.1 | v0.5.1 | Llama2 7b GGUF | 6b73ef1 |

Many hands make light work. If you have found another model resource that works with a version, we'd appreciate it if you opened a PR for it! 😊

We publish backends for cpu, cuda11 and cuda12 because they are the most popular ones. If none of them fits your setup, please compile llama.cpp from source and put the libllama library under your project's output path (guide).

FAQ

  1. GPU out of memory: Please try setting n_gpu_layers to a smaller number.
  2. Unsupported model: llama.cpp is under rapid development and often introduces breaking changes. Please check the release date of the model and find a suitable version of LLamaSharp to install, or use the model we provide on huggingface.
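
For the out-of-memory case, here is a minimal sketch. It assumes that n_gpu_layers corresponds to the GpuLayerCount property of ModelParams, as used in the chat session example in this README. The values shown are illustrative, not recommendations.

```csharp
using LLama.Common;

// Assumption: GpuLayerCount is the LLamaSharp counterpart of llama.cpp's n_gpu_layers.
var parameters = new ModelParams("<Your model path>") // change it to your own model path
{
    ContextSize = 1024,
    GpuLayerCount = 0 // 0 offloads no layers to the GPU; increase step by step while memory allows
};
```

Starting from a small value and increasing it makes it easier to find the largest setting that still fits in GPU memory.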

Usages

Model Inference and Chat Session

LLamaSharp provides two ways to run inference: LLamaExecutor and ChatSession. The chat session is a higher-level wrapper around the executor and the model. Here's a simple example of using a chat session.

using System;
using System.Collections.Generic;
using LLama;
using LLama.Common;

string modelPath = "<Your model path>"; // change it to your own model path
var prompt = "Transcript of a dialog, where the User interacts with an Assistant named Bob. Bob is helpful, kind, honest, good at writing, and never fails to answer the User's requests immediately and with precision.\r\n\r\nUser: Hello, Bob.\r\nBob: Hello. How may I help you today?\r\nUser: Please tell me the largest city in Europe.\r\nBob: Sure. The largest city in Europe is Moscow, the capital of Russia.\r\nUser:"; // use the "chat-with-bob" prompt here.

// Load a model
var parameters = new ModelParams(modelPath)
{
    ContextSize = 1024,
    Seed = 1337,
    GpuLayerCount = 5
};
using var model = LLamaWeights.LoadFromFile(parameters);

// Initialize a chat session
using var context = model.CreateContext(parameters);
var ex = new InteractiveExecutor(context);
ChatSession session = new ChatSession(ex);

// show the prompt
Console.WriteLine();
Console.Write(prompt);

// run the inference in a loop to chat with LLM
while (prompt != "stop")
{
    foreach (var text in session.Chat(prompt, new InferenceParams() { Temperature = 0.6f, AntiPrompts = new List<string> { "User:" } }))
    {
        Console.Write(text);
    }
    prompt = Console.ReadLine() ?? "stop"; // treat end of input as a request to stop
}

// save the session
session.SaveSession("SavedSessionPath");
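
A saved session can later be restored into a fresh chat session. The sketch below assumes that ChatSession exposes a LoadSession(path) method as the counterpart of SaveSession(path). Please verify this against the LLamaSharp version you have installed.

```csharp
using LLama;
using LLama.Common;

// Rebuild the model, context and executor exactly as in the example above.
var parameters = new ModelParams("<Your model path>") { ContextSize = 1024 };
using var model = LLamaWeights.LoadFromFile(parameters);
using var context = model.CreateContext(parameters);
var session = new ChatSession(new InteractiveExecutor(context));

// Assumption: LoadSession mirrors SaveSession and takes the same path.
session.LoadSession("SavedSessionPath");
```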

Quantization

The following example shows how to quantize a model. With LLamaSharp you don't need to compile the C++ project and run scripts to quantize the model; instead, just run it in C#.

string srcFilename = "<Your source path>";
string dstFilename = "<Your destination path>";
string ftype = "q4_0";
if (Quantizer.Quantize(srcFilename, dstFilename, ftype)) // variable name fixed: srcFilename, not srcFileName
{
    Console.WriteLine("Quantization succeeded!");
}
else
{
    Console.WriteLine("Quantization failed!");
}

For more usage examples, please refer to Examples.

Web API

We provide an ASP.NET Core integration here. Since the API is not yet stable, please clone the repo to use it. In the future we'll publish it on NuGet.

Since we are short of hands, if you're familiar with ASP.NET Core, we'd appreciate your help in upgrading the Web API integration.

Demo

demo-console

Roadmap


✅: completed. ⚠️: outdated for latest release but will be updated. 🔳: not completed


✅ LLaMa model inference

✅ Embeddings generation, tokenization and detokenization

✅ Chat session

✅ Quantization

✅ State saving and loading

⚠️ BotSharp Integration

✅ ASP.NET core Integration

✅ Semantic-kernel Integration

🔳 Fine-tune

🔳 Local document search

🔳 MAUI Integration

Assets

Some extra model resources can be found below:

The weights included in the magnet link are exactly the weights from Facebook LLaMa.

The prompts can be found below:

Contributing

Any contribution is welcome! Please read the contributing guide. You can do any of the following to help us make LLamaSharp better:

  • Add a model link that works with a version. (This is very important!)
  • Star and share LLamaSharp to let others know about it.
  • Add a feature or fix a bug.
  • Help develop the Web API and UI integration.
  • Just open an issue about the problem you ran into!

Contact us

Join our chat on Discord.

Join QQ group

License

This project is licensed under the terms of the MIT license.

llamasharp's People

Contributors

martindevans, asakusarinne, saddam213, signalrt, drasticactions, mihaiii, oceania2018, xbotter, mlof, erinloy, sf-mregenhardt, redthing1, zombieguy98, fwaris, regenhardt, zerosoup, weiajr