
scisharp / llamasharp

2.0K stars · 51 watchers · 270 forks · 166.85 MB

A C#/.NET library to run LLM models (🦙LLaMA/LLaVA) on your local device efficiently.

Home Page: https://scisharp.github.io/LLamaSharp

License: MIT License

C# 72.87% Metal 23.16% HTML 1.45% CSS 0.14% JavaScript 2.39%
chatbot gpt llama llamacpp llm semantic-kernel llava multi-modal llama2 llama3

llamasharp's People

Contributors

asakusarinne, asmirnov82, chengyen-tang, clovisribeiro, dependabot[bot], drasticactions, dvaughan, erinloy, eublefar, evolcano, futzy314, fwaris, jasoncouture, kidkych, ksanman, lyrcaxis, martindevans, mihaiii, mlof, norne9, oceania2018, philippjbauer, redthing1, saddam213, signalrt, swharden, uralstech, xbotter, zombieguy98, zsogitbe


llamasharp's Issues

error: NU1108: Cycle detected.

I have tried all versions on both .NET 6 and .NET 7 and got the same error.
I'm on Windows 11.
Am I missing something?
[screenshot]

There is no way to save a conversation

I checked several times, and there is really no way to save a conversation from session to session.

I tried SaveState, but it doesn't save the content of the conversation at all. Even with the attached examples the model doesn't really remember anything; it just makes things up, and if you ask it what we talked about before, it doesn't know.
I would be very happy if someone knows how to save the details of a conversation for the next session. I am trying to develop a chat application, but it is meaningless if everything is deleted with each session.
Thanks for all the efforts in developing this library.
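For reference, the example menu further down includes a "Load and save chat session" option, which suggests the session itself (history plus executor state) can be persisted. A minimal sketch under that assumption; the SaveSession/LoadSession names are taken from that example and may differ by version:

using LLama;
using LLama.Common;

// Sketch only: SaveSession/LoadSession follow the bundled
// "Load and save chat session" example; exact signatures may vary by version.
var executor = new InteractiveExecutor(new LLamaModel(new ModelParams("<model path>")));
var session = new ChatSession(executor);

// ... chat for a while ...

session.SaveSession("./my-session");  // persist history and executor state

// In the next run, rebuild the session the same way, then restore it:
session.LoadSession("./my-session");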

0.4.2-preview issue: Interactive mode stops responding after the first request

I was using 0.4 and I was getting good results with the InteractiveExecutor. I moved to 0.4.2-preview and now it will only generate one response. All additional calls to InferAsync return no results.

I thought that maybe something changed and the AntiPrompt was blocking output. I removed all AntiPrompts and I still get nothing in return.

Example call here:

inferenceParams = new InferenceParams() { Temperature = temperature, MaxTokens = maxTokens };
await foreach (var text in executor.InferAsync(prompt, inferenceParams))
    theResponse += text;

I've been testing with the nous-hermes-13b.ggmlv3.q5_1.bin model with both the 0.4 version and the current version. No other code changes. I upgraded to 0.4.2-preview of both LLamaSharp and the Cuda12 backend.

Any thoughts on why this works once but returns nothing on subsequent calls?

FileNotFoundException on MacOS Arm64

Hi,

I want to use your awesome library on my MacBook Pro M1 (Arm64).

I have created a new .NET 7 console application and added the needed NuGet packages:
[screenshot]

Added some code:

using LLama.Common;
using LLama;

string modelPath = "<Your model path>"; // change it to your own model path
var prompt = "Transcript of a dialog, where the User interacts with an Assistant named Bob. Bob is helpful, kind, honest, good at writing, and never fails to answer the User's requests immediately and with precision.\r\n\r\nUser: Hello, Bob.\r\nBob: Hello. How may I help you today?\r\nUser: Please tell me the largest city in Europe.\r\nBob: Sure. The largest city in Europe is Moscow, the capital of Russia.\r\nUser:";

// Initialize a chat session
var ex = new InteractiveExecutor(new LLamaModel(new ModelParams(modelPath, contextSize: 1024, seed: 1337, gpuLayerCount: 5)));
ChatSession session = new ChatSession(ex);

Console.WriteLine();
Console.Write("User: ");
while (prompt != "stop")
{
    foreach (var text in session.Chat(prompt, new InferenceParams() { Temperature = 0.1f, AntiPrompts = new List<string> { "User:" } }))
    {
        Console.Write(text);
    }
    prompt = Console.ReadLine();
}

But when I use dotnet run I get the following error:

Unhandled exception. System.IO.FileNotFoundException: Could not load file or assembly 'LLamaSharp, Version=0.4.0.0, Culture=neutral, PublicKeyToken=null'. The system cannot find the file specified.

But in my binary folder everything is there:

[screenshot]

Do you have an idea what I'm doing wrong?

Incorrect mirostat sampling

The current setup for mirostat sampling is incorrect. At the moment it looks like this:

float mirostat_mu = 2.0f * mirostatTau;
id = SamplingApi.llama_sample_token_mirostat_v2(_ctx, candidates, mirostatTau, mirostatEta, ref mirostat_mu);

The mirostat_mu value is always 2 * tau, the new value assigned into mirostat_mu (through the ref) is never used.

This is not correct: the mu value should be initialised to 2 * tau, but subsequent calls should pass in the mu value from the previous call. This is how mirostat adapts over time.

See #72
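A minimal sketch of what the fix could look like (the persistent field and the wrapper method are hypothetical; the sampling call is the one quoted above):

// Persist mu across calls instead of recomputing it each time.
// The field _mirostatMu and this wrapper method are hypothetical.
private float? _mirostatMu = null;

private int SampleMirostatV2(SafeLLamaContextHandle ctx, LLamaTokenDataArray candidates,
                             float mirostatTau, float mirostatEta)
{
    // Initialise mu to 2 * tau only on the very first call...
    var mu = _mirostatMu ?? 2.0f * mirostatTau;

    var id = SamplingApi.llama_sample_token_mirostat_v2(ctx, candidates,
                                                        mirostatTau, mirostatEta, ref mu);

    // ...then carry the value updated through the ref parameter into the
    // next call, which is how mirostat adapts over time.
    _mirostatMu = mu;
    return id;
}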

build error on web api

Hi,
I just took the latest code from the main branch and I'm getting this build error. Can anyone advise? Thanks in advance.

Severity Code Description Project File Line Suppression State
Error CS0311 The type 'LLama.LLamaModel' cannot be used as type parameter 'T' in the generic type or method 'ChatSession'. There is no implicit reference conversion from 'LLama.LLamaModel' to 'LLama.OldVersion.IChatModel'. LLama.WebAPI C:\Users\conta\source\repos\LLamaSharp\LLama.WebAPI\Services\ChatService.cs 8 Active

This is the code impacted

LLamaModel model = new(new LLamaParams(model: @"ggml-model-q4_0.bin", n_ctx: 512, interactive: true, repeat_penalty: 1.0f, verbose_prompt: false));
_session = new ChatSession<LLamaModel>(model);

cyrillic doesn't work

I have a model which generates text using the Cyrillic alphabet. It works in llama-cpp-python, but in LLamaSharp I get unknown symbols:
[screenshot]
Here is my code:

var modelPath = @"C:\Temp\ggml-saiga-13b-q4_1.bin";
var model = new LLamaModel(new LLamaParams(model: modelPath, n_ctx: 512, interactive: true,
    antiprompt: File.ReadAllLines("antiprompt.txt").ToList(), repeat_penalty: 1.0f));
var session = new ChatSession<LLamaModel>(model).WithPromptFile("prompt.txt").WithAntiprompt(File.ReadAllLines("antiprompt.txt"));
var outp = session.Chat("User: Почему трава зеленая? \r\nSaiga: ", encoding: "UTF-8"); // ("Why is the grass green?")
string resp = "";
foreach (var output in outp)
{
    resp += output;
    if (resp.EndsWith(USER_FLAG))
    {
        break;
    }
}

I tried using the UTF-8, ASCII, and Unicode encodings, but all of them give the same result.
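For what it's worth, a common cause of this symptom is that a multi-byte UTF-8 character is split across two tokens, so decoding each token's bytes independently yields garbage. A stateful System.Text.Decoder buffers the incomplete sequence between calls; a minimal sketch (GetOutputBytes is a hypothetical stand-in for however the raw bytes of each generated token are obtained):

using System;
using System.Collections.Generic;
using System.Text;

// A stateful Decoder buffers an incomplete multi-byte sequence between calls,
// so a UTF-8 character split across two tokens still decodes correctly.
Decoder decoder = Encoding.UTF8.GetDecoder();
var sb = new StringBuilder();

foreach (byte[] tokenBytes in GetOutputBytes())
{
    var chars = new char[Encoding.UTF8.GetMaxCharCount(tokenBytes.Length)];
    int count = decoder.GetChars(tokenBytes, 0, tokenBytes.Length, chars, 0, flush: false);
    sb.Append(chars, 0, count);
}

// Hypothetical stand-in for the raw bytes of each generated token.
static IEnumerable<byte[]> GetOutputBytes() => Array.Empty<byte[]>();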

No LLamaSharp backend was installed

I have the same problem and can't figure out what I'm missing. I created a new project:

[screenshot]

RuntimeError: The native library cannot be found. It could be one of the following reasons:

1. No LLamaSharp backend was installed. Please search LLamaSharp.Backend and install one of them.
2. You are using a device with only CPU but installed cuda backend. Please install cpu backend instead.
3. The backend is not compatible with your system cuda environment. Please check and fix it. If the environment is expected not to be changed, then consider build llama.cpp from source or submit an issue to LLamaSharp.
OS: Windows 10 (64 bit) 19044.1288
.NET 7.x
CUDA\v11.7

[screenshots]

OpenBuddy models

https://scisharp.github.io/LLamaSharp/0.4/GetStarted/
It is mentioned here that the OpenBuddy models work fine with the software, but in practice they don't. llama.cpp has been updated to support these models, but LLamaSharp has not been updated to match yet, and until that update is done it still doesn't work with them.

first token must be BOS

Hi,
Thank you, this project is amazing

At some point I got this error

llama_eval_internal: first token must be BOS
llama_eval: failed to eval

This threw the following exception, from https://github.com/SciSharp/LLamaSharp/blob/master/LLama/Logger.cs#L6:

System.IO.FileNotFoundException: 'Could not load file or assembly 'Serilog, Version=2.0.0.0, Culture=neutral, PublicKeyToken=24c2f752a8e58a10'. The system cannot find the file specified.'

Note: I compiled LLamaSharp from master, as I needed the latest version to support the latest GGML.

Regards

hardcoded path in dll? User:LLAMA_ASSERT: E:\s\repos\llama.cpp\llama.cpp:1343: !!kv_self.ctx

Steps to reproduce:
Run the example project, choose 1
Load the llama-2-7b-guanaco-qlora.ggmlv3.q8_0.bin file
The code crashes with a non-zero exit code; please see the output below:

Hardcoded path to cpp in dll
Please choose the version you want to test:
0. old version (for v0.3.0 or earlier version)

  1. new version (for versions after v0.4.0)

Your Choice: 1

================LLamaSharp Examples (New Version)==================

Please input a number to choose an example to run:
0: Run a chat session without stripping the role names.
1: Run a chat session with the role names strippped.
2: Interactive mode chat by using executor.
3: Instruct mode chat by using executor.
4: Stateless mode chat by using executor.
5: Load and save chat session.
6: Load and save state of model and executor.
7: Get embeddings from LLama model.
8: Quantize the model.

Your choice: 2
Please input your model path: G:\Temp4\llama-2-7B-Guanaco-QLoRA-GGML\llama-2-7b-guanaco-qlora.ggmlv3.q8_0.bin
llama.cpp: loading model from G:\Temp4\llama-2-7B-Guanaco-QLoRA-GGML\llama-2-7b-guanaco-qlora.ggmlv3.q8_0.bin
llama_model_load_internal: format = ggjt v3 (latest)
llama_model_load_internal: n_vocab = 32000
llama_model_load_internal: n_ctx = 256
llama_model_load_internal: n_embd = 4096
llama_model_load_internal: n_mult = 256
llama_model_load_internal: n_head = 32
llama_model_load_internal: n_layer = 32
llama_model_load_internal: n_rot = 128
llama_model_load_internal: freq_base = 10000.0
llama_model_load_internal: freq_scale = 1
llama_model_load_internal: ftype = 7 (mostly Q8_0)
llama_model_load_internal: n_ff = 11008
llama_model_load_internal: model size = 7B
The executor has been enabled. In this example, the prompt is printed, the maximum tokens is set to 128 and the context size is 256. (an example for small scale usage)
Transcript of a dialog, where the User interacts with an Assistant named Bob. Bob is helpful, kind, honest, good at writing, and never fails to answer the User's requests immediately and with precision.

User: Hello, Bob.
Bob: Hello. How may I help you today?
User: Please tell me the largest city in Europe.
Bob: Sure. The largest city in Europe is Moscow, the capital of Russia.
User:LLAMA_ASSERT: E:\s\repos\llama.cpp\llama.cpp:1343: !!kv_self.ctx

C:\Users\Radim\Documents\LLamaSharp\LLama.Examples\bin\Debug\net6.0\LLama.Examples.exe (process 20504) exited with code -1073740791.
To automatically close the console when debugging stops, enable Tools->Options->Debugging->Automatically close the console when debugging stops.
Press any key to close this window . . .

Please note that I am new to this and I am not sure if the v2 model is supported, but the hardcoded path in the DLL does not look right... I have not yet found out how to recompile the DLLs...

Flash Attention 2

Flash Attention 2 is out; they say it cuts the time in half. When will it be available here?
Thanks for everything

Not working with GPU backend cuda12.x

I get an error on my machine when I try to run this on GPU. I've installed the LLamaSharp.Backend.Cuda12 package.

Running the nvidia-smi command gives me:

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 528.49       Driver Version: 528.49       CUDA Version: 12.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name            TCC/WDDM | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ... WDDM  | 00000000:01:00.0 Off |                  N/A |
| N/A   47C    P8     9W /  60W |     93MiB /  6144MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

and my C# code looks like:

using LLama;

string myModelPath = "models/wizard-vicuna-13B.ggmlv3.q4_1.bin";

var model = new LLamaModel(new LLamaParams(model: myModelPath, n_ctx: 1024, repeat_penalty: 1.2f, temp: 0.0001f, n_gpu_layers: 100));
var session = new ChatSession<LLamaModel>(model).WithAntiprompt(new[] { "User:" });
Console.Write("\nUser:");
while (true)
{
    Console.ForegroundColor = ConsoleColor.Green;
    var question = Console.ReadLine();
    Console.ForegroundColor = ConsoleColor.White;
    var outputs = session.Chat(question);
    foreach (var output in outputs)
    {
        Console.Write(output);
    }
}

And finally the exception:

Unhandled exception. System.TypeInitializationException: The type initializer for 'LLama.Native.NativeApi' threw an exception.
 ---> LLama.Exceptions.RuntimeError: The native library cannot be found. It could be one of the following reasons:
1. No LLamaSharp backend was installed. Please search LLamaSharp.Backend and install one of them.
2. You are using a device with only CPU but installed cuda backend. Please install cpu backend instead.
3. The backend is not compatible with your system cuda environment. Please check and fix it. If the environment is expected not to be changed, then consider build llama.cpp from source or submit an issue to LLamaSharp.
   at LLama.Native.NativeApi..cctor()
   --- End of inner exception stack trace ---

Dll not found exception on test

I'm not sure whether I'm supposed to register an issue here, or whether there is a guide on how to do so, but I'd like to try this project, so excuse me.

I downloaded 0.2.1, ran the test, and got an error.
The error says it couldn't find the DLL 'libllama'. I tried on Windows 11 and Mac Big Sur.

Is there a way to solve the above problem? Or is there anything I did wrong or missed?

[screenshot]

SEHException

Hello,

I have downloaded your Llama.Examples project, downloaded ggml-model-f32-q4_0.bin from https://huggingface.co/AsakusaRinne/LLamaSharpSamples/tree/v0.3.0/LLaMa/7B

When I publish the code for the win-x64 runtime, I am not able to run it on Windows. I am getting the following error:

Fatal error. System.Runtime.InteropServices.SEHException (0x80004005): External component has thrown an exception.
Repeat 2 times:
--------------------------------
   at LLama.Native.NativeApi.llama_init_backend()
--------------------------------
   at LLama.Native.NativeApi..cctor()
   at LLama.Native.NativeApi.llama_context_default_params()
   at LLama.Utils.llama_init_from_gpt_params(LLama.LLamaParams ByRef)
   at LLama.LLamaModel..ctor(LLama.LLamaParams, System.String, Boolean, System.String)
   at LLama.Examples.ChatSession..ctor(System.String, System.String, System.String[])
   at Program.<Main>$(System.String[])

System.TypeInitializationException: 'The type initializer for 'LLama.Native.NativeApi' threw an exception.' RuntimeError: The native library cannot be found...

Hi! Thanks for the work you've done on this!

I’m following the example on this page - https://scisharp.github.io/LLamaSharp/0.4/GetStarted/
but I get the above error when trying to run it with .NET Framework 4.8; however, it runs perfectly fine when using .NET Core.

I’ve tried with LLamaSharp 0.4.0 and the 0.4.1 preview, using both the Cuda 11 and CPU backends, but get the same error.
I also tried LLamaSharp 0.3.0, but this generates errors in the code of “The type or namespace name 'type/namespace' could not be found” for InteractiveExecutor, ModelParams, etc.

Note: as stated above, when I use the same example code in a .NET Core application, it works fine (but I would like to use an assembly that .NET Core doesn’t support, hence trying to get it working in .NET Framework).

Line that threw exception:
var ex = new InteractiveExecutor(new LLamaModel(new ModelParams(modelPath, contextSize: 1024, seed: 1337, gpuLayerCount: 5)));

Exception details:
System.TypeInitializationException
HResult=0x80131534
Message=The type initializer for 'LLama.Native.NativeApi' threw an exception.
Source=LLamaSharp
StackTrace:
at LLama.Native.NativeApi.llama_context_default_params()
at LLama.Utils.InitLLamaContextFromModelParams(ModelParams params)
at LLama.LLamaModel..ctor(ModelParams Params, String encoding, ILLamaLogger logger)
at LlamaSharpTestNet4.Program.Main(String[] args) in N:\CSharp2023\LlamaSharpTestNet4\Program.cs:line 20

This exception was originally thrown at this call stack:
[External Code]

Inner Exception 1:
RuntimeError: The native library cannot be found. It could be one of the following reasons:

  1. No LLamaSharp backend was installed. Please search LLamaSharp.Backend and install one of them.
  2. You are using a device with only CPU but installed cuda backend. Please install cpu backend instead.
3. The backend is not compatible with your system cuda environment. Please check and fix it. If the environment is expected not to be changed, then consider build llama.cpp from source or submit an issue to LLamaSharp.

Porting to a winform

I've been trying to port this to a VS Winform using a rich text box as the output as opposed to the console. Are there some good WinForm examples using LlamaSharp which I can use?

My goal is to incorporate this into a program for the visually disabled and they cannot use the terminal.

The model I use works fine in a terminal; however, when I try to incorporate LLamaSharp into a WinForms app, the output from the same model can vary.

Run from storage instead of memory

I also tried LLaMA within Python and as a standalone process, and noticed it was able to read from device storage instead of memory. Is this a possibility within LLamaSharp as well? Thanks!
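In llama.cpp this is the mmap option (use_mmap), which maps the weights from storage on demand instead of copying them into RAM. A minimal sketch, assuming the model parameters expose the same switch; the useMemorymap/useMemoryLock names are assumptions and may differ by version:

using LLama;
using LLama.Common;

// Sketch only: assumes ModelParams exposes llama.cpp's mmap switch.
// The parameter names below are assumptions and may differ by version.
var model = new LLamaModel(new ModelParams(
    "<model path>",
    useMemorymap: true,   // map the weights from storage on demand
    useMemoryLock: false  // don't pin the mapping into RAM
));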

License

Just wondering about the license; it says that it is MIT, which is great. But LLaMA itself is not for commercial use, so how is the whole project MIT (which allows commercial use)?

Enhance the docs

Please provide more docs on the implemented methods/interfaces, etc.

llama_state

It looks like recent updates to Llama.cpp (e.g. ggerganov/llama.cpp#1797) have modified the API significantly with regards to how "state" is handled.

The llama_model is loaded with one API call (llama_load_model_from_file), which loads all of the static data (weights, vocabulary etc) and then you can create one or more states over this (llama_new_context_with_model).

Is anyone else working on this? If not I'm happy to have a go at it.
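A rough sketch of what the split looks like from C# (the function names are the llama.cpp ones mentioned above; the P/Invoke shapes are assumed and abbreviated):

using System;
using System.Runtime.InteropServices;
using LLama.Native;

// Sketch: model/context split. Signatures are assumed and abbreviated.
internal static class NewNativeApi
{
    [DllImport("libllama")]
    public static extern LLamaContextParams llama_context_default_params();

    [DllImport("libllama")]
    public static extern IntPtr llama_load_model_from_file(string path_model, LLamaContextParams params_);

    [DllImport("libllama")]
    public static extern IntPtr llama_new_context_with_model(IntPtr model, LLamaContextParams params_);
}

// Load the static data (weights, vocabulary, etc.) once...
var p = NewNativeApi.llama_context_default_params();
IntPtr model = NewNativeApi.llama_load_model_from_file("<model path>", p);

// ...then create one or more independent states over the same model.
IntPtr ctxA = NewNativeApi.llama_new_context_with_model(model, p);
IntPtr ctxB = NewNativeApi.llama_new_context_with_model(model, p);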

How do I make a custom build on MacOS?

When I make a custom build on macOS it produces a libllama.so file. However, it seems LLamaSharp is expecting a .dll file? I also see a libllama.dylib, and the runtimes folder has other DLLs. How do we determine which runtime it uses?

The reason I'm trying to do this is the latest llama.cpp project has support for Metal and it runs much faster on my hardware. I'm trying to build an API around it, which I can do with the files from /runtime out of the box, but I'm not familiar enough with building C++ to include my custom build in there.

CUDA Error 12

While using the InstructExecutor, I get this error:

> CUDA error 12 at D:/development/llama/llama.cpp/ggml-cuda.cu:646: invalid pitch argument

My C# code:

using LLama;
using LLama.Common;

string myModelPath = "model/wizardLM-7B.ggmlv3.q4_1.bin";

var model = new LLamaModel(new ModelParams(myModelPath));
var session = new InstructExecutor(model);

while (true)
{
    string question = File.ReadAllText("prompt.txt");
    var outputs = session.Infer(question, new InferenceParams
    {
        Temperature = 0.00001f,
        RepeatPenalty = 1.2f
    });
    foreach (var output in outputs)
    {
        Console.Write(output);
    }
}

SEHException when trying to load pygmalion model

I have been messing around with trying to load a Pygmalion model which I have converted to ggml using llama.cpp. I can run it both in Gpt4All using the C# wrapper and in Oobabooga's text generation web UI.

Unfortunately, every time I try to load the model using LLamaSharp I get this error:
System.Runtime.InteropServices.SEHException: 'External component has thrown an exception.'
which originates from:
extern IntPtr llama_init_from_file(string path_model, LLamaContextParams params_);

Here is my code:

namespace AiTest
{
	internal class Program
	{
		static void Main(string[] args)
		{
			Console.WriteLine("Hello, World!");
			LLamaModel model = new LLamaModel(new LLamaParams(model: "pygmalion-13b-4bit-128g-gpt-j\\ggml-model-q4_1.bin", n_ctx: 512, repeat_penalty: 1.0f, n_gpu_layers: 20));
		}
	}
}

Gpt4All says that the model is in the ggjt v1 (latest) format if that helps.

Made a simple example for Unity demonstrating the use of this library.

I created a thread just in case.
This is just a simple example, a very simple chat. But it will allow those who are interested to use this library in Unity.

I put a working Unity example at https://github.com/Xsanf/LLaMa_Unity.
Just install Unity 2021 or higher and create an empty 3D project.
Import it via Assets > Import package > Custom package "LLaMa_Unity.unitypackage". After installing it,
go to the Project panel > Scene, load SampleScene, set the unsafe flag, and run the example.

AntiPrompt in generated result

In the WebApi example, AntiPrompt is used, and it always appears in the final generated result. Is this a feature, or am I using the wrong method?
[screenshot]
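If it is intended behaviour, one client-side workaround is to trim the antiprompt off the accumulated output; a minimal sketch:

using System;

// Workaround sketch: trim a trailing antiprompt (e.g. "User:") from the
// accumulated output before returning it to the caller.
static string StripAntiPrompt(string response, string antiPrompt)
    => response.EndsWith(antiPrompt, StringComparison.Ordinal)
        ? response[..^antiPrompt.Length].TrimEnd()
        : response;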

capturing the output

Hello,
Thanks again for all the work you put in here.
Is there a way to capture, in a windowed application, the output that is written to the console when loading the model?
I want to blog it, but I don't have direct access to it through C#.
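One standard .NET approach is to swap Console.Out for a StringWriter before loading; a minimal sketch. Caveat: this only captures managed writes; lines printed directly by the native llama.cpp code go to the process's stdout/stderr and bypass Console.Out.

using System;
using System.IO;

// Capture managed console output by redirecting Console.Out.
var original = Console.Out;
var buffer = new StringWriter();
Console.SetOut(buffer);

// ... load the model here ...

Console.SetOut(original);
string captured = buffer.ToString(); // display this in the windowed app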

Pinvoke Issue when using .net Framework 4.8

I'm trying to use the latest version of LLamaSharp in my project, which is based on .NET Framework 4.8 (x64).
As far as I know, .NET 4.8 is fully compatible with .NET Standard 2.0 (LLamaSharp is available for .NET Standard 2.0 on NuGet).
I can access methods from LLamaSharp, but if I try to load the model I get a "PInvoke incompatible" error message.
I'm not sure if this is an issue with LLamaSharp or .NET. Hopefully you can help or give a hint on how to fix it.

Add auto gpu layer count detection

Currently, we have to provide the right GPU layer count for each model individually.

It would be nice if the best possible GPU layer count could be computed or estimated automatically, depending on the user's GPU capability.
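A rough illustration of the kind of heuristic this could use (all numbers and helper names are hypothetical):

using System;

// Hypothetical heuristic: estimate how many layers fit into free VRAM.
static int EstimateGpuLayerCount(long modelSizeBytes, int totalLayers)
{
    long freeVram = GetFreeVramBytes();
    long perLayer = modelSizeBytes / totalLayers;  // rough per-layer weight cost
    long budget = (long)(freeVram * 0.9);          // leave headroom for the KV cache
    return (int)Math.Min(totalLayers, budget / perLayer);
}

// Stub: replace with a real driver query (e.g. via NVML).
static long GetFreeVramBytes() => 6L * 1024 * 1024 * 1024;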

Prompting issues

Hi there, this works great already, but I am looking for some help on prompting / settings. I want the model to act as another human and not say it's a helpful assistant or add anything to the prompt I am giving it.

For example my prompt looks like this:

Transcript of a dialog between a cashier and a customer.
The customer is named Emily Stevens and aged 32.
More information about Emily Stevens: Emily Stevens works as a psychologist, specializing in cognitive behavioral therapy. Her kind-hearted nature and exceptional analytical skills make her a trusted and effective therapist. She helps her clients navigate through their emotional challenges, providing guidance and support to promote their mental well-being.

User:

Immediately on starting, the model adds text like

User: How are you.
(The cashier asks how the customers day is)

When conversing, it really likes to revert to being an AI assistant.
Is there anything I can change in my prompt, or a different model I could use?

I am new to this, so I am having problems finding new models. I am using wizardLM-7B.ggmlv3.q8_0.bin.

Embedding recommended model

Anyone have a suggestion for embedding models to use with LlamaSharp? I have not had any luck getting any models to work.

Multiple sessions on one model.

Hi there, I would love to have multiple sessions on the same model, but the sessions seem to remember new information given in the other chat sessions. I couldn't find anything in the docs and settings. I am curious whether this is something I am doing wrong?

The chat API could be only used under interactive model

Hi,

I have upgraded LLamaSharp version from 0.2.2 to 0.3.0. I'm getting this error below now:

The chat API could be only used under interactive model

StackTrace:

at LLama.LLamaModel.Chat(String text, String prompt, String encoding)
at LLama.ChatSession`1.d__5.MoveNext()
at Program.<Main>$(String[] args) in D:\Program.cs:line 15

[screenshot]

When I roll LLamaSharp back to 0.2.2, it works fine.

Can anyone help me?

How to dispose loaded model?

If the GPU Memory has been occupied, then the Dispose method call leads to a crash.

So, how do I dispose of an already loaded model correctly?

Providing instructions on how to build libllama

Hello!

Great work so far, I'm having tons of fun with it. However, the libllama.dll which is included is outdated, and I just can't figure out how to build it myself. The llama.cpp build scripts always end up with executables instead of a DLL.

Some instructions on how to update it would be much appreciated.

Document Q&A via document loaders

Consider a typical situation where one would like to "inject" some sort of information coming from a PDF, JSON, XML, etc., and the user would ask questions about it.

How would we implement this using LLamaSharp? Do we need some kind of word embedding stuff that is done in LangChain?
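LLamaSharp does expose an embedder (see "Get embeddings from LLama model" in the example menu above), so a LangChain-style retrieval flow can be sketched on top of it. A minimal sketch, assuming LLamaEmbedder.GetEmbeddings(string) as in that example; LoadAndSplitDocument is a hypothetical loader for the PDF/JSON/XML content:

using System;
using System.Linq;
using LLama;
using LLama.Common;

// Retrieval sketch: embed document chunks, pick the chunk most similar to
// the question, and prepend it to the prompt.
var embedder = new LLamaEmbedder(new ModelParams("<model path>"));

string[] chunks = LoadAndSplitDocument();
float[][] index = chunks.Select(c => embedder.GetEmbeddings(c)).ToArray();

string question = "What does the document say about X?";
float[] q = embedder.GetEmbeddings(question);

int best = Enumerable.Range(0, chunks.Length)
                     .OrderByDescending(i => Cosine(q, index[i]))
                     .First();

string prompt = $"Answer using only this context:\n{chunks[best]}\n\nQuestion: {question}\nAnswer:";
// ... feed `prompt` to an executor as in the other examples ...

static float Cosine(float[] a, float[] b)
{
    float dot = 0, na = 0, nb = 0;
    for (int i = 0; i < a.Length; i++) { dot += a[i] * b[i]; na += a[i] * a[i]; nb += b[i] * b[i]; }
    return dot / (MathF.Sqrt(na) * MathF.Sqrt(nb));
}

// Hypothetical: load a document and split it into text chunks.
static string[] LoadAndSplitDocument() => Array.Empty<string>();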
