Comments (5)
I did actually manage to figure this out with semantic memory. I'll put a proper example for that version together tomorrow. The advantage over the solution above is that it simply returns context using cosine similarity on embeddings, so you can use any executor just by adding the retrieved context to the prompt.
from llamasharp.
using LLama;
using LLama.Common;
using LLama.Native;
using LLamaSharp.SemanticKernel.TextEmbedding;
using Microsoft.SemanticKernel.Connectors.Sqlite;
using Microsoft.SemanticKernel.Memory;
using Microsoft.SemanticKernel.Text;
using System.Text;
// Initialize native library before anything else
string llamaPath = Path.GetFullPath("<path to local lib>/libllama.so");
NativeLibraryConfig.Instance.WithLibrary(llamaPath, null);
// Download a document and create embeddings for it
#pragma warning disable SKEXP0050, SKEXP0001, SKEXP0020
var embeddingModelPath = Path.GetFullPath("<path to embed model>/nomic-embed.gguf");
var embeddingParameters = new ModelParams(embeddingModelPath) { ContextSize = 4096, GpuLayerCount = 13, Embeddings = true };
var embeddingWeights = LLamaWeights.LoadFromFile(embeddingParameters);
var embedder = new LLamaEmbedder(embeddingWeights, embeddingParameters);
var service = new LLamaSharpEmbeddingGeneration(embedder);
ISemanticTextMemory memory = new MemoryBuilder()
    .WithMemoryStore(await SqliteMemoryStore.ConnectAsync("mydata.db"))
    .WithTextEmbeddingGeneration(service)
    .Build();
Console.WriteLine("===== INGESTING =====");
IList<string> collections = await memory.GetCollectionsAsync();
string folderPath = Path.GetFullPath("<path to folder>/Embeddings");
string[] files = Directory.GetFiles(folderPath);
string collectionName = "TestCollection";
if (collections.Contains(collectionName))
{
    Console.WriteLine("Found database");
}
else
{
    foreach (var item in files.Select((path, index) => new { path, index }))
    {
        Console.WriteLine($"Ingesting file #{item.index}");
        string text = File.ReadAllText(item.path);
        var paragraphs = TextChunker.SplitPlainTextParagraphs(TextChunker.SplitPlainTextLines(text, 128), 512);
        foreach (var para in paragraphs.Select((text, index) => new { text, index }))
            await memory.SaveInformationAsync(collectionName, para.text, $"Document {item.path}, Paragraph {para.index}");
    }
    Console.WriteLine("Generated database");
}
Console.WriteLine("===== DONE INGESTING =====");
StringBuilder builder = new();
Console.Write("Question: ");
string question = Console.ReadLine()!;
builder.Clear();
Console.WriteLine("===== RETRIEVING =====");
List<string> sources = [];
await foreach (var result in memory.SearchAsync(collectionName, question, limit: 1, minRelevanceScore: 0))
{
    builder.AppendLine(result.Metadata.Text);
    sources.Add(result.Metadata.Id);
}
builder.AppendLine("""
Sources:
""");
foreach (string source in sources)
{
    builder.AppendLine($" {source}");
}
Console.WriteLine("===== DONE RETRIEVING =====");
Console.WriteLine(builder.ToString());
#pragma warning restore SKEXP0001, SKEXP0050, SKEXP0020
We have to suppress some warnings here because semantic memory is technically considered experimental. This just uses LLamaSharp to generate embeddings and lets us search anything compatible with Semantic Memory with those embeddings, returning the most relevant text chunks. It doesn't do any generation, so you'd have to add the retrieved context to the prompt manually.
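Since the snippet above only retrieves text, the last step is splicing that context into a prompt before handing it to an executor. Here's a minimal sketch of that step; it's pure string assembly, and the template layout is just an assumption to adapt to whatever format your model expects:

```csharp
using System;
using System.Text;

class PromptBuilder
{
    // Wraps retrieved context and the user's question into a single prompt
    // string that any LLamaSharp executor could then consume.
    // The instruction wording here is illustrative, not canonical.
    public static string BuildPrompt(string context, string question)
    {
        var sb = new StringBuilder();
        sb.AppendLine("Answer the question using only the context below.");
        sb.AppendLine("Context:");
        sb.AppendLine(context);
        sb.AppendLine($"Question: {question}");
        sb.Append("Answer:");
        return sb.ToString();
    }

    static void Main()
    {
        // In the snippet above, builder.ToString() would supply the context
        // and the Console.ReadLine() input would supply the question.
        Console.WriteLine(BuildPrompt("E = m*c^2 relates mass and energy.", "What is E = m*c^2?"));
    }
}
```

From there the assembled string goes into whichever executor you prefer, since the retrieval side no longer dictates the generation side.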
Something to consider is that this is generally only the first step of RAG, and there are a lot of steps you can add between retrieval and adding the context to the prompt, such as returning multiple sources and reranking them, summarization, and so on. I'll leave some helpful resources as well:
https://github.com/pchunduri6/rag-demystified
https://medium.com/@thakermadhav/build-your-own-rag-with-mistral-7b-and-langchain-97d0c92fa146
https://medium.com/@talon8080/mastering-rag-chatbots-building-advanced-rag-as-a-conversational-ai-tool-with-langchain-d740493ff328
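To make the reranking idea concrete, here's a toy sketch in plain C# (no library calls) that keeps the top-k chunks by a relevance score; in practice the score could be the relevance SearchAsync already returns, or come from a second, more expensive reranking model:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

class Reranker
{
    // A retrieved text chunk paired with some relevance score in [0, 1].
    public record Chunk(string Text, double Score);

    // Orders chunks by descending score and keeps the k best.
    // Real rerankers recompute Score with a cross-encoder or similar
    // before this step; the selection logic stays the same.
    public static List<Chunk> TopK(IEnumerable<Chunk> chunks, int k) =>
        chunks.OrderByDescending(c => c.Score).Take(k).ToList();

    static void Main()
    {
        var results = new[]
        {
            new Chunk("A", 0.42),
            new Chunk("B", 0.91),
            new Chunk("C", 0.67),
        };
        foreach (var c in TopK(results, 2))
            Console.WriteLine(c.Text); // prints B then C
    }
}
```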
I'm trying to make basically exactly this right now. I got the BatchedExecutor figured out recently, but integrating RAG into that pipeline is proving difficult. I wouldn't mind turning my final result into an example.
I'd like to verify an assumption as well: "When combining text generation and RAG in one application, 3 model instances are needed: one for generating embeddings, one for retrieval generation, and one for text generation." I feel like those last two instances could be one, but I don't know if that's possible, because creating a KernelMemory instantiates a separate model.
I currently have this:
using LLama.Native;
using LLamaSharp.KernelMemory;
using Microsoft.KernelMemory;
using Microsoft.KernelMemory.FileSystem.DevTools;
using Microsoft.KernelMemory.MemoryStorage.DevTools;
string nativePath = "<path to native llama>";
NativeLibraryConfig.Instance.WithLibrary(nativePath, null);
string generationModelPath = "<path to any LLM in GGUF format>";
string embeddingModelPath = "<path to any embedding model in GGUF format>";
string storageFolder = "<path to storage folder>";
var llamaGenerationConfig = new LLamaSharpConfig(generationModelPath);
var llamaEmbeddingConfig = new LLamaSharpConfig(embeddingModelPath);
var vectorDbConfig = new SimpleVectorDbConfig() { Directory = storageFolder, StorageType = FileSystemTypes.Disk };
var memory = new KernelMemoryBuilder()
    .WithLLamaSharpTextGeneration(llamaGenerationConfig)
    .WithLLamaSharpTextEmbeddingGeneration(llamaEmbeddingConfig)
    .WithSimpleVectorDb(vectorDbConfig)
    .Build();
Console.WriteLine("\n================== INGESTION ==================\n");
Console.WriteLine("Uploading text about E=mc^2");
await memory.ImportTextAsync("""
In physics, mass–energy equivalence is the relationship between mass and energy
in a system's rest frame, where the two quantities differ only by a multiplicative
constant and the units of measurement. The principle is described by the physicist
Albert Einstein's formula: E = m*c^2
""");
Console.WriteLine("Uploading article file about Carbon");
await memory.ImportDocumentAsync("wikipedia.txt");
Console.WriteLine("\n================== RETRIEVAL ==================\n");
var question = "What's E = m*c^2?";
Console.WriteLine($"Question: {question}");
var answer = await memory.AskAsync(question);
Console.WriteLine($"\nAnswer: {answer.Result}\n\n Sources:\n");
// Show sources / citations
foreach (var x in answer.RelevantSources)
{
    Console.WriteLine(x.SourceUrl != null
        ? $" - {x.SourceUrl} [{x.Partitions.First().LastUpdate:D}]"
        : $" - {x.SourceName} - {x.Link} [{x.Partitions.First().LastUpdate:D}]");
}
I adapted this from an example on KernelMemory from Microsoft, but its current answer to everything is:
warn: Microsoft.KernelMemory.Search.SearchClient[0]
No memories available
Answer: INFO NOT FOUND
Sources:
Edit: I fixed this by removing the minRelevance parameter from AskAsync().
I'd like to verify an assumption as well: "When combining text generation and RAG in one application 3 model instances are needed, one for generating embeddings, one for retrieval generation and one for text generation". I feel like those last two instances could be one but I don't know if this to be possible because when creating KernelMemory a seperate model is instantiated.
I agree that 3 models are needed; however, I think the second one doesn't actually need to be an LLM. It could be an algorithm that measures the similarity of embeddings, so the last two models are unlikely to be mergeable into one.
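For what it's worth, that "algorithm to find similarity of embeddings" is typically just cosine similarity between the query embedding and each stored embedding, with no model involved at all. A minimal sketch:

```csharp
using System;

class CosineSimilarity
{
    // Cosine similarity: dot(a, b) / (|a| * |b|).
    // Returns 1 for vectors pointing the same way, 0 for orthogonal
    // vectors, and -1 for opposite directions. Retrieval keeps the
    // stored chunks whose embeddings score highest against the query.
    public static double Compute(float[] a, float[] b)
    {
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.Length; i++)
        {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return dot / (Math.Sqrt(normA) * Math.Sqrt(normB));
    }

    static void Main()
    {
        float[] query = { 1f, 0f };
        float[] doc = { 1f, 0f };
        Console.WriteLine(Compute(query, doc)); // identical directions -> 1
    }
}
```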
TBH I'm not an expert on RAG either. I think you will get a much better answer if you ask this question in the kernel-memory issues. :)
Thank you a lot for looking into this issue!
The example looks good. @xbotter Do you have any ideas about improving it further?