Comments (5)
I did actually manage to figure this out with semantic memory. I'll put a proper example for that version together tomorrow. The advantage over the solution above is that it simply returns context using cosine similarity on embeddings, so you can use any executor just by adding the retrieved context to the prompt.
from llamasharp.
using LLama;
using LLama.Common;
using LLama.Native;
using LLamaSharp.SemanticKernel.TextEmbedding;
using Microsoft.SemanticKernel.Connectors.Sqlite;
using Microsoft.SemanticKernel.Memory;
using Microsoft.SemanticKernel.Text;
using System.Text;
// Initialize native library before anything else
string llamaPath = Path.GetFullPath("<path to local lib>/libllama.so");
NativeLibraryConfig.Instance.WithLibrary(llamaPath, null);
// Download a document and create embeddings for it
#pragma warning disable SKEXP0050, SKEXP0001, SKEXP0020
var embeddingModelPath = Path.GetFullPath("<path to embed model>/nomic-embed.gguf");
var embeddingParameters = new ModelParams(embeddingModelPath) { ContextSize = 4096, GpuLayerCount = 13, Embeddings = true };
var embeddingWeights = LLamaWeights.LoadFromFile(embeddingParameters);
var embedder = new LLamaEmbedder(embeddingWeights, embeddingParameters);
var service = new LLamaSharpEmbeddingGeneration(embedder);
ISemanticTextMemory memory = new MemoryBuilder()
    .WithMemoryStore(await SqliteMemoryStore.ConnectAsync("mydata.db"))
    .WithTextEmbeddingGeneration(service)
    .Build();
Console.WriteLine("===== INGESTING =====");
IList<string> collections = await memory.GetCollectionsAsync();
string folderPath = Path.GetFullPath("<path to folder>/Embeddings");
string[] files = Directory.GetFiles(folderPath);
string collectionName = "TestCollection";
if (collections.Contains(collectionName))
{
    Console.WriteLine("Found database");
}
else
{
    foreach (var item in files.Select((path, index) => new { path, index }))
    {
        Console.WriteLine($"Ingesting file #{item.index}");
        string text = File.ReadAllText(item.path);
        var paragraphs = TextChunker.SplitPlainTextParagraphs(TextChunker.SplitPlainTextLines(text, 128), 512);
        foreach (var para in paragraphs.Select((text, index) => new { text, index }))
            await memory.SaveInformationAsync(collectionName, para.text, $"Document {item.path}, Paragraph {para.index}");
    }
    Console.WriteLine("Generated database");
}
Console.WriteLine("===== DONE INGESTING =====");
StringBuilder builder = new();
Console.Write("Question: ");
string question = Console.ReadLine()!;
builder.Clear();
Console.WriteLine("===== RETRIEVING =====");
List<string> sources = [];
await foreach (var result in memory.SearchAsync(collectionName, question, limit: 1, minRelevanceScore: 0))
{
    builder.AppendLine(result.Metadata.Text);
    sources.Add(result.Metadata.Id);
}
builder.AppendLine("""
Sources:
""");
foreach (string source in sources)
{
    builder.AppendLine($" {source}");
}
Console.WriteLine("===== DONE RETRIEVING =====");
Console.WriteLine(builder.ToString());
#pragma warning restore SKEXP0001, SKEXP0050, SKEXP0020
We have to suppress some warnings here because semantic memory is technically considered experimental. This just uses LLamaSharp to generate embeddings and lets us search anything compatible with Semantic Memory with those embeddings, returning the most relevant text chunks. It doesn't do any generation, so you'd have to add the retrieved context to the prompt manually.
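Since the snippet above only retrieves text, the last step is splicing that context into a prompt before handing it to an executor. Here's a minimal sketch of that step; it's pure string assembly, and the template layout is just an assumption to adapt to whatever format your model expects:

```csharp
using System;
using System.Text;

class PromptBuilder
{
    // Wraps retrieved context and the user's question into a single prompt
    // string that any LLamaSharp executor could then consume.
    // The instruction wording here is illustrative, not canonical.
    public static string BuildPrompt(string context, string question)
    {
        var sb = new StringBuilder();
        sb.AppendLine("Answer the question using only the context below.");
        sb.AppendLine("Context:");
        sb.AppendLine(context);
        sb.AppendLine($"Question: {question}");
        sb.Append("Answer:");
        return sb.ToString();
    }

    static void Main()
    {
        // In the snippet above, builder.ToString() would supply the context
        // and the Console.ReadLine() input would supply the question.
        Console.WriteLine(BuildPrompt("E = m*c^2 relates mass and energy.", "What is E = m*c^2?"));
    }
}
```

From there the assembled string goes into whichever executor you prefer, since the retrieval side no longer dictates the generation side.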
Something to consider is that this is generally only the first step of RAG, and there are a lot of steps you can add between retrieval and adding the context to the prompt, such as returning multiple sources and reranking them, summarization, and so on. I'll leave some helpful resources as well:
https://github.com/pchunduri6/rag-demystified
https://medium.com/@thakermadhav/build-your-own-rag-with-mistral-7b-and-langchain-97d0c92fa146
https://medium.com/@talon8080/mastering-rag-chatbots-building-advanced-rag-as-a-conversational-ai-tool-with-langchain-d740493ff328
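To make the reranking idea concrete, here's a toy sketch in plain C# (no library calls) that keeps the top-k chunks by a relevance score; in practice the score could be the relevance SearchAsync already returns, or come from a second, more expensive reranking model:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

class Reranker
{
    // A retrieved text chunk paired with some relevance score in [0, 1].
    public record Chunk(string Text, double Score);

    // Orders chunks by descending score and keeps the k best.
    // Real rerankers recompute Score with a cross-encoder or similar
    // before this step; the selection logic stays the same.
    public static List<Chunk> TopK(IEnumerable<Chunk> chunks, int k) =>
        chunks.OrderByDescending(c => c.Score).Take(k).ToList();

    static void Main()
    {
        var results = new[]
        {
            new Chunk("A", 0.42),
            new Chunk("B", 0.91),
            new Chunk("C", 0.67),
        };
        foreach (var c in TopK(results, 2))
            Console.WriteLine(c.Text); // prints B then C
    }
}
```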
I'm trying to make basically exactly this right now. I got the BatchedExecutor figured out recently, but integrating RAG into that pipeline is proving difficult. I wouldn't mind turning my final result into an example.
I'd like to verify an assumption as well: "When combining text generation and RAG in one application, 3 model instances are needed: one for generating embeddings, one for retrieval generation, and one for text generation." I feel like those last two instances could be one, but I don't know if that's possible, because creating a KernelMemory instantiates a separate model.
I currently have this:
using LLama.Native;
using LLamaSharp.KernelMemory;
using Microsoft.KernelMemory;
using Microsoft.KernelMemory.FileSystem.DevTools;
using Microsoft.KernelMemory.MemoryStorage.DevTools;
string nativePath = "<path to native llama>";
NativeLibraryConfig.Instance.WithLibrary(nativePath, null);
string generationModelPath = "<path to any LLM in GGUF format>";
string embeddingModelPath = "<path to any embedding model in GGUF format>";
string storageFolder = "<path to storage folder>";
var llamaGenerationConfig = new LLamaSharpConfig(generationModelPath);
var llamaEmbeddingConfig = new LLamaSharpConfig(embeddingModelPath);
var vectorDbConfig = new SimpleVectorDbConfig() { Directory = storageFolder, StorageType = FileSystemTypes.Disk };
var memory = new KernelMemoryBuilder()
    .WithLLamaSharpTextGeneration(llamaGenerationConfig)
    .WithLLamaSharpTextEmbeddingGeneration(llamaEmbeddingConfig)
    .WithSimpleVectorDb(vectorDbConfig)
    .Build();
Console.WriteLine("\n================== INGESTION ==================\n");
Console.WriteLine("Uploading text about E=mc^2");
await memory.ImportTextAsync("""
In physics, mass–energy equivalence is the relationship between mass and energy
in a system's rest frame, where the two quantities differ only by a multiplicative
constant and the units of measurement. The principle is described by the physicist
Albert Einstein's formula: E = m*c^2
""");
Console.WriteLine("Uploading article file about Carbon");
await memory.ImportDocumentAsync("wikipedia.txt");
Console.WriteLine("\n================== RETRIEVAL ==================\n");
var question = "What's E = m*c^2?";
Console.WriteLine($"Question: {question}");
var answer = await memory.AskAsync(question);
Console.WriteLine($"\nAnswer: {answer.Result}\n\n Sources:\n");
// Show sources / citations
foreach (var x in answer.RelevantSources)
{
    Console.WriteLine(x.SourceUrl != null
        ? $" - {x.SourceUrl} [{x.Partitions.First().LastUpdate:D}]"
        : $" - {x.SourceName} - {x.Link} [{x.Partitions.First().LastUpdate:D}]");
}
I adapted this from an example on KernelMemory from Microsoft, but its current answer to everything is:
warn: Microsoft.KernelMemory.Search.SearchClient[0]
No memories available
Answer: INFO NOT FOUND
Sources:
Edit: I fixed this by removing the minRelevance parameter from AskAsync().
I'd like to verify an assumption as well: "When combining text generation and RAG in one application 3 model instances are needed, one for generating embeddings, one for retrieval generation and one for text generation". I feel like those last two instances could be one but I don't know if this to be possible because when creating KernelMemory a seperate model is instantiated.
I agree that 3 models are needed; however, I think the second one doesn't actually need to be an LLM. It could be an algorithm that measures the similarity of embeddings, so the last two models are unlikely to be mergeable into one.
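For what it's worth, that "algorithm to find similarity of embeddings" is typically just cosine similarity between the query embedding and each stored embedding, with no model involved at all. A minimal sketch:

```csharp
using System;

class CosineSimilarity
{
    // Cosine similarity: dot(a, b) / (|a| * |b|).
    // Returns 1 for vectors pointing the same way, 0 for orthogonal
    // vectors, and -1 for opposite directions. Retrieval keeps the
    // stored chunks whose embeddings score highest against the query.
    public static double Compute(float[] a, float[] b)
    {
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.Length; i++)
        {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return dot / (Math.Sqrt(normA) * Math.Sqrt(normB));
    }

    static void Main()
    {
        float[] query = { 1f, 0f };
        float[] doc = { 1f, 0f };
        Console.WriteLine(Compute(query, doc)); // identical directions -> 1
    }
}
```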
TBH I'm not an expert on RAG either. I think you will get a much better answer if you ask this question in the kernel-memory issues. :)
Thank you a lot for looking into this issue!
The example looks good. @xbotter Do you have any ideas about improving it further?