
Comments (5)

WesselvanGils commented on September 27, 2024

I did actually manage to figure this out with semantic memory. I'll put a proper example of that version together tomorrow. The advantage over the solution above is that it just returns context using cosine similarity on embeddings, so you can use any executor by adding the context to the prompt.


WesselvanGils commented on September 27, 2024
using LLama;
using LLama.Common;
using LLama.Native;
using LLamaSharp.SemanticKernel.TextEmbedding;
using Microsoft.SemanticKernel.Connectors.Sqlite;
using Microsoft.SemanticKernel.Memory;
using Microsoft.SemanticKernel.Text;
using System.Text;

// Initialize native library before anything else
string llamaPath = Path.GetFullPath("<path to local lib>/libllama.so");
NativeLibraryConfig.Instance.WithLibrary(llamaPath, null);

// Download a document and create embeddings for it
#pragma warning disable SKEXP0050, SKEXP0001, SKEXP0020

var embeddingModelPath = Path.GetFullPath("<path to embed model>/nomic-embed.gguf");
var embeddingParameters = new ModelParams(embeddingModelPath) { ContextSize = 4096, GpuLayerCount = 13, Embeddings = true };
var embeddingWeights = LLamaWeights.LoadFromFile(embeddingParameters);
var embedder = new LLamaEmbedder(embeddingWeights, embeddingParameters);

var service = new LLamaSharpEmbeddingGeneration(embedder);

ISemanticTextMemory memory = new MemoryBuilder()
    .WithMemoryStore(await SqliteMemoryStore.ConnectAsync("mydata.db"))
    .WithTextEmbeddingGeneration(service)
    .Build();

Console.WriteLine("===== INGESTING =====");

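// List existing collections so ingestion can be skipped on re-runs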
IList<string> collections = await memory.GetCollectionsAsync();

string folderPath = Path.GetFullPath("<path to folder>/Embeddings");
string[] files = Directory.GetFiles(folderPath);

string collectionName = "TestCollection";

if (collections.Contains(collectionName))
{
    Console.WriteLine("Found database");
}
else
{
    foreach (var item in files.Select((path, index) => new { path, index }))
    {
        Console.WriteLine($"Ingesting file #{item.index}");
        string text = File.ReadAllText(item.path);
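        // Split into lines of at most 128 tokens, then regroup into paragraphs of at most 512 tokens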
        var paragraphs = TextChunker.SplitPlainTextParagraphs(TextChunker.SplitPlainTextLines(text, 128), 512);

        foreach (var para in paragraphs.Select((text, index) => new { text, index }))
            await memory.SaveInformationAsync(collectionName, para.text, $"Document {item.path}, Paragraph {para.index}");
    }

    Console.WriteLine("Generated database");
}
Console.WriteLine("===== DONE INGESTING =====");

StringBuilder builder = new();

Console.Write("Question: ");
string question = Console.ReadLine()!;

Console.WriteLine("===== RETRIEVING =====");

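// Retrieve the single most relevant chunk and record which document it came from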
List<string> sources = [];
await foreach (var result in memory.SearchAsync(collectionName, question, limit: 1, minRelevanceScore: 0))
{
    builder.AppendLine(result.Metadata.Text);
    sources.Add(result.Metadata.Id);
}

builder.AppendLine("""

Sources:
""");

foreach (string source in sources)
{
    builder.AppendLine($"    {source}");
}
Console.WriteLine("===== DONE RETRIEVING =====");

Console.WriteLine(builder.ToString());

#pragma warning restore SKEXP0001, SKEXP0050, SKEXP0020

We have to suppress some warnings here because semantic memory is technically still considered experimental. This just uses LLamaSharp to generate embeddings and lets us search anything compatible with Semantic Memory, returning the most relevant text chunks. It doesn't do any generation, so you have to add the context to the prompt manually.
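
For example, feeding that context into a generator could look something like this (a minimal, untested sketch; the chat model path and prompt template are placeholders you'd adapt to your own model):

// Hypothetical follow-up: load a separate chat model and prompt it with the retrieved context
var generationModelPath = Path.GetFullPath("<path to chat model>/model.gguf");
var generationParameters = new ModelParams(generationModelPath) { ContextSize = 4096 };
using var generationWeights = LLamaWeights.LoadFromFile(generationParameters);
var generator = new StatelessExecutor(generationWeights, generationParameters);

var prompt = $"Answer using only this context:\n{builder}\nQuestion: {question}\nAnswer: ";
await foreach (var token in generator.InferAsync(prompt, new InferenceParams { MaxTokens = 256 }))
    Console.Write(token);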

Something to consider is that this is generally just the first step of RAG; there are a lot of steps you can add between retrieval and adding the context to the prompt, such as returning multiple sources and reranking them (see the sketch after the links below), summarization, and so on. I'll leave some helpful resources as well:
https://github.com/pchunduri6/rag-demystified
https://medium.com/@thakermadhav/build-your-own-rag-with-mistral-7b-and-langchain-97d0c92fa146
https://medium.com/@talon8080/mastering-rag-chatbots-building-advanced-rag-as-a-conversational-ai-tool-with-langchain-d740493ff328
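
For instance, the "multiple sources" step could look roughly like this (a sketch reusing the memory, question, and builder objects from the example above; a real pipeline would rerank with a dedicated model rather than sorting by the raw relevance score):

// Pull several candidate chunks instead of one, then keep the best-scoring few
var results = new List<MemoryQueryResult>();
await foreach (var result in memory.SearchAsync(collectionName, question, limit: 5, minRelevanceScore: 0))
    results.Add(result);

foreach (var result in results.OrderByDescending(r => r.Relevance).Take(3))
    builder.AppendLine(result.Metadata.Text);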


WesselvanGils commented on September 27, 2024

I'm trying to make basically exactly this right now. I got the BatchedExecutor figured out recently, but integrating RAG into that pipeline is proving difficult. I wouldn't mind turning my final result into an example.

I'd like to verify an assumption as well: "When combining text generation and RAG in one application, 3 model instances are needed: one for generating embeddings, one for retrieval generation, and one for text generation". I feel like those last two instances could be one, but I don't know whether that's possible, because a separate model is instantiated when creating KernelMemory.

I currently have this:

using LLama.Native;
using LLamaSharp.KernelMemory;
using Microsoft.KernelMemory;
using Microsoft.KernelMemory.FileSystem.DevTools;
using Microsoft.KernelMemory.MemoryStorage.DevTools;

string nativePath = "<path to native llama>";
NativeLibraryConfig.Instance.WithLibrary(nativePath, null);

string generationModelPath = "<path to any LLM in GGUF format>";
string embeddingModelPath = "<path to any embedding model in GGUF format>";
string storageFolder = "<path to storage folder>";

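// LLamaSharp backs both text generation and embeddings; vectors persist to disk via SimpleVectorDb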
var llamaGenerationConfig = new LLamaSharpConfig(generationModelPath);
var llamaEmbeddingConfig = new LLamaSharpConfig(embeddingModelPath);
var vectorDbConfig = new SimpleVectorDbConfig() { Directory = storageFolder, StorageType = FileSystemTypes.Disk };

var memory = new KernelMemoryBuilder()
    .WithLLamaSharpTextGeneration(llamaGenerationConfig)
    .WithLLamaSharpTextEmbeddingGeneration(llamaEmbeddingConfig)
    .WithSimpleVectorDb(vectorDbConfig)
    .Build();

Console.WriteLine("\n================== INGESTION ==================\n");

Console.WriteLine("Uploading text about E=mc^2");
await memory.ImportTextAsync("""
    In physics, mass–energy equivalence is the relationship between mass and energy 
    in a system's rest frame, where the two quantities differ only by a multiplicative
    constant and the units of measurement. The principle is described by the physicist
    Albert Einstein's formula: E = m*c^2
""");

Console.WriteLine("Uploading article file about Carbon");
await memory.ImportDocumentAsync("wikipedia.txt");

Console.WriteLine("\n================== RETRIEVAL ==================\n");

var question = "What's E = m*c^2?";
Console.WriteLine($"Question: {question}");

var answer = await memory.AskAsync(question);
Console.WriteLine($"\nAnswer: {answer.Result}\n\n  Sources:\n");

// Show sources / citations
foreach (var x in answer.RelevantSources)
{
    Console.WriteLine(x.SourceUrl != null
        ? $"  - {x.SourceUrl} [{x.Partitions.First().LastUpdate:D}]"
        : $"  - {x.SourceName}  - {x.Link} [{x.Partitions.First().LastUpdate:D}]");
}

I adapted this from this example on KernelMemory from Microsoft, but its current answer to everything is:

warn: Microsoft.KernelMemory.Search.SearchClient[0]
      No memories available

Answer: INFO NOT FOUND


  Sources:

Edit: I fixed this by removing the minRelevance parameter from AskAsync().
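
A plausible explanation, sketched below (the 0.5 threshold is just an illustration): with a non-zero threshold, every retrieved partition scored below it and was filtered out.

// This filtered everything out and produced "INFO NOT FOUND":
// var answer = await memory.AskAsync(question, minRelevance: 0.5);
// The default (minRelevance: 0) considers all partitions:
var answer = await memory.AskAsync(question);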


AsakusaRinne commented on September 27, 2024

I'd like to verify an assumption as well: "When combining text generation and RAG in one application, 3 model instances are needed: one for generating embeddings, one for retrieval generation, and one for text generation". I feel like those last two instances could be one, but I don't know whether that's possible, because a separate model is instantiated when creating KernelMemory.

I agree that 3 components are needed; however, I think the second one doesn't actually need to be an LLM. It could be an algorithm that finds the similarity of embeddings, so the last two are unlikely to be merged into one model.
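
In other words, the retrieval step can be as simple as cosine similarity over the stored vectors, e.g. (an illustrative helper, not part of any library):

// Cosine similarity between two embedding vectors; no second LLM required
static float CosineSimilarity(float[] a, float[] b)
{
    float dot = 0f, normA = 0f, normB = 0f;
    for (int i = 0; i < a.Length; i++)
    {
        dot += a[i] * b[i];
        normA += a[i] * a[i];
        normB += b[i] * b[i];
    }
    return dot / (MathF.Sqrt(normA) * MathF.Sqrt(normB));
}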

TBH, I'm not an expert on RAG either. I think you'll get a much better answer if you ask this question in the kernel-memory issues. :)

Thanks a lot for looking into this issue!


AsakusaRinne commented on September 27, 2024

The example looks good. @xbotter Do you have any ideas about how to improve it further?

