IndexOutOfRangeException when calling IKernelMemory.AskAsync() about llamasharp HOT 9 OPEN

WesselvanGils commented on September 26, 2024

IndexOutOfRangeException when calling IKernelMemory.AskAsync()

from llamasharp.

Comments (9)

martindevans commented on September 26, 2024

Could you try running this:

var model = LLamaWeights.LoadFromFile("your_model_path");
Console.WriteLine(model.NewlineToken);

The code that's crashing is this:

var nl_token = model.NewlineToken;
var nl_logit = logits[(int)nl_token];

So it seems like your model is probably returning something unexpected for the newline token.
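To illustrate why an unexpected token value would crash at that line, here is a minimal standalone sketch. It does not use LLamaSharp: the `logits` array is a hypothetical stand-in for the real buffer, and the value `-1` is an assumption chosen purely as an example of an id that is not a valid index.

```csharp
using System;

// Hypothetical stand-in for the logits buffer; the real one is produced by the model.
float[] logits = new float[8];

// Example of an unexpected token id. -1 is an assumption chosen for illustration.
int newlineToken = -1;

try
{
    // Same shape as the crashing line: any index outside [0, Length) throws.
    float nlLogit = logits[newlineToken];
    Console.WriteLine($"newline logit: {nlLogit}");
}
catch (IndexOutOfRangeException)
{
    Console.WriteLine("IndexOutOfRangeException: token id is not a valid logits index");
}
```

Printing `model.NewlineToken` as suggested above shows whether the real value falls outside the valid index range in the same way.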

WesselvanGils commented on September 26, 2024

I see, it's returning -1; that explains the IndexOutOfRangeException. Is this an issue with the model itself?

martindevans commented on September 26, 2024

I'm not certain, but it doesn't seem correct for any model to be returning -1 for the newline token. That would mean the model has no concept of newlines, which is pretty bizarre!

If other quantizations of the same model are returning other values and it's just the f32 one that's returning -1, I would say that's certainly an error in f32.

WesselvanGils commented on September 26, 2024

I'm not sure about this yet, but not having a newline token seems to be common among embedding models. For nomic I tested F32, F16 and Q2_K; I then also tried this model, and they all return -1 for their newline token.

martindevans commented on September 26, 2024

If multiple models are showing the same thing I guess that must be normal. Very weird!

In that case I think the NewlineToken method should be updated to return LLamaToken? instead of LLamaToken and all callsites fixed to handle that sometimes being null.
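A rough sketch of what handling a nullable token at a call site could look like. Everything here is hypothetical: `Token` stands in for `LLamaToken`, `Vocab.FromRaw` is an invented helper, and treating negative ids as "no such token" is an assumption based on the -1 reported above. It is not the actual LLamaSharp change.

```csharp
using System;

float[] logits = { 0.1f, 0.7f, 0.2f };

// Simulates a model that reports no newline token (raw id -1).
Token? newline = Vocab.FromRaw(-1);

if (newline is Token nl)
{
    // Only index into the logits when a real token exists.
    Console.WriteLine($"newline logit: {logits[nl.Id]}");
}
else
{
    Console.WriteLine("model has no newline token; skipping newline handling");
}

// Hypothetical stand-in for LLamaToken.
readonly struct Token
{
    public int Id { get; }
    public Token(int id) { Id = id; }
}

static class Vocab
{
    // Hypothetical helper: negative raw ids are mapped to null instead of a token.
    public static Token? FromRaw(int raw) => raw < 0 ? (Token?)null : new Token(raw);
}
```

The point of the nullable return type is that the compiler forces each call site to decide what to do when the token is missing, rather than letting -1 flow into an array index.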

WesselvanGils commented on September 26, 2024

It could be that this is intended behavior. The models I've been testing are models for generating embeddings, so it makes sense that they don't have a newline token, as they are never expected to generate text.
Doing this:

var memory = new KernelMemoryBuilder()
    .WithLLamaSharpTextGeneration(llamaGenerationConfig)
    .WithLLamaSharpTextEmbeddingGeneration(llamaEmbeddingConfig)

resolves the issue: the embedding model is used to generate the embeddings and a regular model is used to generate the output.
WithLLamaSharpDefaults assumes a regular model, which is capable of both.

martindevans commented on September 26, 2024

In the #662 PR I've modified how tokens are returned from the LLamaSharp API so it returns nullable tokens and fixed all of the call sites to handle this. I think your approach there is the right one though.

zsogitbe commented on September 26, 2024

c325ac9#commitcomment-141108660

psampaio commented on September 26, 2024

The same issue occurs with the SemanticKernel integration when ITextGenerationService is used with an embedding model (nomic).
