
Comments (12)

AshD commented on June 25, 2024

Using this:

```csharp
var p = new ModelParams(modelPath) { GpuLayerCount = 0 };
```

Same error: `model_ptr` returns `IntPtr.Zero`.

```csharp
var model_ptr = NativeApi.llama_load_model_from_file(modelPath, lparams);
```

Thanks,
Ash

from llamasharp.

AsakusaRinne commented on June 25, 2024

There's a discussion in #357. It depends on the llama.cpp implementation, and we'll support it once llama.cpp supports it.


martindevans commented on June 25, 2024

Relevant upstream issues:


AshD commented on June 25, 2024

I saw that the llama libraries were updated in the llamasharp repo and tried it.

Loading the weights took over a minute, used 42 GB of my 128 GB of memory (80% CPU, 28% GPU), and then threw a native load-failed exception.

Regards,
Ash


martindevans commented on June 25, 2024

Could you link the GGUF file you were trying to use? I'll see if I can reproduce the problem.


AshD commented on June 25, 2024

Thanks @martindevans

GGUF file - https://huggingface.co/TheBloke/Mixtral-8x7B-Instruct-v0.1-GGUF/resolve/main/mixtral-8x7b-instruct-v0.1.Q4_K_M.gguf

These are my parameters:

```csharp
var p = new ModelParams(modelPath)
{
    ContextSize = 4096,
    GpuLayerCount = 12,
    UseMemoryLock = true,
    UseMemorymap = true,
    Threads = 12,
    BatchSize = 128,
    EmbeddingMode = true
};
var w = LLamaWeights.LoadFromFile(p);
```

CPU: i9 13th Gen, GPU: RTX 4090, RAM: 128 GB

Thanks,
Ash


martindevans commented on June 25, 2024

I'm downloading it now, but it's going to take a while!

However I've actually been testing with the Q5_K_M model from that same repo, so I'm expecting it to work.

I'd suggest getting rid of most of the parameter options there. Most of them are set automatically, and you shouldn't need to change them unless you have a good reason to override the defaults. The only one you actually need to set is GpuLayerCount, but I'd suggest setting that to zero as a first test.
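That advice boils down to something like this (a minimal sketch; `modelPath` is assumed to point at your local GGUF file):

```csharp
using LLama;
using LLama.Common;

// Minimal parameters: rely on the defaults baked into the GGUF file.
// GpuLayerCount = 0 forces pure CPU loading as a first diagnostic step;
// if this works, reintroduce GPU layers and other options one at a time.
var p = new ModelParams(modelPath) { GpuLayerCount = 0 };
using var weights = LLamaWeights.LoadFromFile(p);
```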


martindevans commented on June 25, 2024

I tested out that model, and it seems to work perfectly for me on both CPU and GPU.

If `NativeApi.llama_load_model_from_file` is failing, that would normally indicate an error with the model file itself or something more fundamental. Have you tried this file with one of the llama.cpp demos?
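Before reaching for the llama.cpp demos, one quick sanity check on the file itself is to verify the GGUF magic bytes: every valid GGUF file begins with the four ASCII bytes "GGUF", so a truncated or corrupt download usually fails this check. A minimal sketch:

```csharp
using System;
using System.IO;

static bool LooksLikeGguf(string path)
{
    // A valid GGUF file starts with the four ASCII bytes "GGUF".
    using var fs = File.OpenRead(path);
    Span<byte> magic = stackalloc byte[4];
    return fs.Read(magic) == 4
        && magic[0] == (byte)'G' && magic[1] == (byte)'G'
        && magic[2] == (byte)'U' && magic[3] == (byte)'F';
}
```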


AshD commented on June 25, 2024

Thanks @martindevans for your help in debugging this issue.

It works! The issue was that it was picking up the llama DLL from the cuda11 folder, while I had assumed it was using the one from the cuda11.7.1 folder.

I could offload 18 layers to the GPU. Token generation was around 7.5 tokens/sec.
Are you seeing similar numbers? Is there a webpage you are aware of that lists the best parameters to set for each model?

Model output was better than Mistral Instruct v0.2 for some of the prompts I tried.

Thanks,
Ash


martindevans commented on June 25, 2024

I'm using CPU inference, so it's slower for me. But as a rough guide it should be around the same speed as a 13B model.

Is there a webpage you are aware of that lists the best parameters to set for each model?

Almost all of the parameters should be automatically set (they're baked into the GGUF file).

I don't know much about the GPU layer count. As I understand it, you just have to experiment to see how many layers you can fit and what speedup you get.
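That experiment can be automated in a few lines (a hypothetical sketch; the candidate layer counts below are arbitrary, and `modelPath` is assumed to point at the GGUF file):

```csharp
using System;
using LLama;
using LLama.Common;

// Try increasing GPU layer counts until loading fails (e.g. out of VRAM).
// The last count that loads successfully is a reasonable starting point.
foreach (var layers in new[] { 0, 8, 12, 18, 24 })
{
    try
    {
        var p = new ModelParams(modelPath) { GpuLayerCount = layers };
        using var w = LLamaWeights.LoadFromFile(p);
        Console.WriteLine($"{layers} GPU layers: loaded OK");
    }
    catch (Exception e)
    {
        Console.WriteLine($"{layers} GPU layers: failed ({e.Message})");
        break;
    }
}
```

Timing token generation at each successful count (as with the 7.5 tokens/sec figure above) would then show where the speedup levels off.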


AshD commented on June 25, 2024

Thanks @martindevans

As you said, the GPU layer count setting is mostly a matter of trying it and seeing how many layers fit in your GPU :-)


martindevans commented on June 25, 2024

v0.9.1 added support for Mixtral, so I'll close this issue now.

