Comments (7)
That's correct, it requires the master branch at the moment. We'll probably be releasing a new preview version soon (once #90 and #65 have been reviewed and merged).
from llamasharp.
Yes, it's running. Great work! Thx
I haven't tried it, but I believe 70B models should be supported on the 0.4.2 version at the moment.
I have now tried it and it doesn't work, sorry about that. Definitely something that needs looking into!
I did some more investigation into this to see what was required. Turns out the model I was testing with before was corrupt! If you set GroupedQueryAttention = 8 in the model params you can load Llama 2 70B right now 🥳
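Putting the above together, here is a minimal sketch of loading a 70B model with that setting. It assumes the master-branch LLamaSharp API discussed in this thread (ModelParams with a GroupedQueryAttention property, LLamaModel, InteractiveExecutor); the model path is a placeholder. Llama 2 70B uses grouped-query attention with 64 attention heads sharing 8 KV heads, which is why the value is 8:

```csharp
using LLama;
using LLama.Common;

// Placeholder path to a local Llama 2 70B GGML file (e.g. a TheBloke quantisation).
var modelPath = "llama-2-70b-chat.ggmlv3.q3_K_S.bin";

// Llama 2 70B is a grouped-query-attention model: 64 query heads / 8 KV heads,
// so the loader needs GroupedQueryAttention = 8 or loading will fail.
var @params = new ModelParams(modelPath, contextSize: 1024, seed: 1337, gpuLayerCount: 5)
{
    GroupedQueryAttention = 8,
};

using var model = new LLamaModel(@params);
var executor = new InteractiveExecutor(model);
```

Smaller Llama 2 models (7B, 13B) use standard multi-head attention and do not need this parameter.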
Thx!
Which model did you use for testing? I tested it with TheBloke/Llama-2-70B-Chat-GGML, but it doesn't work:

var mp = new ModelParams(modelPath, contextSize: 1024, seed: 1337, gpuLayerCount: 128);
mp.GroupedQueryAttention = 8;
var interactiveExecutor = new InteractiveExecutor(new LLamaModel(mp));
I used the q3_K_S version from TheBloke. I just tested it again. Using the master branch, I modified the SaveAndLoadSession demo to load the model like this:
var @params = new ModelParams(modelPath, contextSize: 1024, seed: 1337, gpuLayerCount: 5)
{
    GroupedQueryAttention = 8,
};
InteractiveExecutor ex = new(new LLamaModel(@params));
And it works for me.
GroupedQueryAttention = 8 is not available yet on NuGet, right? Not in 0.4.2?
Related Issues (20)
- CentOS x86_64 failed loading 'libllama.so'
- System.TypeInitializationException: 'The type initializer for 'LLama.Native.NativeApi' threw an exception.'
- How do I continuously print the answer word by word when using document ingestion with kernel memory?
- How to rebuild LLamaSharp backends
- Namespace should be consistent
- Mamba
- Android Backend
- [Feature] Allow async model loading and cancellation
- [CI] Add more unit tests to ensure the outputs are reasonable
- Take multiple chat templates into account
- [Feature]: Support for Function Calling or Tools
- [BUG]: DefragThreshold default does not match llama.cpp and is probably not intended
- [BUG]: Answer stops abruptly after context size, even when limiting prompt size
- [BUG]: Linux CUDA version detection could be incorrect
- [BUG]: WSL2 has problems running LLamaSharp with CUDA 11
- Add unit test about long context
- Add debug mode to LLamaSharp
- How to better provide system information for LLMs
- LLAVA Configuration
- [Feature]: How should different LLM models be integrated into a project?