Comments (7)
That's correct, it requires the master branch at the moment. We'll probably be releasing a new preview version soon (once #90 and #65 have been reviewed and merged).
from llamasharp.
Yes, it's running. Great work! Thx
I haven't tried it, but I believe 70B models should be supported on the 0.4.2 version at the moment.
I have now tried it and it doesn't work, sorry about that. Definitely something that needs looking into!
I did some more investigation into this to see what was required. Turns out the model I was testing with before was corrupt! If you set GroupedQueryAttention = 8 in the model params you can load Llama 2 70B right now 🥳
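Putting the above together, here is a minimal sketch of loading a 70B model with that setting. It assumes the master-branch LLamaSharp API discussed in this thread (ModelParams with a GroupedQueryAttention property, LLamaModel, InteractiveExecutor); the model path is a placeholder. Llama 2 70B uses grouped-query attention with 64 attention heads sharing 8 KV heads, which is why the value is 8:

```csharp
using LLama;
using LLama.Common;

// Placeholder path to a local Llama 2 70B GGML file (e.g. a TheBloke quantisation).
var modelPath = "llama-2-70b-chat.ggmlv3.q3_K_S.bin";

// Llama 2 70B is a grouped-query-attention model: 64 query heads / 8 KV heads,
// so the loader needs GroupedQueryAttention = 8 or loading will fail.
var @params = new ModelParams(modelPath, contextSize: 1024, seed: 1337, gpuLayerCount: 5)
{
    GroupedQueryAttention = 8,
};

using var model = new LLamaModel(@params);
var executor = new InteractiveExecutor(model);
```

Smaller Llama 2 models (7B, 13B) use standard multi-head attention and do not need this parameter.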
Thx!
Which model did you use for testing? I tested it with TheBloke/Llama-2-70B-Chat-GGML, but it doesn't work:

var mp = new ModelParams(modelPath, contextSize: 1024, seed: 1337, gpuLayerCount: 128);
mp.GroupedQueryAttention = 8;
var interactiveExecutor = new InteractiveExecutor(new LLamaModel(mp));
I used the q3_K_S version from TheBloke. I just tested it again. Using the master branch, I modified the SaveAndLoadSession demo to load the model like this:
var @params = new ModelParams(modelPath, contextSize: 1024, seed: 1337, gpuLayerCount: 5)
{
    GroupedQueryAttention = 8,
};
InteractiveExecutor ex = new(new LLamaModel(@params));
And it works for me.
GroupedQueryAttention = 8 is not available yet on NuGet, right? Not in 0.4.2?
Related Issues (20)
- CentOS x86_64 failed loading 'libllama.so'
- System.TypeInitializationException: 'The type initializer for 'LLama.Native.NativeApi' threw an exception.'
- How do I continuously print the answer word by word when using document ingestion with kernel memory?
- How to rebuild LLamaSharp backends
- Namespace should be consistent
- Mamba
- Android Backend
- [Feature] Allow async model loading and cancellation
- [CI] Add more unit tests to ensure the outputs are reasonable
- Take multiple chat templates into account
- [Feature]: Support for Function Calling or Tools
- [BUG]: DefragThreshold default does not match llama.cpp and is probably not intended
- [BUG]: Answer stops abruptly after context size, even when limiting prompt size
- [BUG]: Linux CUDA version detection could be incorrect
- [BUG]: WSL2 has problems running LLamaSharp with CUDA 11
- Add unit test about long context
- Add debug mode to LLamaSharp
- How to better provide system information for LLMs
- LLAVA Configuration
- [Feature]: How should different LLM models be integrated into a project?