Comments (3)
That's a known limitation of the Metal backend at the moment. @ivarflakstad has been looking at getting this to work, but I'm not sure what the current state is.
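Until bf16 matmul is available on a given backend, a common fallback is to widen the bf16 inputs to f32, multiply there, and narrow the result back. The sketch below illustrates that idea in plain Rust on raw bf16 bit patterns. It is not candle's code or a Metal kernel, and the function names (`bf16_to_f32`, `f32_to_bf16`, `matmul_bf16`) are made up for this example. It relies only on the fact that a bf16 value is the upper 16 bits of an IEEE 754 f32.

```rust
/// Widen a bf16 value (given as raw bits) to f32.
/// bf16 is the upper 16 bits of an f32, so this conversion is exact.
fn bf16_to_f32(bits: u16) -> f32 {
    f32::from_bits((bits as u32) << 16)
}

/// Narrow an f32 to bf16 bits using round-to-nearest-even.
fn f32_to_bf16(x: f32) -> u16 {
    let bits = x.to_bits();
    if x.is_nan() {
        // Keep NaN a NaN by forcing a mantissa bit that survives truncation.
        return ((bits >> 16) as u16) | 0x0040;
    }
    // Add 0x7FFF plus the lowest kept bit so that ties round to even.
    let rounding_bias = 0x7FFF + ((bits >> 16) & 1);
    (bits.wrapping_add(rounding_bias) >> 16) as u16
}

/// Row-major (m x k) * (k x n) matmul on bf16 inputs.
/// Products are accumulated in f32 and rounded to bf16 once per output element.
fn matmul_bf16(a: &[u16], b: &[u16], m: usize, k: usize, n: usize) -> Vec<u16> {
    assert_eq!(a.len(), m * k);
    assert_eq!(b.len(), k * n);
    let mut out = vec![0u16; m * n];
    for i in 0..m {
        for j in 0..n {
            let mut acc = 0.0f32;
            for p in 0..k {
                acc += bf16_to_f32(a[i * k + p]) * bf16_to_f32(b[p * n + j]);
            }
            out[i * n + j] = f32_to_bf16(acc);
        }
    }
    out
}
```

To try it, convert each input element with `f32_to_bf16`, call `matmul_bf16(&a, &b, m, k, n)`, and map the result back through `bf16_to_f32`. Accumulating in f32 costs extra memory traffic compared with a native bf16 kernel, but it avoids compounding rounding error across the inner loop. In candle itself the analogous workaround would presumably be casting tensors with `to_dtype(DType::F32)` before the matmul, at the cost of doubling their memory footprint.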
from candle.
I have this branch with working bfloat matmul. I'm testing Falcon on it now (the model is still downloading).
It is based on work I've done here, which is not ready to be merged.
If you have enough RAM you should be able to run Falcon on the candle branch I mentioned above.
Here I am running Mamba (130m) with bf16:
![Mamba bf16](https://private-user-images.githubusercontent.com/69173633/321749679-977be4d1-7f09-4a1b-a298-cab1e42e7ee1.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MjEzODE0MzIsIm5iZiI6MTcyMTM4MTEzMiwicGF0aCI6Ii82OTE3MzYzMy8zMjE3NDk2NzktOTc3YmU0ZDEtN2YwOS00YTFiLWEyOTgtY2FiMWU0MmU3ZWUxLnBuZz9YLUFtei1BbGdvcml0aG09QVdTNC1ITUFDLVNIQTI1NiZYLUFtei1DcmVkZW50aWFsPUFLSUFWQ09EWUxTQTUzUFFLNFpBJTJGMjAyNDA3MTklMkZ1cy1lYXN0LTElMkZzMyUyRmF3czRfcmVxdWVzdCZYLUFtei1EYXRlPTIwMjQwNzE5VDA5MjUzMlomWC1BbXotRXhwaXJlcz0zMDAmWC1BbXotU2lnbmF0dXJlPWMxZTk3ZDY5NWIxNWRhOTJmMmYwODMwNzhiNmNiNjMwMTU3ZmU1MjA0YjUxZTI1ZmYxNmM2NmQxMTA3ZGVhOTEmWC1BbXotU2lnbmVkSGVhZGVycz1ob3N0JmFjdG9yX2lkPTAma2V5X2lkPTAmcmVwb19pZD0wIn0.X_D35HmMFFrGR6swXfFfxJTn1i3H1wU1g25dvqUKnqw)