Comments (1)
Compiling the flash-attn CUDA kernels can take a very long time (and can easily run out of memory too). More than 10 minutes wouldn't be surprising. Do you see any compiler processes in top / ps (nvcc, cicc, ...)?
Also, you probably want to set the CANDLE_FLASH_ATTN_BUILD_DIR environment variable to something like $HOME/.candle so that the kernel compilation doesn't trigger too often.
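
As a rough sketch of both suggestions, assuming a POSIX shell on Linux: the snippet sets the build directory to the path suggested above, then checks whether nvcc or cicc are still running. Creating the directory up front is a precaution on my side, in case the build script expects the path to exist already. The `compilers` and `build_status` variable names are only illustrative.

```shell
# Point the flash-attn kernel build at a persistent directory so the
# compiled kernels can be reused across builds (path as suggested above).
export CANDLE_FLASH_ATTN_BUILD_DIR="$HOME/.candle"

# Precaution (assumption): make sure the directory exists before building.
mkdir -p "$CANDLE_FLASH_ATTN_BUILD_DIR"

# Look for running CUDA compiler processes. The [n]/[c] brackets keep
# grep from matching its own command line; "|| true" avoids a non-zero
# exit status when nothing matches.
compilers=$(ps aux | grep -E '[n]vcc|[c]icc' || true)

if [ -n "$compilers" ]; then
  build_status="compiling"
  echo "$compilers"
else
  build_status="idle"
  echo "no nvcc/cicc processes found"
fi
```

Run the process check from a second terminal while the build seems stuck: if nvcc or cicc show up, the kernels are still compiling rather than hanging. To keep the setting across sessions, the export line can go in your shell profile.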