Comments (4)
Interesting, I would have hoped that the compiler uses constant propagation so as not to recompute this on every call, but rustc has quite a few shortcomings when it comes to float const functions that may prevent this from happening.
Would you mind trying to replace this with the hardcoded value of the constant and see if it has an impact? (If it does, happy to get a PR for the change.)
from candle.
fn crit(c: &mut Criterion) {
    let arg = 20.0;
    c.bench_function("gelu_f16", |c| {
        c.iter(|| Gelu::f16(black_box(f16::from_f32(arg))))
    });
}
leads to:
current implementation: 2.2172ns
below implementation: 1.6855ns
on my M1 mac.
I would expect that constant propagation is trickier on f16 as the compiler has to inline more to understand that it's safe.
This is the solution I used.
f16::from_f32_const((2.0 / PI).sqrt())
which does work with constant propagation. I can also confirm that constant propagation already works for f32 and f64; I validated this on godbolt.
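To make the comparison concrete, here is a minimal f32-only sketch of the two variants being discussed. It assumes the standard tanh approximation of GELU; the names `SQRT_TWO_OVER_PI`, `gelu_computed` and `gelu_hardcoded` are hypothetical and are not taken from candle. It sticks to f32 so that it needs no external crates, which means it does not reproduce the f16 slowdown measured above; it only shows the shape of the change.

```rust
use std::f32::consts::PI;

// Hypothetical names for illustration only; this is not candle's code.

/// sqrt(2 / pi) written out as a literal, so no sqrt is needed at runtime.
const SQRT_TWO_OVER_PI: f32 = 0.797_884_56;

/// tanh-approximation GELU that builds the constant inside the function.
/// For f32 the optimizer is usually able to fold this expression.
fn gelu_computed(x: f32) -> f32 {
    let k = (2.0 / PI).sqrt();
    0.5 * x * (1.0 + (k * x * (1.0 + 0.044715 * x * x)).tanh())
}

/// Same formula, but using the precomputed constant.
fn gelu_hardcoded(x: f32) -> f32 {
    0.5 * x * (1.0 + (SQRT_TWO_OVER_PI * x * (1.0 + 0.044715 * x * x)).tanh())
}
```

Both functions should agree to within floating-point rounding for any input, so swapping one for the other is meant to be a pure performance change.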
Neat, would be great if you can make a PR for this (ideally with a comment explaining why it's useful). If you don't have the time I could take a stab at it.
Made a PR for this, #2008