I'm running the dinov2 example on CPU on a Cortex-A76 computer, except I've quantised


inefficient implementation of gelu for fp16 about candle HOT 4 OPEN

j-baker commented on July 2, 2024

inefficient implementation of gelu for fp16

from candle.

Comments (4)

LaurentMazare commented on July 2, 2024

Interesting, I would have hoped that the compiler would use constant propagation so as not to recompute this on every call, but rustc has quite a few shortcomings when it comes to float const functions that may prevent this from happening.
Would you mind trying to replace this with the hardcoded value for the thing and see if it has an impact? (and if that's the case, happy to get a PR for the change)
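A minimal sketch of what the hardcoded-value experiment could look like, written in f32 with only the standard library so it runs on its own. The function names and the constant name are hypothetical, not candle's actual code, and the formula is the usual tanh approximation of GELU, assumed here for illustration.

```rust
use std::f32::consts::PI;

/// sqrt(2 / pi) written out as a literal, so nothing is computed per call.
/// (Hypothetical name, not taken from candle.)
const SQRT_TWO_OVER_PI: f32 = 0.797_884_56;

/// Tanh-approximation GELU that spells out sqrt(2 / pi) inside the function
/// and relies on the optimizer to fold it into a constant.
fn gelu_recompute(v: f32) -> f32 {
    0.5 * v * (1.0 + f32::tanh((2.0f32 / PI).sqrt() * v * (1.0 + 0.044715 * v * v)))
}

/// Same formula using the hardcoded constant.
fn gelu_hardcoded(v: f32) -> f32 {
    0.5 * v * (1.0 + f32::tanh(SQRT_TWO_OVER_PI * v * (1.0 + 0.044715 * v * v)))
}

fn main() {
    for v in [-3.0f32, -0.5, 0.0, 0.5, 3.0, 20.0] {
        let (a, b) = (gelu_recompute(v), gelu_hardcoded(v));
        // Both variants should agree up to f32 rounding.
        assert!((a - b).abs() <= 1e-5 * a.abs().max(1.0));
        println!("gelu({v}) = {b}");
    }
}
```

The two functions should return the same values up to rounding, so any timing difference between them would come from whether the square root is folded at compile time.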


j-baker commented on July 2, 2024

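use criterion::{black_box, criterion_group, criterion_main, Criterion};
use half::f16;
// `Gelu` (with its `f16` function) must also be in scope; its import path is not given in this thread.

fn crit(c: &mut Criterion) {
    let arg = 20.0;
    c.bench_function("gelu_f16", |b| {
        // black_box keeps the compiler from folding the whole call away.
        b.iter(|| Gelu::f16(black_box(f16::from_f32(arg))))
    });
}
criterion_group!(benches, crit);
criterion_main!(benches);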

leads to:

current implementation: 2.2172ns
below implementation: 1.6855ns

on my M1 mac.

I would expect that constant propagation is trickier on f16 as the compiler has to inline more to understand that it's safe.

This is the solution I used.

f16::from_f32_const((2.0 / PI).sqrt())

which does work with constant propagation. I can confirm that constant propagation also works on f32 and f64, validated on godbolt.
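For f32 and f64, one way to avoid depending on the optimizer at all is to put the value in a `const` item, which Rust always evaluates at compile time. A small sketch, assuming the identity sqrt(2/π) = (2/√π) · (1/√2) and a hypothetical constant name; it uses only constants from the standard library:

```rust
use std::f64::consts::{FRAC_1_SQRT_2, FRAC_2_SQRT_PI, PI};

// sqrt(2 / pi) == (2 / sqrt(pi)) * (1 / sqrt(2)).
// A `const` item is evaluated at compile time, independent of optimization level.
// (Hypothetical name, used for illustration only.)
const SQRT_TWO_OVER_PI: f64 = FRAC_2_SQRT_PI * FRAC_1_SQRT_2;

fn main() {
    // Runtime computation of the same quantity, for comparison.
    let runtime = (2.0f64 / PI).sqrt();
    assert!((SQRT_TWO_OVER_PI - runtime).abs() < 1e-12);
    println!("{SQRT_TWO_OVER_PI}");
}
```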


LaurentMazare commented on July 2, 2024

Neat, would be great if you can make a PR for this (ideally with a comment explaining why it's useful). If you don't have the time I could take a stab at it.


LaurentMazare commented on July 2, 2024

Made a PR for this, #2008

