Hi everyone, I have a question: is it possible to speed-up the runtime when using comp

Poor performance when using composition of core numeric types about cnl HOT 12 CLOSED

emiliopaolini commented on June 12, 2024

Poor performance when using composition of core numeric types

from cnl.

Comments (12)

johnmcfarlane commented on June 12, 2024

Hi. I can only guess without seeing the code but you're using most of the facilities of the library and some of them incur run-time costs of one kind or another.

Are you sure saturation is what you want? It often produces completely wrong results: it's not 'sticky' like floating-point saturation. Can you not simply avoid overflow instead? Try using the trapping tag instead and see if there is actually any overflow and then see why that is.
How many digits are actually being used? If it's greater than 63, you may be invoking multi-word wide_integer types. If it's over 127, you almost certainly are.
I have to ask: do you have optimisations enabled, in particular inlining?
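To illustrate the first point, here is a minimal sketch in plain C++ (not CNL code; saturate6 and the 6-bit limits are names assumed purely for illustration). It shows how clamping an intermediate result can make the final answer wrong even though the true answer fits comfortably in range.

```cpp
// Hypothetical signed 6-bit range, chosen only for illustration.
constexpr int min6 = -32;
constexpr int max6 = 31;

// Clamp a value into the 6-bit range (a stand-in for saturating arithmetic).
constexpr int saturate6(int v) {
    return v < min6 ? min6 : (v > max6 ? max6 : v);
}

// Computes a + b - c, saturating after every operation.
constexpr int saturating_sum(int a, int b, int c) {
    return saturate6(saturate6(a + b) - c);
}

// Computes a + b - c in a wider type, with no clamping of the intermediate.
constexpr int widened_sum(int a, int b, int c) {
    return a + b - c;
}
```

With a = 30, b = 30 and c = 40 the true result is 20, which fits in 6 bits, but saturating_sum returns -9: the intermediate 60 is clamped to 31 and the lost excess is never recovered.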

emiliopaolini commented on June 12, 2024

Hi.

Yes, I would need saturation. Maybe I will try to use the overflow integer, and just let the program run without it, to see if there are any improvements.
I am actually using this: using float_t = saturated_elastic_scaled_integer<5, -4>;
Yes, I have optimizations enabled!

Do you have any other suggestions? Maybe I can try to avoid using wide_integer too? Thank you very much!!

johnmcfarlane commented on June 12, 2024

With numbers that small, wide_integer shouldn't affect things although if your compiler is unable to inline code aggressively enough, it might get in the way of optimisation. You certainly shouldn't need it. You can test this by removing it and seeing if there are compiler errors.

But really, saturated_overflow_tag is probably your biggest problem. Try replacing with trapping_overflow_tag. If that runs without error, try undefined_overflow_tag to see if that improves performance. Currently, you're testing many operations for overflow -- many of which cannot overflow and all of which arguably shouldn't overflow. When overflow occurs, errors of arbitrary scale creep in, often accumulating over time.
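The diagnostic step described above can be sketched in plain C++. This is not how CNL implements trapping_overflow_tag; trapping_add6 and the 6-bit limits are hypothetical, used only to show the idea of failing loudly at the first overflow instead of silently clamping. It assumes both operands are already within the 6-bit range.

```cpp
#include <stdexcept>

// Hypothetical checked addition for a signed 6-bit range [-32, 31].
// Throws at the first overflow so the offending operation can be located.
inline int trapping_add6(int a, int b) {
    const int result = a + b;  // computed in int, which is wide enough here
    if (result < -32 || result > 31) {
        throw std::overflow_error("6-bit addition overflowed");
    }
    return result;
}
```

If a whole run completes without the exception firing, the check is doing no useful work and can be dropped from the hot path.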

johnmcfarlane commented on June 12, 2024

Can I ask if you're emulating saturating hardware? If so, does every last operation saturate?

emiliopaolini commented on June 12, 2024

Hi, I did some tests and removing overflow_integer made the simulations run faster. Now it takes roughly 30 minutes, which is a good time.
Regarding the hardware, yes. I am trying to emulate specific hardware, and I need every operation I perform to stay within, say, 6 bits. That's why I was using overflow_integer with the saturated_overflow_tag. Are there any alternatives?

johnmcfarlane commented on June 12, 2024

When you use float, how are the values being saturated?

emiliopaolini commented on June 12, 2024

Maybe it is better if I explain the scenario I am working on. I am currently using the types provided by CNL inside Tiny-DNN, which is a library for deep learning in C++. I train my models using float and then have to perform the inference using fixed-point (to simulate the hardware, since for now it can only do inference, not training). So I am not saturating when I use float; it is only when I use fixed-point that I need the values to saturate.

johnmcfarlane commented on June 12, 2024

Try to use types with just enough range to always avoid overflow and reduce incidence of conversion between types with different scales. For example, if b has non-zero exponent, the following incurs a scaling operation:

a = a * b;

To that end, use auto to try and avoid unnecessary conversion:

auto c = a * b;  // guaranteed not to scale
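A rough model of why the first form scales and the second does not, written in plain C++ rather than with CNL types. The fixed template, multiply and rescale are hypothetical stand-ins: a value is taken to be rep * 2^Exponent, so multiplying just adds the exponents, and only a conversion back to the original exponent needs an extra operation.

```cpp
#include <cstdint>

// Hypothetical stand-in for a scaled integer: value == rep * 2^Exponent.
template <int Exponent>
struct fixed {
    std::int32_t rep;
};

// Multiplying keeps the full result: reps multiply, exponents add, no scaling.
template <int E1, int E2>
constexpr fixed<E1 + E2> multiply(fixed<E1> a, fixed<E2> b) {
    return {a.rep * b.rep};
}

// Converting to a coarser exponent costs a division by a power of two
// and discards the low digits.
template <int To, int From>
constexpr fixed<To> rescale(fixed<From> v) {
    static_assert(To >= From, "this sketch only converts to a coarser exponent");
    return {v.rep / (std::int32_t{1} << (To - From))};
}
```

In this model, multiply corresponds to auto c = a * b, while a = a * b implies the additional rescale step because the product's exponent differs from that of a.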

Division can be more expensive for integers than for floats. So try and produce an inverse against which you can multiply many values. If that inverse has the right exponent, you will also avoid a scaling operation and get much more efficient code.
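A minimal sketch of the inverse trick in plain C++, assuming non-negative values and a positive divisor. divide_all and the 16 fractional bits are choices made for illustration, not anything taken from CNL. The reciprocal is computed with one division, and every element is then handled with a multiply and a shift.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical helper: approximates values[i] / divisor for non-negative values
// and a positive divisor, using a single division to build a reciprocal.
inline std::vector<std::int32_t> divide_all(const std::vector<std::int32_t>& values,
                                            std::int32_t divisor) {
    // Reciprocal with 16 fractional bits: roughly 2^16 / divisor.
    const std::int64_t inverse = (std::int64_t{1} << 16) / divisor;

    std::vector<std::int32_t> out;
    out.reserve(values.size());
    for (const std::int32_t v : values) {
        // Multiply in 64 bits, then shift the fractional bits back out.
        out.push_back(static_cast<std::int32_t>((v * inverse) >> 16));
    }
    return out;
}
```

Because the reciprocal is truncated, results can come out slightly lower than exact integer division when the divisor is not a power of two. That is the usual trade of precision for speed.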

Think about other ways to avoid expensive calculations, like replacing sqrt(distance_squared) < radius with distance_squared < radius*radius. Remember that elastic_integer helps you avoid overflow by widening, which is generally cheaper than testing for overflow and far more accurate than saturating.
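The comparison without a square root might look like this in plain C++. within_radius is a hypothetical helper, radius is assumed non-negative, and the products are computed in a wider type so they cannot overflow, which is the same idea as widening with elastic_integer.

```cpp
#include <cstdint>

// Hypothetical helper: true when the point (dx, dy) lies strictly inside the
// given radius. Compares squared quantities, so no sqrt is needed.
constexpr bool within_radius(std::int16_t dx, std::int16_t dy, std::int16_t radius) {
    // Widen before multiplying: a product of two 16-bit values needs up to 32 bits,
    // and the sum of two such products needs one more.
    const std::int64_t distance_squared =
        std::int64_t{dx} * dx + std::int64_t{dy} * dy;
    const std::int64_t radius_squared = std::int64_t{radius} * radius;
    return distance_squared < radius_squared;
}
```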

Perhaps you could extract the inner loop and get it building on Compiler Explorer (example). If it's a simple piece of code, I'd be happy to take a quick look. Otherwise, it's still a lot of guesswork.

Remember you're using a fraction of the silicon. The downside is that much of the burden for handling range rests with you.

emiliopaolini commented on June 12, 2024

I think I will go with elastic_integer! Thank you for all the suggestions!!

johnmcfarlane commented on June 12, 2024

YW. Bear in mind the width of results will exceed the width of operands for operations like multiply and add.
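The usual widening rule can be written down and checked at compile time. This is a sketch in plain C++ with hypothetical helper names; "digits" here means value bits excluding the sign, and the 5-digit operands are only an example.

```cpp
// Digits needed to hold any product of two operands: the digit counts add.
constexpr int product_digits(int a_digits, int b_digits) {
    return a_digits + b_digits;
}

// Digits needed to hold any sum of two operands: one carry digit beyond the wider one.
constexpr int sum_digits(int a_digits, int b_digits) {
    return (a_digits > b_digits ? a_digits : b_digits) + 1;
}

// Checked against the largest 5-digit value, 31.
static_assert(31 * 31 < (1 << product_digits(5, 5)), "961 fits in 10 digits");
static_assert(31 * 31 >= (1 << (product_digits(5, 5) - 1)), "961 does not fit in 9 digits");
static_assert(31 + 31 < (1 << sum_digits(5, 5)), "62 fits in 6 digits");
static_assert(31 + 31 >= (1 << (sum_digits(5, 5) - 1)), "62 does not fit in 5 digits");
```

So a chain of multiplications grows quickly in width, which is worth keeping in mind when the target is a fixed 6-bit datapath.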

emiliopaolini commented on June 12, 2024

Thank you! But, for example, if I multiply two numbers and then assign the result to an elastic_scaled_integer of 6 bits, what happens? Will the result be converted to satisfy the 6-bit constraint, losing resolution?

johnmcfarlane commented on June 12, 2024

Yes, using the simplest conversion, it will scale up or down as necessary. The lower digits will be lost, i.e. precision loss. (That's kind-of inevitable.) If the number was too big to fit, you will get erroneous results.
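Both effects can be shown with a small plain C++ sketch. The helper names are hypothetical, and the layout is assumed to be the one from earlier in the thread: 5 digits plus sign with exponent -4, so a product of two such values has exponent -8.

```cpp
#include <cstdint>

// Hypothetical helper: rescale a rep with exponent -8 to exponent -4 by
// dropping the four lowest digits. The discarded remainder is the precision loss.
constexpr std::int32_t rescale_to_q4(std::int32_t rep_q8) {
    return rep_q8 / 16;
}

// True when a rep fits in the assumed signed 6-bit range [-32, 31].
constexpr bool fits_in_6_bits(std::int32_t rep) {
    return rep >= -32 && rep <= 31;
}
```

For example, 13/16 times 19/16 has rep 247 at exponent -8 (about 0.965); rescaling gives rep 15, i.e. 0.9375, so the low digits are gone. And 24/16 times 24/16 has rep 576, which rescales to 36 and no longer fits, so storing it in 6 bits would give an erroneous value.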

One other thing that might help performance: use overflow_integer with its default tag (undefined) and in an optimised build, define NDEBUG. This tells the optimiser to assume there's no overflow.

Example: https://godbolt.org/z/sxYqvjanv
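Separately from the linked example, here is a loose analogy using only the standard assert macro. It is a sketch, not CNL's implementation, and add6_debug_checked is a hypothetical name. It models only the removal of the check in release builds; it does not by itself tell the optimiser to assume anything.

```cpp
#include <cassert>

// Hypothetical 6-bit addition whose range check exists only in debug builds.
// When NDEBUG is defined, assert expands to nothing and the check disappears.
inline int add6_debug_checked(int a, int b) {
    const int result = a + b;
    assert(result >= -32 && result <= 31);
    return result;
}
```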
