
Comments (9)

lemire commented on June 2, 2024

Is the tradeoff worth it?

Please grab https://github.com/lemire/simple_fastfloat_benchmark; I believe it is a reasonable benchmark.

Then run: cmake -B build && cmake --build build && ./run_bench.sh

from fast_float.

lemire commented on June 2, 2024

The differences in your benchmarks appear to reach ~40 cycles for short inputs. It seems unlikely that checking for the presence of eight digits using a fast path, which in this case might be a simple length comparison ("you can't even access 8 bytes, so don't even do the check"), would cost that many cycles.
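A minimal sketch of the kind of guarded check being described, assuming a contiguous input buffer [p, end): the length comparison runs first, so inputs with fewer than eight remaining bytes leave after a single pointer comparison and never pay for the 8-byte load. The name `has_eight_digits` is hypothetical; this illustrates the technique and is not fast_float's implementation.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical helper: true only if the 8 bytes starting at p are ASCII '0'..'9'.
inline bool has_eight_digits(const char* p, const char* end) {
  if (end - p < 8) {
    return false;  // fewer than 8 bytes left: skip the SWAR test entirely
  }
  std::uint64_t val;
  std::memcpy(&val, p, sizeof(val));  // unaligned-safe 8-byte load
  // A byte b is a digit iff its high nibble is 0x3 and the high nibble of b + 6
  // is still 0x3 (i.e. b <= 0x39). Both conditions are tested on all 8 bytes at
  // once; the combined result is 0x33 in every byte only if every byte passes.
  return ((val & 0xF0F0F0F0F0F0F0F0ULL) |
          (((val + 0x0606060606060606ULL) & 0xF0F0F0F0F0F0F0F0ULL) >> 4)) ==
         0x3333333333333333ULL;
}
```

For a short token such as "65.61", the function returns after the first `if`, which is the cheap path the comment above has in mind.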

Please take the same library, change just one thing (one code path), and measure the difference in performance.


mwalcott3 commented on June 2, 2024

@lemire My problem is that the tests all focus on numbers with many digits.

Take canada.txt, for instance. If you count the digits in each number, this is the distribution:
{17: 100717, 5: 28, 8: 635, 16: 7811, 2: 36, 3: 28, 15: 350, 9: 1384, 7: 50, 4: 42, 6: 45}
Over 90% of the numbers have 17 digits. I don't know about the other tests, but uniformly random doubles tend to need 16 or 17 significant digits when written out in round-trippable form.
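A sketch of how such a histogram could be computed. The thread does not say exactly how digits were counted; this version assumes "digits" means the digits of the significand as written, skipping leading zeros and ignoring any exponent. The names `significant_digits` and `digit_histogram` are hypothetical.

```cpp
#include <cctype>
#include <cstddef>
#include <map>
#include <string_view>

// Counts the significand digits of a decimal string such as "-65.613616999999977"
// (17) or "1.5e-3" (2). Leading zeros are skipped; the exponent part is ignored.
inline std::size_t significant_digits(std::string_view s) {
  std::size_t count = 0;
  bool seen_nonzero = false;
  for (char c : s) {
    if (c == 'e' || c == 'E') break;  // stop at the exponent
    if (!std::isdigit(static_cast<unsigned char>(c))) continue;  // sign, '.'
    if (c != '0') seen_nonzero = true;
    if (seen_nonzero) ++count;
  }
  return count;
}

// Builds a {digit count -> number of lines} histogram over any range of strings.
template <typename Range>
std::map<std::size_t, std::size_t> digit_histogram(const Range& lines) {
  std::map<std::size_t, std::size_t> hist;
  for (const auto& line : lines) ++hist[significant_digits(line)];
  return hist;
}
```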

If I shorten the numbers in canada.txt by writing them out in fixed notation with 2 decimal places, there is a large performance drop compared to your earlier fast_double_parser on these short floats with few significant digits. This is the file I used for the test: canada_short.txt
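A minimal sketch of the rewrite described here, using a standard stream with `std::fixed` and `std::setprecision(2)`. The thread does not say which tool actually produced canada_short.txt, and `to_fixed2` is a hypothetical name.

```cpp
#include <iomanip>
#include <sstream>
#include <string>

// Formats a double in fixed notation with exactly two decimal places,
// e.g. -65.613616999999977 -> "-65.61".
inline std::string to_fixed2(double x) {
  std::ostringstream out;
  out << std::fixed << std::setprecision(2) << x;
  return out.str();
}
```

Applied line by line, this turns 17-digit values into 3- to 6-digit ones with a decimal exponent of -2.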

You are right: removing the SWAR code makes very little difference for numbers with few significant digits, so it's probably not the cause.
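For reference, a minimal sketch of the two code paths being compared: a plain digit loop, and a SWAR routine that converts eight ASCII digits at once. The SWAR constants follow a widely used technique that assumes a little-endian target; the function names are hypothetical and this is not fast_float's source. A value such as -65.61 never has eight consecutive digits, so at most the check itself is paid, which fits the small difference observed on canada_short.txt.

```cpp
#include <cstdint>
#include <cstring>

// Scalar path: one multiply-add per digit in [p, end).
inline std::uint64_t parse_digits_scalar(const char* p, const char* end) {
  std::uint64_t v = 0;
  for (; p != end; ++p) v = v * 10 + static_cast<std::uint64_t>(*p - '0');
  return v;
}

// SWAR path: converts exactly eight ASCII digits starting at p.
// Assumes a little-endian target and that all 8 bytes are already known digits.
inline std::uint32_t parse_eight_digits_swar(const char* p) {
  std::uint64_t val;
  std::memcpy(&val, p, sizeof(val));
  val -= 0x3030303030303030ULL;   // ASCII -> digit value 0..9 in each byte
  val = (val * 10) + (val >> 8);  // byte i now holds 10*d[i] + d[i+1]
  const std::uint64_t mask = 0x000000FF000000FFULL;  // keep bytes 0 and 4
  const std::uint64_t mul1 = 0x000F424000000064ULL;  // 100 + (1000000 << 32)
  const std::uint64_t mul2 = 0x0000271000000001ULL;  // 1 + (10000 << 32)
  // Combines the four two-digit pairs as 1e6*p0 + 1e4*p1 + 100*p2 + p3.
  val = (((val & mask) * mul1) + (((val >> 16) & mask) * mul2)) >> 32;
  return static_cast<std::uint32_t>(val);
}
```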

The benchmark was compiled with GCC and run on an i5-1135G7.

Current implementation:

[mwalcott@fedora simple_fastfloat_benchmark]$ ./build/benchmarks/benchmark -f data/canada.txt 
# read 111126 lines 
volume = 1.93374 MB 
netlib                                  :   339.16 MB/s (+/- 1.9 %)    19.49 Mfloat/s      31.32 i/B   571.45 i/f (+/- 0.0 %)      0.15 bm/B     2.78 bm/f (+/- 1.3 %)     11.76 c/B   214.66 c/f (+/- 0.9 %)      2.66 i/c      4.18 GHz 
doubleconversion                        :   320.10 MB/s (+/- 3.4 %)    18.40 Mfloat/s      51.16 i/B   933.48 i/f (+/- 0.0 %)      0.05 bm/B     0.83 bm/f (+/- 4.6 %)     12.47 c/B   227.48 c/f (+/- 2.1 %)      4.10 i/c      4.18 GHz 
strtod                                  :   195.83 MB/s (+/- 1.2 %)    11.25 Mfloat/s      70.30 i/B  1282.83 i/f (+/- 0.0 %)      0.15 bm/B     2.80 bm/f (+/- 0.8 %)     20.38 c/B   371.82 c/f (+/- 0.9 %)      3.45 i/c      4.18 GHz 
abseil                                  :   487.79 MB/s (+/- 1.9 %)    28.03 Mfloat/s      30.17 i/B   550.47 i/f (+/- 0.0 %)      0.03 bm/B     0.60 bm/f (+/- 0.5 %)      8.18 c/B   149.25 c/f (+/- 1.3 %)      3.69 i/c      4.18 GHz 
fastfloat                               :  1053.34 MB/s (+/- 1.9 %)    60.53 Mfloat/s      15.58 i/B   284.27 i/f (+/- 0.0 %)      0.01 bm/B     0.11 bm/f (+/- 7.9 %)      3.79 c/B    69.14 c/f (+/- 1.5 %)      4.11 i/c      4.19 GHz 
fast_double_parser                      :  1139.63 MB/s (+/- 1.4 %)    65.49 Mfloat/s      12.64 i/B   230.56 i/f (+/- 0.0 %)      0.01 bm/B     0.11 bm/f (+/- 0.3 %)      3.50 c/B    63.90 c/f (+/- 0.9 %)      3.61 i/c      4.18 GHz 
[mwalcott@fedora simple_fastfloat_benchmark]$ ./build/benchmarks/benchmark -f data/canada_short.txt 
# read 111126 lines 
volume = 0.598098 MB 
netlib                                  :   464.83 MB/s (+/- 8.3 %)    86.37 Mfloat/s      34.90 i/B   196.98 i/f (+/- 0.0 %)      0.03 bm/B     0.15 bm/f (+/- 1.3 %)      8.58 c/B    48.45 c/f (+/- 0.8 %)      4.07 i/c      4.18 GHz 
doubleconversion                        :   284.03 MB/s (+/- 3.3 %)    52.77 Mfloat/s      66.27 i/B   373.98 i/f (+/- 0.0 %)      0.02 bm/B     0.11 bm/f (+/- 1.7 %)     14.04 c/B    79.25 c/f (+/- 2.6 %)      4.72 i/c      4.18 GHz 
strtod                                  :   121.46 MB/s (+/- 1.3 %)    22.57 Mfloat/s     129.04 i/B   728.24 i/f (+/- 0.0 %)      0.12 bm/B     0.66 bm/f (+/- 0.8 %)     32.86 c/B   185.46 c/f (+/- 1.0 %)      3.93 i/c      4.19 GHz 
abseil                                  :   205.37 MB/s (+/- 1.3 %)    38.16 Mfloat/s      70.69 i/B   398.96 i/f (+/- 0.0 %)      0.08 bm/B     0.46 bm/f (+/- 0.7 %)     19.43 c/B   109.65 c/f (+/- 0.9 %)      3.64 i/c      4.18 GHz 
fastfloat                               :   489.87 MB/s (+/- 2.0 %)    91.02 Mfloat/s      34.52 i/B   194.84 i/f (+/- 0.0 %)      0.01 bm/B     0.04 bm/f (+/- 1.2 %)      8.15 c/B    45.98 c/f (+/- 1.3 %)      4.24 i/c      4.19 GHz 
fast_double_parser                      :  1239.27 MB/s (+/- 5.7 %)   230.26 Mfloat/s      16.06 i/B    90.65 i/f (+/- 0.0 %)      0.00 bm/B     0.00 bm/f (+/- 1.5 %)      3.22 c/B    18.20 c/f (+/- 3.6 %)      4.98 i/c      4.19 GHz 

With the 8-character SWAR code removed:

[mwalcott@fedora simple_fastfloat_benchmark]$ ./build/benchmarks/benchmark -f data/canada.txt 
# read 111126 lines 
volume = 1.93374 MB 
netlib                                  :   335.22 MB/s (+/- 1.5 %)    19.26 Mfloat/s      31.32 i/B   571.45 i/f (+/- 0.0 %)      0.15 bm/B     2.71 bm/f (+/- 0.7 %)     11.90 c/B   217.19 c/f (+/- 0.6 %)      2.63 i/c      4.18 GHz 
doubleconversion                        :   329.30 MB/s (+/- 1.7 %)    18.92 Mfloat/s      51.16 i/B   933.48 i/f (+/- 0.0 %)      0.03 bm/B     0.62 bm/f (+/- 0.6 %)     12.12 c/B   221.08 c/f (+/- 1.4 %)      4.22 i/c      4.18 GHz 
strtod                                  :   194.08 MB/s (+/- 1.5 %)    11.15 Mfloat/s      70.30 i/B  1282.83 i/f (+/- 0.0 %)      0.15 bm/B     2.81 bm/f (+/- 0.8 %)     20.56 c/B   375.14 c/f (+/- 1.3 %)      3.42 i/c      4.18 GHz 
abseil                                  :   481.38 MB/s (+/- 2.6 %)    27.66 Mfloat/s      30.17 i/B   550.47 i/f (+/- 0.0 %)      0.04 bm/B     0.67 bm/f (+/- 9.9 %)      8.29 c/B   151.23 c/f (+/- 2.3 %)      3.64 i/c      4.18 GHz 
fastfloat                               :   811.33 MB/s (+/- 1.1 %)    46.62 Mfloat/s      17.70 i/B   322.90 i/f (+/- 0.0 %)      0.01 bm/B     0.10 bm/f (+/- 0.2 %)      4.92 c/B    89.73 c/f (+/- 0.6 %)      3.60 i/c      4.18 GHz
[mwalcott@fedora simple_fastfloat_benchmark]$ ./build/benchmarks/benchmark -f data/canada_short.txt 
# read 111126 lines 
volume = 0.598098 MB 
netlib                                  :   460.80 MB/s (+/- 6.7 %)    85.62 Mfloat/s      34.90 i/B   196.98 i/f (+/- 0.0 %)      0.03 bm/B     0.16 bm/f (+/- 1.3 %)      8.66 c/B    48.87 c/f (+/- 0.7 %)      4.03 i/c      4.18 GHz 
doubleconversion                        :   279.87 MB/s (+/- 3.2 %)    52.00 Mfloat/s      66.27 i/B   373.98 i/f (+/- 0.0 %)      0.02 bm/B     0.11 bm/f (+/- 3.0 %)     14.26 c/B    80.47 c/f (+/- 2.4 %)      4.65 i/c      4.18 GHz 
strtod                                  :   121.77 MB/s (+/- 1.6 %)    22.63 Mfloat/s     129.04 i/B   728.24 i/f (+/- 0.0 %)      0.12 bm/B     0.67 bm/f (+/- 0.5 %)     32.76 c/B   184.86 c/f (+/- 1.4 %)      3.94 i/c      4.18 GHz 
abseil                                  :   205.98 MB/s (+/- 1.3 %)    38.27 Mfloat/s      70.69 i/B   398.96 i/f (+/- 0.0 %)      0.08 bm/B     0.46 bm/f (+/- 0.5 %)     19.38 c/B   109.35 c/f (+/- 0.9 %)      3.65 i/c      4.19 GHz 
fastfloat                               :   506.06 MB/s (+/- 3.1 %)    94.03 Mfloat/s      32.75 i/B   184.84 i/f (+/- 0.0 %)      0.01 bm/B     0.04 bm/f (+/- 1.6 %)      7.89 c/B    44.51 c/f (+/- 2.8 %)      4.15 i/c      4.18 GHz 

canada_short.txt


mwalcott3 commented on June 2, 2024

Why is Clinger's fast path only applied to positive exponents? I would expect all the numbers I'm using in canada_short to qualify for the fast path (small significand and an exponent of -2).
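Clinger's fast path as it is usually described, sketched under stated assumptions (the name `clinger_fast_path` and its signature are hypothetical): when the decimal significand w is at most 2^53 and |q| <= 22, both w and 10^|q| are exact doubles, so a single multiplication or division yields the correctly rounded result in round-to-nearest mode. A value like -65.61 becomes w = 6561, q = -2 (sign handled separately), which qualifies.

```cpp
#include <cstdint>

// Returns false when the fast path does not apply; otherwise stores w * 10^q.
// Correct only when the FPU is in the default round-to-nearest mode.
inline bool clinger_fast_path(std::uint64_t w, int q, double& out) {
  // Powers of ten up to 1e22 are exactly representable as doubles.
  static const double powers_of_ten[] = {
      1e0,  1e1,  1e2,  1e3,  1e4,  1e5,  1e6,  1e7,  1e8,  1e9,  1e10, 1e11,
      1e12, 1e13, 1e14, 1e15, 1e16, 1e17, 1e18, 1e19, 1e20, 1e21, 1e22};
  if (w > (std::uint64_t(1) << 53) || q < -22 || q > 22) return false;
  const double d = static_cast<double>(w);  // exact because w <= 2^53
  // One IEEE operation on exact operands: a single rounding step.
  out = (q >= 0) ? d * powers_of_ten[q] : d / powers_of_ten[-q];
  return true;
}
```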

Edit:
Never mind, I just saw issue #149. That is incredibly annoying. Changing the floating-point rounding mode and not resetting it seems almost like the user is asking for problems.
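To see why the rounding mode matters for the negative-exponent case, here is a small sketch. The helper name `divide_with_rounding` is hypothetical; `std::fegetround` and `std::fesetround` are the standard `<cfenv>` functions. The same division 6561.0 / 100.0 produces different doubles under `FE_UPWARD` and `FE_DOWNWARD`, so a fast path built on it follows whatever mode the caller left in place.

```cpp
#include <cfenv>

// Divides a by b under the given rounding mode, then restores the previous mode.
// volatile keeps the compiler from constant-folding the division (it would assume
// round-to-nearest) or moving it across the fesetround calls. Strictly portable
// code would also need #pragma STDC FENV_ACCESS ON, which not every compiler honors.
inline double divide_with_rounding(double a, double b, int mode) {
  const int saved = std::fegetround();
  std::fesetround(mode);
  volatile double va = a, vb = b;
  volatile double result = va / vb;
  std::fesetround(saved);
  return result;
}
```

Because 65.61 is not exactly representable, the upward and downward results differ by one unit in the last place, and only the round-to-nearest one matches the correctly rounded parse.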


lemire commented on June 2, 2024

> Changing the floating point rounding mode and not resetting it seems almost like the user is asking for problems.

Agreed, but the C++ specification is what it is.


lemire commented on June 2, 2024

Thanks for the extra file; I have added it to my benchmark.

Looking at your numbers, fast_float executes the following numbers of instructions per float: 184.84 i/f and 194.84 i/f on canada_short.txt, and 322.90 i/f and 284.27 i/f on canada.txt (without and with the SWAR code, respectively). The SWAR code adds 10 instructions per float in the one case and saves 39 in the other.
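A back-of-the-envelope sketch using only the instruction counts quoted here: the SWAR code changes the count by +10.00 instructions per float on canada_short.txt and by 284.27 - 322.90 = -38.63 on canada.txt. If a fraction p of inputs looks like canada_short.txt, the average change p * 10.00 + (1 - p) * (-38.63) is zero at p = 38.63 / 48.63, about 0.79. Under this simple model, the SWAR code lowers the average instruction count unless roughly 79% or more of the inputs are short. The function name is hypothetical, and instruction counts are only a proxy for running time.

```cpp
// Returns the fraction p of "short" inputs at which the weighted change
// p * delta_short + (1 - p) * delta_long equals zero.
// Assumes delta_short > 0 > delta_long (a cost on one input type, a saving on the other).
inline double break_even_fraction(double delta_short, double delta_long) {
  return -delta_long / (delta_short - delta_long);
}
```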

It depends on what you favour. The canada file is derived from actual data. It is unclear to me which application would match the canada_short file. Is that the sort of data you encounter in your work?

I'd be most interested in real-world reports. Please note that we always try to optimize for the data people do have.

I am tuning the performance based on additional tests that take canada_short into account. Note that I weight canada_short somewhat less because I consider it less likely to be realistic.

See #152


mwalcott3 commented on June 2, 2024

> Is that the sort of data you encounter in your work?

Rarely; it's a bit of a contrived example. Some old Fortran codes I interface with output ASCII files containing short fixed-point decimal numbers (in that case no one really cares about 1-ulp errors, or accuracy in general, so maybe fast_float is overkill). For the most part I expect to deal with numbers closer to those in canada.txt.

I just thought that performance on simple, low-digit fixed-point numbers should be tested, because such data does show up sometimes and there appears to be a performance regression there compared to fast_double_parser:

fastfloat                               :   489.87 MB/s (+/- 2.0 %)    91.02 Mfloat/s      34.52 i/B   194.84 i/f (+/- 0.0 %)      0.01 bm/B     0.04 bm/f (+/- 1.2 %)      8.15 c/B    45.98 c/f (+/- 1.3 %)      4.24 i/c      4.19 GHz 
fast_double_parser                      :  1239.27 MB/s (+/- 5.7 %)   230.26 Mfloat/s      16.06 i/B    90.65 i/f (+/- 0.0 %)      0.00 bm/B     0.00 bm/f (+/- 1.5 %)      3.22 c/B    18.20 c/f (+/- 3.6 %)      4.98 i/c      4.19 GHz 

But the regression appears to be mainly related to other issues. For instance, re-enabling the fast path for negative exponents (which is not standard compliant) immediately gave a significant performance boost when parsing the short numbers; I think it was close to 50%. But that's not something that can really be changed.


lemire commented on June 2, 2024

> For instance, re-enabling the fast path for negative exponents

Right. There might be room for more clever approaches. Unfortunately, testing for the rounding mode each time is too expensive.

I will close this issue for now. It would be very helpful to give more thought to ways around the performance regression you allude to. Of course, we could simply make it a compile-time option, or even a runtime option. But it is less useful than it sounds because the default would have to be standard compliance... sadly.
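One way to picture the compile-time versus runtime option mentioned here, as a sketch only: the template parameter `AssumeNearest` and the functions `fast_path` and `fast_path_runtime` are hypothetical names, not fast_float's API. With the default (`false`), the negative-exponent division is never taken and only results that are exact (hence identical in every rounding mode) are accepted; opting in restores the classic Clinger fast path for inputs like 65.61.

```cpp
#include <cfenv>
#include <cstdint>

// Hypothetical compile-time switch; NOT a real fast_float option.
template <bool AssumeNearest = false>
bool fast_path(std::uint64_t w, int q, double& out) {
  static const double p10[] = {1e0,  1e1,  1e2,  1e3,  1e4,  1e5,  1e6,  1e7,
                               1e8,  1e9,  1e10, 1e11, 1e12, 1e13, 1e14, 1e15,
                               1e16, 1e17, 1e18, 1e19, 1e20, 1e21, 1e22};
  const std::uint64_t two53 = std::uint64_t(1) << 53;
  if (w > two53 || q < -22 || q > 22) return false;
  if (!AssumeNearest) {       // constant condition, folded away at compile time
    if (q < 0) return false;  // w / 10^k is generally inexact
    // w * 10^q = (w * 5^q) * 2^q is exact if w * 5^q <= 2^53 (conservative test).
    std::uint64_t w5 = w;
    for (int i = 0; i < q; ++i) {
      if (w5 > two53 / 5) return false;
      w5 *= 5;
    }
  }
  const double d = static_cast<double>(w);
  out = (q >= 0) ? d * p10[q] : d / p10[-q];
  return true;
}

// Runtime alternative: query the rounding mode on every call, the kind of
// per-call check described in the thread as too expensive.
inline bool fast_path_runtime(std::uint64_t w, int q, double& out) {
  return std::fegetround() == FE_TONEAREST ? fast_path<true>(w, q, out)
                                           : fast_path<false>(w, q, out);
}
```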


lemire commented on June 2, 2024

Thanks for the report.

