Comments (11)
I believe that the state of the art is the Schubfach algorithm, but I did not find a C++ implementation that I liked. In simdjson, we adopted Grisu2, which is not as good, but for which I could find good-looking C++ implementations.
So I think that taking Schubfach, building a good implementation, testing it, tuning it, would be great. Note that there might be a good Schubfach implementation out there in C++, I just did not find one.
(For obvious reasons, when you are building software, you don't just want to use something that has the best algorithm. You have other constraints... like... can I trust the code not to blow up? Can I read through the code and understand it?)
I do not care about the roundtrip guarantee dictated by the standard.
Actually, this should come for free. The from_chars implemented in fast_float is exact (with round-to-even and all that) so if you have exact to_chars, then you get the round-trip for free. In fact, you get better.
from fast_float.
There's a Schubfach implementation here:
https://github.com/jk-jeon/fp/tree/master/subproject/3rdparty/schubfach
But am I correct in thinking Schubfach will only give us the equivalent to printf("%g")? I'd also like to have %e and %f together with a specified precision.
from fast_float.
Schubfach is the high-level algorithm and not a formatter per se, so you are correct that it does not do everything (nor is it meant to).
I have not looked at the pointer you give, but it does look like a good APL 2 library at a glance.
Let us look at the std::to_chars specification... So if we are just talking about std::to_chars, then you always want the shortest representation, though you need to support both f and e.
the value is converted to a string as if by std::printf in the default ("C") locale. The conversion specifier is f or e (resolving in favor of f in case of a tie), chosen according to the requirement for a shortest representation: the string representation consists of the smallest number of characters such that there is at least one digit before the radix point (if present) and parsing the representation using the corresponding std::from_chars function recovers value exactly. If there are several such representations, one with the smallest difference to value is chosen, resolving any remaining ties using rounding according to std::round_to_nearest
On a related note, I've gathered the first benchmark results here.
Overall, fast_float is really fast on Windows: 4x faster than std::from_chars(), and faster than everything else.
On Linux, it's among the fastest, but there are some outliers and I have some suspicion over the results (e.g., for clang10/Release/double, std::atof() is ~870MB/s, compared with fast_float at ~360MB/s). To be clear, this is WSL, so let's not jump to conclusions.
I do have some concerns over binary size. If you look at the data on the linux sizes, fast float is above 1.3MB, while a scanf is 12KB; even iostream has a smaller size, at ~1.2MB. To make things more comparable, I tried to request the static standard library, but I had no time to check if that was successful.
So, something to look at.
(And apologies if this is not the place to post such data.)
If you look at the data on the linux sizes, fast float is above 1.3MB
It is a header-only library, but let us look at the size of the compiled binaries (which include the header, compiled in release mode with -O3):
$ ls -alh example_test
-rwxr-xr-x 1 lemire dialout 35K Nov 12 01:23 example_test
Now an empty "int main() {}" binary will use 17KB. So fast_float itself cannot be much more than about ~15KB in that case. It may be a bit more, I am not being very precise, but it is not 1.2MB.
For comparison, if you grab Gay's dtoa.c (which is effectively the inspiration/source for strtod), you will find that it compiles down to a 55 KB binary.
Note that a simpler version of this algorithm is part of the Go standard library (as of a few weeks ago) and they did consider binary size as a factor.
Regarding benchmarking, I do have a pretty decent one there:
https://github.com/lemire/simple_fastfloat_benchmark
It used to support Visual Studio, but over several rounds of reengineering, I broke compatibility with Visual Studio. This could be fixed with some work?
(And apologies if this is not the place to post such data.)
It is totally fair to assess binary size, but it would be better to do it in a separate issue.
Now an empty "int main() {}" binary will use 17KB. So fast_float itself cannot be much more than about ~15KB in that case. It may be a bit more, I am not being very precise, but it is not 1.2MB.
Strange - that's exactly what I did. In my results the main is a loop using fgets() to read from stdin and then calling a macro which consists of the call to fast_float::from_chars() or is simply empty for the baseline. The Release size of the baseline with the empty loop comes to about 8.5KB on Linux and 11KB on Windows.
But it is really relevant here that I compiled this with the static standard library, so that may be causing the increased size. I will investigate this further and - if justified - pick this up in a different issue.
@biojppm There is about 84 KB of code in there, most of it made of comments. The code volume is about the same as dtoa.c. I am not denying that you are seeing a potential issue, but one would still have to explain how ~85KB of code (mostly comments) turns into 1.2MB of binary.
I am also interested in a to_chars implementation that is super fast. Ideally faster than the Dragonbox algorithm.
The dragonbox.cc implementation from abolz/Drachennest has been recommended to me by the author of Dragonbox. It doesn't require C++17; it compiles for me in C++11 mode.
It seems to work, but I've had to modify it for header-only use.