
fastfloat / fast_float


Fast and exact implementation of the C++ from_chars functions for number types: 4x to 10x faster than strtod, part of GCC 12 and WebKit/Safari

License: Apache License 2.0


fast_float's Introduction

fast_float number parsing library: 4x faster than strtod


The fast_float library provides fast header-only implementations for the C++ from_chars functions for float and double types as well as integer types. These functions convert ASCII strings representing decimal values (e.g., 1.3e10) into binary types. We provide exact rounding (including round to even). In our experience, these fast_float functions are many times faster than comparable number-parsing functions from existing C++ standard libraries.

Specifically, fast_float provides the following two functions to parse floating-point numbers with a C++17-like syntax (the library itself only requires C++11):

from_chars_result from_chars(const char* first, const char* last, float& value, ...);
from_chars_result from_chars(const char* first, const char* last, double& value, ...);

You can also parse integer types (see the "Integer types" section).

The return type (from_chars_result) is defined as the struct:

struct from_chars_result {
    const char* ptr;
    std::errc ec;
};

It parses the character sequence [first,last) for a number. It parses floating-point numbers expecting a locale-independent format equivalent to the C++17 from_chars function. The resulting floating-point value is the closest floating-point value (using either float or double), using the "round to even" convention for values that would otherwise fall right in-between two values. That is, we provide exact parsing according to the IEEE standard.

Given a successful parse, the pointer (ptr) in the returned value is set to point right after the parsed number, and the value referenced is set to the parsed value. In case of error, the returned ec contains a representative error, otherwise the default (std::errc()) value is stored.
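
Because ptr points right after the parsed number, a caller can check whether the whole input was consumed. The following is a minimal sketch of that check, not code from the library: it uses std::from_chars from <charconv> as a stand-in for fast_float::from_chars (the README describes the same first/last/value interface), and parse_exact is a hypothetical helper name. The standard-library floating-point overload needs a recent toolchain (for example GCC 11 or later).

```cpp
#include <charconv>
#include <optional>
#include <string_view>
#include <system_error>

// Hypothetical helper: succeeds only if the entire input is one number.
// std::from_chars stands in for fast_float::from_chars in this sketch.
std::optional<double> parse_exact(std::string_view input) {
    const char* first = input.data();
    const char* last = input.data() + input.size();
    double value = 0.0;
    auto answer = std::from_chars(first, last, value);
    if (answer.ec != std::errc()) { return std::nullopt; }  // no number found, or out of range
    if (answer.ptr != last) { return std::nullopt; }        // trailing characters were left unparsed
    return value;
}
```

With this helper, "3.1416" yields a value, while "3.1416 xyz" is rejected because ptr stops at the space.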

The implementation does not throw and does not allocate memory (e.g., with new or malloc).

It will parse infinity and nan values.

Example:

#include "fast_float/fast_float.h"
#include <cstdlib>
#include <iostream>
#include <string>
#include <system_error>

int main() {
    const std::string input =  "3.1416 xyz ";
    double result;
    auto answer = fast_float::from_chars(input.data(), input.data()+input.size(), result);
    if(answer.ec != std::errc()) { std::cerr << "parsing failure\n"; return EXIT_FAILURE; }
    std::cout << "parsed the number " << result << std::endl;
    return EXIT_SUCCESS;
}

You can parse delimited numbers:

  const std::string input =   "234532.3426362,7869234.9823,324562.645";
  double result;
  auto answer = fast_float::from_chars(input.data(), input.data()+input.size(), result);
  if(answer.ec != std::errc()) {
    // check error
  }
  // we have result == 234532.3426362.
  if(answer.ptr[0] != ',') {
    // unexpected delimiter
  }
  answer = fast_float::from_chars(answer.ptr + 1, input.data()+input.size(), result);
  if(answer.ec != std::errc()) {
    // check error
  }
  // we have result == 7869234.9823.
  if(answer.ptr[0] != ',') {
    // unexpected delimiter
  }
  answer = fast_float::from_chars(answer.ptr + 1, input.data()+input.size(), result);
  if(answer.ec != std::errc()) {
    // check error
  }
  // we have result == 324562.645.
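
The repeated parse-then-check-delimiter steps can be folded into a loop. This is a sketch under stated assumptions rather than library code: parse_delimited is a hypothetical helper, and std::from_chars from <charconv> stands in for fast_float::from_chars, which the README describes as having the same interface.

```cpp
#include <charconv>
#include <optional>
#include <string_view>
#include <system_error>
#include <vector>

// Hypothetical helper: parses "a,b,c" into {a, b, c}.
// Returns std::nullopt on a parse error or an unexpected delimiter.
std::optional<std::vector<double>> parse_delimited(std::string_view input,
                                                   char delimiter = ',') {
    std::vector<double> values;
    const char* p = input.data();
    const char* const last = input.data() + input.size();
    while (true) {
        double value = 0.0;
        auto answer = std::from_chars(p, last, value);  // stand-in for fast_float::from_chars
        if (answer.ec != std::errc()) { return std::nullopt; }
        values.push_back(value);
        if (answer.ptr == last) { return values; }              // consumed everything
        if (*answer.ptr != delimiter) { return std::nullopt; }  // unexpected character
        p = answer.ptr + 1;                                     // skip the delimiter
    }
}
```

An empty input or a trailing delimiter is reported as an error, because the next parse attempt finds no number.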

Like the C++17 standard, the fast_float::from_chars functions take an optional last argument of the type fast_float::chars_format. It is a bitset value: we check whether fmt & fast_float::chars_format::fixed and fmt & fast_float::chars_format::scientific are set to determine whether we allow the fixed point and scientific notation respectively. The default is fast_float::chars_format::general which allows both fixed and scientific.
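
To illustrate the bitset check, here is a small sketch using std::chars_format from <charconv>, where general is defined as fixed | scientific. The helper names has_flag, allows_fixed and allows_scientific are hypothetical, and the sketch assumes fast_float::chars_format follows the same layout for these three values, as the paragraph above suggests.

```cpp
#include <charconv>

// True if every bit of `flag` is set in `fmt` (std::chars_format is a bitmask type).
constexpr bool has_flag(std::chars_format fmt, std::chars_format flag) {
    return (static_cast<unsigned>(fmt) & static_cast<unsigned>(flag)) ==
           static_cast<unsigned>(flag);
}

// Hypothetical helpers mirroring the two checks described in the text.
constexpr bool allows_fixed(std::chars_format fmt) {
    return has_flag(fmt, std::chars_format::fixed);
}
constexpr bool allows_scientific(std::chars_format fmt) {
    return has_flag(fmt, std::chars_format::scientific);
}
```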

The library seeks to follow the C++17 (see 20.19.3.(7.1)) specification.

  • The from_chars function does not skip leading white-space characters.
  • A leading + sign is forbidden.
  • It is generally impossible to represent a decimal value exactly as binary floating-point number (float and double types). We seek the nearest value. We round to an even mantissa when we are in-between two binary floating-point numbers.
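
The round-to-even rule can be shown on plain integers. A double has a 53-bit significand, so 9007199254740993 (2^53 + 1) needs one bit dropped and lies exactly halfway between two doubles. The sketch below is an illustration of the tie-breaking rule only, not the library's implementation; shift_round_to_even is a hypothetical name.

```cpp
#include <cassert>
#include <cstdint>

// Drops the low `shift` bits of `mantissa`, rounding to nearest with ties to even.
std::uint64_t shift_round_to_even(std::uint64_t mantissa, unsigned shift) {
    assert(shift >= 1 && shift <= 63);
    const std::uint64_t half = std::uint64_t(1) << (shift - 1);
    const std::uint64_t remainder = mantissa & ((std::uint64_t(1) << shift) - 1);
    std::uint64_t result = mantissa >> shift;
    // Round up when above the halfway point, or exactly halfway with an odd result.
    const bool round_up = remainder > half || (remainder == half && (result & 1) != 0);
    if (round_up) { ++result; }
    return result;
}
```

Dropping one bit from 2^53 + 1 gives the even mantissa 2^52, that is 9007199254740992, while 2^53 + 3 rounds up to the even mantissa 2^52 + 2, that is 9007199254740996.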

Furthermore, we have the following restrictions:

  • We only support float and double types at this time.
  • We only support the decimal format: we do not support hexadecimal strings.
  • For values that are either very large or very small (e.g., 1e9999), we represent them using the infinity or negative infinity value and the returned ec is set to std::errc::result_out_of_range.

We support Visual Studio, macOS, Linux, FreeBSD. We support big and little endian. We support 32-bit and 64-bit systems.

We assume that the rounding mode is set to nearest (std::fegetround() == FE_TONEAREST).
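
If your program changes the rounding mode elsewhere, you can verify or restore this assumption around parsing calls. The following sketch uses only <cfenv>; rounding_mode_is_nearest and NearestRoundingGuard are hypothetical names and are not part of fast_float.

```cpp
#include <cfenv>

// True when the current floating-point rounding mode is round-to-nearest.
bool rounding_mode_is_nearest() {
    return std::fegetround() == FE_TONEAREST;
}

// Hypothetical RAII guard: switches to round-to-nearest for its lifetime,
// then restores the mode that was active before.
class NearestRoundingGuard {
public:
    NearestRoundingGuard() : saved_(std::fegetround()) { std::fesetround(FE_TONEAREST); }
    ~NearestRoundingGuard() { std::fesetround(saved_); }
    NearestRoundingGuard(const NearestRoundingGuard&) = delete;
    NearestRoundingGuard& operator=(const NearestRoundingGuard&) = delete;

private:
    int saved_;
};
```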

Integer types

You can also parse integer types using different bases (e.g., 2, 10, 16). The following code will print the number 22250738585072012 three times:

  uint64_t i;
  const char str[] = "22250738585072012";
  auto answer = fast_float::from_chars(str, str + strlen(str), i);
  if (answer.ec != std::errc()) {
    std::cerr << "parsing failure\n";
    return EXIT_FAILURE;
  }
  std::cout << "parsed the number "<< i << std::endl;

  const char binstr[] = "1001111000011001110110111001001010110100111000110001100";

  answer = fast_float::from_chars(binstr, binstr + strlen(binstr), i, 2);
  if (answer.ec != std::errc()) {
    std::cerr << "parsing failure\n";
    return EXIT_FAILURE;
  }
  std::cout << "parsed the number "<< i << std::endl;


  const char hexstr[] = "4f0cedc95a718c";

  answer = fast_float::from_chars(hexstr, hexstr + strlen(hexstr), i, 16);
  if (answer.ec != std::errc()) {
    std::cerr << "parsing failure\n";
    return EXIT_FAILURE;
  }
  std::cout << "parsed the number "<< i << std::endl;
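
The claim that the three strings denote the same number can be checked independently. This sketch uses the integer overload of std::from_chars from <charconv> as a stand-in for the fast_float integer overload shown above; parse_u64 is a hypothetical helper name.

```cpp
#include <cassert>
#include <charconv>
#include <cstdint>
#include <optional>
#include <string_view>
#include <system_error>

// Hypothetical helper: parses the whole string as an unsigned 64-bit integer
// in the given base, or returns std::nullopt.
std::optional<std::uint64_t> parse_u64(std::string_view text, int base) {
    assert(base >= 2 && base <= 36);  // range accepted by std::from_chars
    const char* last = text.data() + text.size();
    std::uint64_t value = 0;
    auto answer = std::from_chars(text.data(), last, value, base);
    if (answer.ec != std::errc() || answer.ptr != last) { return std::nullopt; }
    return value;
}
```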

C++20: compile-time evaluation (constexpr)

In C++20, you may use fast_float::from_chars to parse strings at compile-time, as in the following example:

// consteval forces compile-time evaluation of the function in C++20.
consteval double parse(std::string_view input) {
  double result;
  auto answer = fast_float::from_chars(input.data(), input.data()+input.size(), result);
  if(answer.ec != std::errc()) { return -1.0; }
  return result;
}

// This function should compile to a function which
// merely returns 3.1415.
constexpr double constexptest() {
  return parse("3.1415 input");
}

C++23: Fixed width floating-point types

The library also supports fixed-width floating-point types such as std::float32_t and std::float64_t. E.g., you can write:

std::float32_t result;
auto answer = fast_float::from_chars(f.data(), f.data() + f.size(), result);

Non-ASCII Inputs

We also support UTF-16 and UTF-32 inputs, as well as ASCII/UTF-8, as in the following example:

#include "fast_float/fast_float.h"
#include <cstdlib>
#include <iostream>
#include <string>
#include <system_error>

int main() {
    const std::u16string input =  u"3.1416 xyz ";
    double result;
    auto answer = fast_float::from_chars(input.data(), input.data()+input.size(), result);
    if(answer.ec != std::errc()) { std::cerr << "parsing failure\n"; return EXIT_FAILURE; }
    std::cout << "parsed the number " << result << std::endl;
    return EXIT_SUCCESS;
}

Advanced options: using commas as decimal separator, JSON and Fortran

The C++ standard stipulates that from_chars has to be locale-independent. In particular, the decimal separator has to be the period (.). However, some users still want to use the fast_float library in a locale-dependent manner. Using a separate function called from_chars_advanced, we allow users to pass a parse_options instance which contains a custom decimal separator (e.g., the comma). You may use it as follows.

#include "fast_float/fast_float.h"
#include <cstdlib>
#include <iostream>
#include <string>
#include <system_error>

int main() {
    const std::string input =  "3,1416 xyz ";
    double result;
    fast_float::parse_options options{fast_float::chars_format::general, ','};
    auto answer = fast_float::from_chars_advanced(input.data(), input.data()+input.size(), result, options);
    if((answer.ec != std::errc()) || ((result != 3.1416))) { std::cerr << "parsing failure\n"; return EXIT_FAILURE; }
    std::cout << "parsed the number " << result << std::endl;
    return EXIT_SUCCESS;
}

You can also parse Fortran-like inputs:

#include "fast_float/fast_float.h"
#include <cstdlib>
#include <iostream>
#include <string>
#include <system_error>

int main() {
    const std::string input =  "1d+4";
    double result;
    fast_float::parse_options options{ fast_float::chars_format::fortran };
    auto answer = fast_float::from_chars_advanced(input.data(), input.data()+input.size(), result, options);
    if((answer.ec != std::errc()) || ((result != 10000))) { std::cerr << "parsing failure\n"; return EXIT_FAILURE; }
    std::cout << "parsed the number " << result << std::endl;
    return EXIT_SUCCESS;
}

You may also enforce the JSON format (RFC 8259):

#include "fast_float/fast_float.h"
#include <cstdlib>
#include <iostream>
#include <string>
#include <system_error>

int main() {
    const std::string input =  "+.1"; // not valid
    double result;
    fast_float::parse_options options{ fast_float::chars_format::json };
    auto answer = fast_float::from_chars_advanced(input.data(), input.data()+input.size(), result, options);
    if(answer.ec == std::errc()) { std::cerr << "should have failed\n"; return EXIT_FAILURE; }
    return EXIT_SUCCESS;
}

By default the JSON format does not allow inf:

#include "fast_float/fast_float.h"
#include <cstdlib>
#include <iostream>
#include <string>
#include <system_error>

int main() {
    const std::string input =  "inf"; // not valid in JSON
    double result;
    fast_float::parse_options options{ fast_float::chars_format::json };
    auto answer = fast_float::from_chars_advanced(input.data(), input.data()+input.size(), result, options);
    if(answer.ec == std::errc()) { std::cerr << "should have failed\n"; return EXIT_FAILURE; }
}

You can allow it with a non-standard json_or_infnan variant:

#include "fast_float/fast_float.h"
#include <cmath>
#include <cstdlib>
#include <iostream>
#include <string>
#include <system_error>

int main() {
    const std::string input =  "inf"; // not valid in JSON but we allow it with json_or_infnan
    double result;
    fast_float::parse_options options{ fast_float::chars_format::json_or_infnan };
    auto answer = fast_float::from_chars_advanced(input.data(), input.data()+input.size(), result, options);
    if(answer.ec != std::errc() || (!std::isinf(result))) { std::cerr << "should have parsed infinity\n"; return EXIT_FAILURE; }
    return EXIT_SUCCESS;
}

Users and Related Work

The fast_float library is part of:

  • GCC (as of version 12): the from_chars function in GCC relies on fast_float.
  • WebKit, the engine behind Safari (Apple's web browser)
  • DuckDB
  • Apache Arrow where it multiplied the number parsing speed by two or three times
  • Google Jsonnet
  • ClickHouse

The fastfloat algorithm is part of the LLVM standard libraries. There is a derived implementation part of AdaCore.

The fast_float library provides a performance similar to that of the fast_double_parser library but using an updated algorithm reworked from the ground up, and while offering an API more in line with the expectations of C++ programmers. The fast_double_parser library is part of the Microsoft LightGBM machine-learning framework.

How fast is it?

It can parse random floating-point numbers at a speed of 1 GB/s on some systems. We find that it is often twice as fast as the best available competitor, and many times faster than many standard-library implementations.

$ ./build/benchmarks/benchmark
# parsing random integers in the range [0,1)
volume = 2.09808 MB
netlib                                  :   271.18 MB/s (+/- 1.2 %)    12.93 Mfloat/s
doubleconversion                        :   225.35 MB/s (+/- 1.2 %)    10.74 Mfloat/s
strtod                                  :   190.94 MB/s (+/- 1.6 %)     9.10 Mfloat/s
abseil                                  :   430.45 MB/s (+/- 2.2 %)    20.52 Mfloat/s
fastfloat                               :  1042.38 MB/s (+/- 9.9 %)    49.68 Mfloat/s

See https://github.com/lemire/simple_fastfloat_benchmark for our benchmarking code.

Video

Go Systems 2020

Using as a CMake dependency

This library is header-only by design. The CMake file provides the fast_float target which is merely a pointer to the include directory.

If you drop the fast_float repository in your CMake project, you should be able to use it in this manner:

add_subdirectory(fast_float)
target_link_libraries(myprogram PUBLIC fast_float)

Or you may want to retrieve the dependency automatically if you have a sufficiently recent version of CMake (3.14 or later, which FetchContent_MakeAvailable requires):

include(FetchContent)
FetchContent_Declare(
  fast_float
  GIT_REPOSITORY https://github.com/lemire/fast_float.git
  GIT_TAG tags/v1.1.2
  GIT_SHALLOW TRUE)

FetchContent_MakeAvailable(fast_float)
target_link_libraries(myprogram PUBLIC fast_float)

You should change the GIT_TAG line so that you recover the version you wish to use.

Using as single header

The script script/amalgamate.py may be used to generate a single header version of the library if so desired. Just run the script from the root directory of this repository. You can customize the license type and output file if desired as described in the command line help.

You may directly download automatically generated single-header files:

https://github.com/fastfloat/fast_float/releases/download/v6.1.1/fast_float.h

RFC 7159

If you need support for RFC 7159 (JSON standard), you may want to consider using the fast_double_parser library instead.

Credit

Though this work is inspired by many different people, this work benefited especially from exchanges with Michael Eisel, who motivated the original research with his key insights, and with Nigel Tao who provided invaluable feedback. Rémy Oudompheng first implemented a fast path we use in the case of long digits.

The library includes code adapted from Google Wuffs (written by Nigel Tao) which was originally published under the Apache 2.0 license.

License

Licensed under either of Apache License, Version 2.0 or MIT license or Boost license.
Unless you explicitly state otherwise, any contribution intentionally submitted for inclusion in this repository by you, as defined in the Apache-2.0 license, shall be triple licensed as above, without any additional terms or conditions.

fast_float's People

Contributors

alexhuszagh, alugowski, barracuda156, biojppm, coeur, eugenegff, filipecosta90, huangqinjin, jrahlf, jwakely, kou, lemire, leni536, mayawarrier, mtahak, mumbleskates, nealrichardson, olivierldff, pharago, pitrou, samuellongchamps, stefanbruens, striezel, therandomguy146275, timkpaine, urgau, v1gnesh, wojdyr, xelatihy, xvitaly

fast_float's Issues

Link error when using fast_float on Visual Studio 2019 preview 3

It might be a problem with Visual Studio, but when I include fast_float in my project I get a bunch of link errors related to fast_float::binary_format being already defined.

Minimal reproduction

test.h
#pragma once
#include "fast_float/fast_float.h"

main.cpp
#include "test.h"
int main() { return 0; }

foo.cpp
#include "test.h"
void foo() { }

feature?: ignored characters ' and _ / allowing prefixes 0x and 0b, etc.

This isn't a performance related feature, but I've found it odd that the charconv spec goes out of its way to avoid parsing what would otherwise be necessary to handle c++'s own floating point literals. What would you think about having some additional utility from_chars like functions to handle those formats?

Ages ago I modified microsoft's charconv to do this for myself, but it broke at some point because headers changed and some utility functions went missing. The core idea is roughly this:

if (fmt is prefixed) {
  // bump pointer forward past the prefix for the format
}

// continue parsing like charconv would, except skip over specified characters
uint64_t mantissa = 0;
for (; begin != end; ++begin) {
  if ((*begin == some_template_pack) || ...)
    continue;
  const unsigned char digit = char_table[*begin];
  if (digit >= base)
    break;
  // so on...
}

Not used your library, but it seems like hex formatted floats have the enum for the format but aren't supported? Could also be the opportunity to add that in.
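
For readers who want to experiment with the idea, here is a runnable sketch for unsigned integers only. It is not part of fast_float or <charconv>: parse_literal_u64 is a hypothetical helper, and unlike the pseudocode above it rejects an invalid character instead of stopping at it.

```cpp
#include <cstdint>
#include <optional>
#include <string_view>

// Hypothetical helper: parses an unsigned integer literal with an optional
// 0x/0b prefix, skipping the separators ' and _.
// Returns std::nullopt on invalid input or on overflow.
std::optional<std::uint64_t> parse_literal_u64(std::string_view text) {
    unsigned base = 10;
    if (text.size() >= 2 && text[0] == '0') {
        if (text[1] == 'x' || text[1] == 'X') { base = 16; text.remove_prefix(2); }
        else if (text[1] == 'b' || text[1] == 'B') { base = 2; text.remove_prefix(2); }
    }
    std::uint64_t value = 0;
    bool any_digit = false;
    for (char c : text) {
        if (c == '\'' || c == '_') { continue; }  // ignored separator
        unsigned digit = 0;
        if (c >= '0' && c <= '9') { digit = static_cast<unsigned>(c - '0'); }
        else if (c >= 'a' && c <= 'f') { digit = static_cast<unsigned>(c - 'a') + 10; }
        else if (c >= 'A' && c <= 'F') { digit = static_cast<unsigned>(c - 'A') + 10; }
        else { return std::nullopt; }
        if (digit >= base) { return std::nullopt; }
        if (value > (UINT64_MAX - digit) / base) { return std::nullopt; }  // would overflow
        value = value * base + digit;
        any_digit = true;
    }
    if (!any_digit) { return std::nullopt; }
    return value;
}
```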

Performance tests on various distributions.

Uniformly distributed floats in [0, 1) don't represent typical datasets.

We can look at some other distributions (example from ClickHouse performance tests):
https://github.com/ClickHouse/ClickHouse/blob/master/tests/performance/float_parsing.xml

Clarifications about this perf test:
rand() is uniformly distributed UInt32;
rand64() is uniformly distributed UInt64;
In rand(1) and rand(2) the arguments are just tags to distinguish function calls, they behave in the same way as rand().

constexpr

I think the entire thing can be made constexpr (even if std::from_chars is not), you only need to replace memcpy with bitcast (needs c++20).
Would be a nice addition.

fastfloat vs fast_float?

Can either of the organization/repo names be changed to the other? The current situation is misleading; I was already bitten by the difference while cloning, and I'm sure many more people will also have this.

optimized (smaller) lookup table for float (binary32 only)

Have you considered optimizing the code size for parsing floats?
The LUT power_of_five_128 has approximately 1400 entries, which are needed for parsing doubles.
I don't know how many entries are required for parsing a float, but I suspect the LUT could be a lot smaller in that case.

If there was a separate LUT for parsing floats, the compiled binary size could be reduced significantly.

Decimal to double

Hi,
I am writing conversion float -> double, with requirement to_string(float) = to_string(double)

I am using Dragonbox to convert float to chars, and fast_float to convert chars to double.

double floatToDouble(float f) {
    constexpr int buffer_length = 1 + // for '\0'
        jkj::dragonbox::max_output_string_length<jkj::dragonbox::ieee754_binary64>;
    char buffer[buffer_length];
    char* end_ptr = jkj::dragonbox::to_chars(f, buffer);
    double d;
    fast_float::from_chars(buffer, end_ptr, d);
    return d;
}

Works perfect! Both Dragonbox and fast_float are very fast!!! Big thanks for nice library!
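
The same round trip can be tried with the standard library alone. In this sketch std::to_chars plays the role of Dragonbox (it also emits the shortest digits that round-trip) and std::from_chars plays the role of fast_float::from_chars; float_to_double_via_decimal is a hypothetical name, and floating-point <charconv> support needs a recent toolchain (for example GCC 11 or later).

```cpp
#include <cassert>
#include <charconv>
#include <system_error>

// Stand-in sketch: float -> shortest decimal string -> nearest double.
double float_to_double_via_decimal(float f) {
    char buffer[64];  // ample room for the shortest representation of any float
    auto written = std::to_chars(buffer, buffer + sizeof(buffer), f);
    assert(written.ec == std::errc());
    double d = 0.0;
    auto parsed = std::from_chars(buffer, written.ptr, d);
    assert(parsed.ec == std::errc() && parsed.ptr == written.ptr);
    return d;
}
```

For 0.1f the intermediate string is "0.1", so the result is the double nearest to 0.1, which differs from static_cast<double>(0.1f).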

At the same time, the intermediate chars format seems excessive.

Dragonbox offers conversion from float to decimal (significand, exponent, is_negative).

In from_chars, fast_float converts the input chars into a parsed_number_string, which has exponent, mantissa, negative.

I tried to write function from_decimal:

template <typename T>
std::errc from_decimal(int64_t exponent, uint64_t mantissa, bool negative, T& value) noexcept {

static_assert (std::is_same<T, double>::value || std::is_same<T, float>::value, "only float and double are supported");

parsed_number_string pns {.exponent = exponent, .mantissa = mantissa, .negative = negative, .valid = true};

and continue the same way as from_chars.
It segfaults; apparently you also need to fill integer and fraction in parsed_number_string.
Looking at the code, it is hard to figure out how to set integer and fraction in parsed_number_string correctly, given the significand and exponent from Dragonbox as input.
Would you mind giving me a hint on how to calculate integer and fraction from significand and exponent?

Also... may be you can consider making from_decimal as member of public API? I am just touching this area, but sounds like converting decimal -> double can be useful.

Thank you for reading!

off-by-1 (power of 10) for long strings

#include <stdio.h>
#include <stdlib.h>

#include "/the/path/to/fast_float/fast_float.h"

int main(int argc, char** argv) {
  const char* s =
      "3."
      "141592653589793238462643383279502884197169399375105820974944592307816406"
      "286208998628034825342117067982148086513282306647093844609550582231725359"
      "408128481117450284102701938521105559644622948954930381964428810975665933"
      "446128475648233786783165271201909145648566923460348610454326648213393607"
      "260249141273724587006606315588174881520920962829254091715364367892590360"
      "011330530548820466521384146951941511609433057270365759591953092186117381"
      "932611793105118548074462379962749567351885752724891227938183011949129833"
      "673362440656643086021394946395224737190702179860943702770539217176293176"
      "752384674818467669405132000568127145263560827785771342757789609173637178"
      "721468440901224953430146549585371050792279689258923542019956112129021960"
      "864034418159813629774771309960518707211349999998372978";
  for (int i = 0; i < 16; i++) {
    // Parse all but the last i chars. We should still get 3.141ish.
    double d = 0.0;
    fast_float::from_chars(s, s + strlen(s) - i, d);
    printf("%f\n", d);
  }
  return 0;
}

Output:

0.000000
0.000000
0.000003
0.000031
0.000314
0.003142
0.031416
0.314159
3.141593
3.141593
3.141593
3.141593
3.141593
3.141593
3.141593
3.141593

Build error with clang+mingw

When building rapidyaml (which uses fast_float) with clang+mingw on windows, I get the following error message:

FAILED: CMakeFiles/ryml.dir/src/c4/yml/tree.cpp.obj
D:\User\Adrian\Programmierung\3rdParty\rapidyaml\Clang-Windows\bin\clang++.exe  -I../src -I../ext/c4core/src -O3 -DNDEBUG -std=c++11 -MD -MT CMakeFiles/ryml.dir/src/c4/yml/tree.cpp.obj -MF CMakeFiles\ryml.dir\src\c4\yml\tree.cpp.obj.d -o CMakeFiles/ryml.dir/src/c4/yml/tree.cpp.obj -c ../src/c4/yml/tree.cpp
In file included from ../src/c4/yml/tree.cpp:1:
In file included from ../src\c4/yml/tree.hpp:10:
In file included from ../ext/c4core/src\c4/charconv.hpp:50:
In file included from ../ext/c4core/src\c4/ext/fast_float.hpp:14:
In file included from ../ext/c4core/src\c4/ext/./fast_float/include/fast_float/fast_float.h:44:
In file included from ../ext/c4core/src\c4/ext/./fast_float/include/fast_float/parse_number.h:3:
In file included from ../ext/c4core/src\c4/ext/./fast_float/include/fast_float/ascii_number.h:9:
../ext/c4core/src\c4/ext/./fast_float/include/fast_float/float_common.h:168:16: error: use of undeclared identifier '_umul128'
  answer.low = _umul128(a, b, &answer.high); // _umul128 not available on ARM64
               ^

_umul128() is declared in intrin.h, but it is not included in fast_float/float_common.h due to the following lines:

#if ((defined(_WIN32) || defined(_WIN64)) && !defined(__clang__))
#include <intrin.h>
#endif

So I think the condition should be changed, perhaps to #ifdef __MINGW32__?

Cross-compilation for arm-none-eabi target

Hi, very nice project!

I've found that two changes need to be made in order to cross-compile for a "bare-metal" ARM target using arm-none-eabi-gcc (that is, the gcc arm embedded toolchain).

  1. In float_common.h, there is this block:

#if defined(__APPLE__) || defined(__FreeBSD__)
#include <machine/endian.h>
#elif defined(sun) || defined(__sun)
#include <sys/byteorder.h>
#else
#include <endian.h>
#endif

With every version of the toolchain from 10.3.1 going back to at least 5.4, endian.h is located at machine/endian.h, so line 47 needs to match a macro that is defined for the ARM target. I've experimented and found that both || defined(__arm__) and || defined(__ARM_EABI__) work, but I'm not sure which is proper (nor do I know where this file is located with proprietary non-gcc-based ARM compilers).

  2. In digit_comparison.h:
    int32_t shift = -am.power2 + 1;
    cb(am, std::min(shift, 64));

The arm-none-eabi-g++ compiler complains no matching function for call to 'min(int32_t&, int)'. Explicitly casting the integer constant to the same type as the variable works, and seems to be stylistically similar to other casts in this file:

cb(am, std::min(shift, int32_t(64)));

If these changes seem helpful to others, I'm happy to issue a pull request. Or try different solutions if this breaks someone else's build. In any case, thanks for the great project!

[feature request] 32-bit compilation

Would you be willing to support 32bit compilation? For ARM targets, this would be important.

FWIW, while trying to compile 32 bit I get this with vs2019:

D:\a\c4core\c4core\src\c4\ext\fast_float\include\fast_float\float_common.h(68,7): error C3861: '_BitScanReverse64': identifier not found [D:\a\c4core\c4core\build\static32\c4core.vcxproj]
D:\a\c4core\c4core\src\c4\ext\fast_float\include\fast_float\float_common.h(84,66): error C2169: 'fast_float::__emulu': intrinsic function, cannot be defined [D:\a\c4core\c4core\build\static32\c4core.vcxproj]
D:\a\c4core\c4core\src\c4\ext\fast_float\include\fast_float\float_common.h(89,17): error C2065: '__emulu': undeclared identifier [D:\a\c4core\c4core\build\static32\c4core.vcxproj]
D:\a\c4core\c4core\src\c4\ext\fast_float\include\fast_float\float_common.h(89): error C7552: 'fast_float::__emulu': purely intrinsic functions have no address [D:\a\c4core\c4core\build\static32\c4core.vcxproj]
D:\a\c4core\c4core\src\c4\ext\fast_float\include\fast_float\float_common.h(89,60): error C2440: 'initializing': cannot convert from 'unsigned __int64 (__cdecl *)(unsigned int,unsigned int)' to 'uint64_t' [D:\a\c4core\c4core\build\static32\c4core.vcxproj]
D:\a\c4core\c4core\src\c4\ext\fast_float\include\fast_float\float_common.h(89,15): message : There is no context in which this conversion is possible [D:\a\c4core\c4core\build\static32\c4core.vcxproj]
D:\a\c4core\c4core\src\c4\ext\fast_float\include\fast_float\float_common.h(90,17): error C2065: '__emulu': undeclared identifier [D:\a\c4core\c4core\build\static32\c4core.vcxproj]
D:\a\c4core\c4core\src\c4\ext\fast_float\include\fast_float\float_common.h(90): error C7552: 'fast_float::__emulu': purely intrinsic functions have no address [D:\a\c4core\c4core\build\static32\c4core.vcxproj]
D:\a\c4core\c4core\src\c4\ext\fast_float\include\fast_float\float_common.h(90,52): error C2440: 'initializing': cannot convert from 'unsigned __int64 (__cdecl *)(unsigned int,unsigned int)' to 'uint64_t' [D:\a\c4core\c4core\build\static32\c4core.vcxproj]
D:\a\c4core\c4core\src\c4\ext\fast_float\include\fast_float\float_common.h(90,15): message : There is no context in which this conversion is possible [D:\a\c4core\c4core\build\static32\c4core.vcxproj]
D:\a\c4core\c4core\src\c4\ext\fast_float\include\fast_float\float_common.h(91,24): error C2065: '__emulu': undeclared identifier [D:\a\c4core\c4core\build\static32\c4core.vcxproj]
D:\a\c4core\c4core\src\c4\ext\fast_float\include\fast_float\float_common.h(91,67): error C2568: '+': unable to resolve function overload [D:\a\c4core\c4core\build\static32\c4core.vcxproj]
D:\a\c4core\c4core\src\c4\ext\fast_float\include\fast_float\float_common.h(91,67): message : could be 'unsigned __int64 fast_float::__emulu(unsigned int,unsigned int)' [D:\a\c4core\c4core\build\static32\c4core.vcxproj]
D:\a\c4core\c4core\src\c4\ext\fast_float\include\fast_float\float_common.h(94,9): error C2065: '__emulu': undeclared identifier [D:\a\c4core\c4core\build\static32\c4core.vcxproj]
D:\a\c4core\c4core\src\c4\ext\fast_float\include\fast_float\float_common.h(94,76): error C2296: '+': illegal, left operand has type 'unsigned __int64 (__cdecl *)(unsigned int,unsigned int)' [D:\a\c4core\c4core\build\static32\c4core.vcxproj]

and on gcc, I get this:

[  3%] Building CXX object CMakeFiles/c4core.dir/src/c4/base64.cpp.o
[  3%] Building CXX object CMakeFiles/c4core.dir/src/c4/char_traits.cpp.o
[  5%] Building CXX object CMakeFiles/c4core.dir/src/c4/error.cpp.o
In file included from /home/runner/work/c4core/c4core/src/c4/ext/fast_float/include/fast_float/ascii_number.h:9:0,
                 from /home/runner/work/c4core/c4core/src/c4/ext/fast_float/include/fast_float/parse_number.h:3,
                 from /home/runner/work/c4core/c4core/src/c4/ext/fast_float/include/fast_float/fast_float.h:44,
                 from /home/runner/work/c4core/c4core/src/c4/charconv.hpp:21,
                 from /home/runner/work/c4core/c4core/src/c4/base64.hpp:9,
                 from /home/runner/work/c4core/c4core/src/c4/base64.cpp:1:
/home/runner/work/c4core/c4core/src/c4/ext/fast_float/include/fast_float/float_common.h: In function ‘fast_float::value128 fast_float::full_multiplication(uint64_t, uint64_t)’:
/home/runner/work/c4core/c4core/src/c4/ext/fast_float/include/fast_float/float_common.h:120:3: error: ‘__uint128_t’ was not declared in this scope
   __uint128_t r = ((__uint128_t)value1) * value2;
   ^~~~~~~~~~~
/home/runner/work/c4core/c4core/src/c4/ext/fast_float/include/fast_float/float_common.h:120:3: note: suggested alternative: ‘__uint32_t’
   __uint128_t r = ((__uint128_t)value1) * value2;
   ^~~~~~~~~~~
   __uint32_t
/home/runner/work/c4core/c4core/src/c4/ext/fast_float/include/fast_float/float_common.h:121:25: error: ‘r’ was not declared in this scope
   answer.low = uint64_t(r);
                         ^
/home/runner/work/c4core/c4core/src/c4/ext/fast_float/include/fast_float/float_common.h:117:63: error: unused parameter ‘value1’ [-Werror=unused-parameter]
 fastfloat_really_inline value128 full_multiplication(uint64_t value1,
                                                               ^~~~~~
/home/runner/work/c4core/c4core/src/c4/ext/fast_float/include/fast_float/float_common.h:118:63: error: unused parameter ‘value2’ [-Werror=unused-parameter]
                                                      uint64_t value2) {
                                                               ^~~~~~
[  7%] Building CXX object CMakeFiles/c4core.dir/src/c4/format.cpp.o
In file included from /home/runner/work/c4core/c4core/src/c4/ext/fast_float/include/fast_float/parse_number.h:3:0,
                 from /home/runner/work/c4core/c4core/src/c4/ext/fast_float/include/fast_float/fast_float.h:44,
                 from /home/runner/work/c4core/c4core/src/c4/charconv.hpp:21,
                 from /home/runner/work/c4core/c4core/src/c4/base64.hpp:9,
                 from /home/runner/work/c4core/c4core/src/c4/base64.cpp:1:
/home/runner/work/c4core/c4core/src/c4/ext/fast_float/include/fast_float/ascii_number.h: In function ‘fast_float::parsed_number_string fast_float::parse_number_string(const char*, const char*, fast_float::chars_format)’:
/home/runner/work/c4core/c4core/src/c4/ext/fast_float/include/fast_float/ascii_number.h:102:35: error: useless cast to type ‘int32_t {aka int}’ [-Werror=useless-cast]
       int32_t(p - start_digits - 1); // used later to guard against overflows
                                   ^
/home/runner/work/c4core/c4core/src/c4/ext/fast_float/include/fast_float/ascii_number.h:150:44: error: useless cast to type ‘int’ [-Werror=useless-cast]
     digit_count -= int(start - start_digits);
                                            ^
/home/runner/work/c4core/c4core/src/c4/ext/fast_float/include/fast_float/ascii_number.h: In function ‘fast_float::decimal fast_float::parse_decimal(const char*, const char*)’:
/home/runner/work/c4core/c4core/src/c4/ext/fast_float/include/fast_float/ascii_number.h:218:58: error: useless cast to type ‘int32_t {aka int}’ [-Werror=useless-cast]
     answer.decimal_point = int32_t(first_after_period - p);
                                                          ^
cc1plus: all warnings being treated as errors

behavior in case of overflow/underflow differs from std::from_chars specification?

When the parsed value is outside the representable range, such as on input "1e-10000" and "1e+10000", it seems fast_float::from_chars sets the 'value' output parameter to 0 and infinity respectively and returns std::errc{}.

But the specification for std::from_chars (http://eel.is/c++draft/charconv.from.chars#1) says:

If the parsed value is not in the range representable by the type of value, value is unmodified and the member ec of the return value is equal to errc​::​result_­out_­of_­range.

Is this deviation from the C++ standard intended?

When integrating fast_float into libstdc++, we adjusted this behavior with the following patch: https://gcc.gnu.org/git/?p=gcc.git;a=commitdiff;h=40b0d4472a2591cf27f3a81aa3fba57dc4532648
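
To make the quoted rule concrete, here is a minimal sketch of the "value is unmodified on range error" behavior. It is not fast_float code and not the libstdc++ patch: `parse_double_strict` is a hypothetical helper, and `std::strtod` stands in for the actual parser purely for illustration.

```cpp
#include <cerrno>
#include <cstdlib>
#include <string>
#include <system_error>

// Hypothetical helper (not part of fast_float or libstdc++).
// std::strtod stands in for the real parser; the point is the error
// contract: on a range error, `value` is left untouched.
inline std::errc parse_double_strict(const std::string& text, double& value) {
  errno = 0;
  char* end = nullptr;
  const double parsed = std::strtod(text.c_str(), &end);
  if (end == text.c_str()) {
    return std::errc::invalid_argument;     // no number was found
  }
  if (errno == ERANGE) {
    // Overflow always reports ERANGE; whether underflow does is
    // implementation-defined for strtod.
    return std::errc::result_out_of_range;  // value is NOT modified
  }
  value = parsed;
  return std::errc();
}
```

A caller that initializes `value` to a sentinel can then tell "out of range" apart from a legitimately parsed infinity or zero.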

Accept custom decimal point

In some cases it may be desirable to customize the decimal point character that the float parsing routines detect. Of course, it is easy for the caller to rewrite the input, so this is not strictly required from fast_float (though native support may be a bit faster, since no rewriting would be necessary).
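
For reference, the caller-side rewrite mentioned above could look roughly like this. It is only a sketch: `parse_with_decimal_point` is a hypothetical name, `std::strtod` (in the default "C" locale) stands in for the parser, and inputs that already contain a '.' (for example as a thousands separator) are not handled.

```cpp
#include <algorithm>
#include <cstdlib>
#include <string>

// Hypothetical helper: copies the input, replaces the custom decimal
// separator with '.', then parses the rewritten copy.
// Assumes the default "C" locale, where strtod expects '.'.
inline double parse_with_decimal_point(const std::string& text,
                                       char decimal_point) {
  std::string rewritten(text);
  std::replace(rewritten.begin(), rewritten.end(), decimal_point, '.');
  return std::strtod(rewritten.c_str(), nullptr);
}
```

The copy is the cost the request is about: a parser that accepts the separator as a parameter would avoid the extra allocation and pass over the input.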

Port to Java

We need a port to Java.

I think that @elbosso has offered their help. I am willing to work on it.

A C# port is almost ready to be made public.

string_test is abandoned

OS: ubuntu 21.04
compiler: clang-14 nightly

It seems that some of the extended tests are not run regularly. I cannot get string_test to pass, and it is unclear when, if ever, CI runs this test.

Building and running with the built-in cmake:

Test project /home/widders/repos/fast_float/build
    Start 1: example_test
1/4 Test #1: example_test .....................   Passed    0.00 sec
    Start 2: example_comma_test
2/4 Test #2: example_comma_test ...............   Passed    0.00 sec
    Start 3: basictest
3/4 Test #3: basictest ........................   Passed    0.00 sec
    Start 4: string_test
4/4 Test #4: string_test ......................Child aborted***Exception:   0.00 sec
32 bits checks
parsing  1e1000 100000 3.14159265359  -1e-500 001    1e01  1e0000001  -inf
 I could not parse 
terminate called after throwing an instance of 'std::runtime_error'
  what():  bug

Unspecified Behavior in Multi-Digit Optimizations

I've noticed a similar issue in fast-float-rust, where checking the bounds of the array produces undefined behavior (in Rust), and similar behavior (unspecified) exists in fast_float.

Quoting the C++14 standard:

If two pointers p and q of the same type point to different objects that are not members of the same object or elements of the same array or to different functions, or if only one of them is null, the results of p<q, p>q, p<=q, and p>=q are unspecified.

Therefore, the following code has unspecified behavior (cppreference calls it undefined behavior; I am not a language lawyer, so I am unsure of the true semantics).

if ((p + 8 <= pend) && is_made_of_eight_digits_fast(p)) {
  i = i * 100000000 + parse_eight_digits_unrolled(p); // in rare cases, this will overflow, but that's ok
  p += 8;
  if ((p + 8 <= pend) && is_made_of_eight_digits_fast(p)) {
    i = i * 100000000 + parse_eight_digits_unrolled(p); // in rare cases, this will overflow, but that's ok
    p += 8;

Specifically, pend is one past the end of the array, which is the last valid point of comparison, so the compiler may assume that p + 8 <= pend always holds and optimize the check away, an undesirable outcome.

The compliant solution is as follows:

#include <iterator>
...

  if ((std::distance(p, pend) >= 8) && is_made_of_eight_digits_fast(p)) {
    i = i * 100000000 + parse_eight_digits_unrolled(p); // in rare cases, this will overflow, but that's ok
    p += 8;
    if ((std::distance(p, pend) >= 8) && is_made_of_eight_digits_fast(p)) {
      i = i * 100000000 + parse_eight_digits_unrolled(p); // in rare cases, this will overflow, but that's ok
      p += 8;
    }
  }

Another example in parse_decimal also needs to be patched.
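
As a self-contained illustration of the same pattern, the sketch below only ever compares the remaining length and never forms p + 8 unless at least 8 characters are left. `count_eight_digit_blocks` is a hypothetical helper written for this example; it is not a fast_float function.

```cpp
#include <cstddef>
#include <iterator>

// Hypothetical helper: counts how many complete 8-character blocks of
// ASCII digits start at p. The loop condition checks the remaining
// length, so p + 8 is only formed when it stays within [p, pend].
inline std::size_t count_eight_digit_blocks(const char* p, const char* pend) {
  std::size_t blocks = 0;
  while (std::distance(p, pend) >= 8) {
    bool all_digits = true;
    for (int k = 0; k < 8; ++k) {
      if (p[k] < '0' || p[k] > '9') {
        all_digits = false;
        break;
      }
    }
    if (!all_digits) {
      break;
    }
    p += 8;  // safe: at least 8 characters were available
    ++blocks;
  }
  return blocks;
}
```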

constexpr does not imply const in c++14

OS: ubuntu 21.04
compiler: clang-14 nightly

fast_float::span<T>::len() is marked constexpr but not const, and the compiler emits a fatal warning in the default build.

commands:

mkdir build && cd build
cmake -DFASTFLOAT_TEST=ON ..
cmake --build . --verbose

error:

[ 11%] Building CXX object tests/CMakeFiles/basictest.dir/basictest.cpp.o
cd /home/widders/repos/fast_float/build/tests && /usr/bin/c++ -DSUPPLEMENTAL_TEST_DATA_DIR=\"/home/widders/repos/fast_float/build/_deps/supplemental_test_files-build/data\" -I/home/widders/repos/fast_float/include -I/home/widders/repos/fast_float/build/_deps/doctest-src -O3 -DNDEBUG -Werror -Wall -Wextra -Weffc++ -Wsign-compare -Wshadow -Wwrite-strings -Wpointer-arith -Winit-self -Wconversion -Wsign-conversion -std=gnu++11 -o CMakeFiles/basictest.dir/basictest.cpp.o -c /home/widders/repos/fast_float/tests/basictest.cpp
In file included from /home/widders/repos/fast_float/tests/basictest.cpp:5:
In file included from /home/widders/repos/fast_float/include/fast_float/fast_float.h:77:
In file included from /home/widders/repos/fast_float/include/fast_float/parse_number.h:4:
In file included from /home/widders/repos/fast_float/include/fast_float/ascii_number.h:18:
/home/widders/repos/fast_float/include/fast_float/float_common.h:127:20: error: 'constexpr' non-static member function will not be implicitly 'const' in C++14; add 'const' to avoid a change in behavior [-Werror,-Wconstexpr-not-const]
  constexpr size_t len() noexcept {
                   ^
                         const
1 error generated.

fix:

diff --git a/include/fast_float/float_common.h b/include/fast_float/float_common.h
--- a/include/fast_float/float_common.h	(revision 1b9150913e07bc199ea7bc25fc1609a748bd301c)
+++ b/include/fast_float/float_common.h	(date 1631605731861)
@@ -124,7 +124,7 @@
   span(const T* _ptr, size_t _length) : ptr(_ptr), length(_length) {}
   span() : ptr(nullptr), length(0) {}
 
-  constexpr size_t len() noexcept {
+  constexpr size_t len() const noexcept {
     return length;
   }
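
A reduced example of why the const qualifier matters: in C++11 a constexpr member function is implicitly const, but from C++14 on it is not, so without the explicit const the call through a const reference below would stop compiling. `span_sketch` and `length_of` are illustrative names, not the library's types.

```cpp
#include <cstddef>

// Illustrative stand-in for a span-like type (not fast_float::span).
template <typename T>
struct span_sketch {
  const T* ptr;
  std::size_t length;

  constexpr span_sketch(const T* p, std::size_t n) : ptr(p), length(n) {}

  // Explicit const: the meaning is the same in C++11 and C++14 and later.
  constexpr std::size_t len() const noexcept { return length; }
};

// Calls len() through a const reference, which requires a const member.
template <typename T>
std::size_t length_of(const span_sketch<T>& s) {
  return s.len();
}
```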

use std::from_chars_result as return type of from_chars

Currently the from_chars() function returns fast_float::from_chars_result, which is fully identical to std::from_chars_result. Even the ec field is of type std::errc (i.e., not a custom fast_float type).

Make from_chars() return std::from_chars_result.
Remove the fast_float::from_chars_result type.

EDIT: std::from_chars_result is available in GCC's implementation of C++17.

Error in readme.

# parsing random integers in the range [0,1)
volume = 2.09808 MB 
netlib                                  :   294.33 MB/s (+/- 2.4 %)    14.03 Mint/s  

The word "integers" is a mistake and should be replaced with "floats".
The name of the Mint/s metric is also bogus.

Single header implementation

It would be nice to have a single header implementation for quick inclusion in other projects. If this were to be of interest, I can cobble together a small python script that will do the job fine, by simply concatenating the header files in inclusion order and removing the local includes.

Allow using a system installation of doctest

I am working on packaging fast_float for Fedora Linux. We need to build and run the tests offline.

For the supplementary test files, it is easy enough to add an additional source tarball and extract it in the same location FetchContent_Populate() would have used.

For doctest, we should use the system-wide copy installed in /usr/include. Since it is a header-only library, this “just works” if we can get CMake to refrain from downloading files and searching for build system files. However, we cannot easily fool FetchContent* into doing nothing because it expects to find an extracted source distribution, with CMakeLists.txt and such, not just the header.

I’m hoping this use case can be better supported upstream. I am currently using the following patch to work around the FetchContent* machinery for doctest. If it’s to your liking, I am happy to submit it as a PR.

diff -Naur fast_float-1.1.1-original/tests/CMakeLists.txt fast_float-1.1.1/tests/CMakeLists.txt
--- fast_float-1.1.1-original/tests/CMakeLists.txt	2021-06-07 10:06:03.000000000 -0400
+++ fast_float-1.1.1/tests/CMakeLists.txt	2021-06-19 14:30:33.825177091 -0400
@@ -4,9 +4,13 @@
 
 include(FetchContent)
 
-FetchContent_Declare(doctest
-  GIT_REPOSITORY https://github.com/onqtam/doctest.git
-  GIT_TAG 2.4.6)
+option(SYSTEM_DOCTEST "Use system copy of doctest" OFF)
+
+if (NOT SYSTEM_DOCTEST)
+  FetchContent_Declare(doctest
+    GIT_REPOSITORY https://github.com/onqtam/doctest.git
+    GIT_TAG 2.4.6)
+endif()
 FetchContent_Declare(supplemental_test_files
   GIT_REPOSITORY https://github.com/fastfloat/supplemental_test_files.git
   GIT_TAG origin/main)
@@ -15,11 +19,13 @@
 
 # FetchContent_MakeAvailable() was only introduced in 3.14
 # https://cmake.org/cmake/help/v3.14/release/3.14.html#modules
-# FetchContent_MakeAvailable(doctest)
-FetchContent_GetProperties(doctest)
-if(NOT doctest_POPULATED)
-  FetchContent_Populate(doctest)
-  add_subdirectory(${doctest_SOURCE_DIR} ${doctest_BINARY_DIR})
+if (NOT SYSTEM_DOCTEST)
+  # FetchContent_MakeAvailable(doctest)
+  FetchContent_GetProperties(doctest)
+  if(NOT doctest_POPULATED)
+    FetchContent_Populate(doctest)
+    add_subdirectory(${doctest_SOURCE_DIR} ${doctest_BINARY_DIR})
+  endif()
 endif()
 FetchContent_GetProperties(supplemental_test_files)
 if(NOT supplemental_test_files_POPULATED)
@@ -40,7 +46,10 @@
       target_compile_options(${TEST_NAME} PUBLIC -Werror -Wall -Wextra -Weffc++)
       target_compile_options(${TEST_NAME} PUBLIC -Wsign-compare -Wshadow -Wwrite-strings -Wpointer-arith -Winit-self -Wconversion -Wsign-conversion)
     endif()
-    target_link_libraries(${TEST_NAME} PUBLIC fast_float doctest supplemental-data)
+    target_link_libraries(${TEST_NAME} PUBLIC fast_float supplemental-data)
+    if (NOT SYSTEM_DOCTEST)
+      target_link_libraries(${TEST_NAME} PUBLIC doctest)
+    endif()
 endfunction(fast_float_add_cpp_test)
 
 

MinGW32 build failure

When trying to integrate this library in Apache Arrow, we found it failed on our MinGW32 CI builds:

D:/a/arrow/arrow/cpp/src/arrow/vendored/fast_float/float_common.h: In function 'arrow_vendored::fast_float::value128 arrow_vendored::fast_float::full_multiplication(uint64_t, uint64_t)':
D:/a/arrow/arrow/cpp/src/arrow/vendored/fast_float/float_common.h:115:3: error: '__uint128_t' was not declared in this scope; did you mean 'uint32_t'?
  115 |   __uint128_t r = ((__uint128_t)value1) * value2;
      |   ^~~~~~~~~~~
      |   uint32_t

The following changes fixed it:
https://gist.github.com/pitrou/a77ab72cfda0b81bc95e9c81489ccd37
(from this changeset: apache/arrow@077869d)
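
For readers without access to the gist: the generic way to avoid __uint128_t is to build the 128-bit product from four 32x32-bit partial products. The sketch below shows that textbook technique only; it is not the content of the linked gist, and `value128_sketch` and `full_multiplication_portable` are names made up for this example.

```cpp
#include <cstdint>

// Illustrative 128-bit result type (mirrors the low/high layout seen in
// the error messages, but is not the library's value128).
struct value128_sketch {
  std::uint64_t low;
  std::uint64_t high;
};

// Portable 64x64 -> 128-bit multiplication using only 64-bit arithmetic.
inline value128_sketch full_multiplication_portable(std::uint64_t a,
                                                    std::uint64_t b) {
  const std::uint64_t mask = 0xFFFFFFFFu;
  const std::uint64_t a_lo = a & mask, a_hi = a >> 32;
  const std::uint64_t b_lo = b & mask, b_hi = b >> 32;

  const std::uint64_t lo_lo = a_lo * b_lo;
  const std::uint64_t hi_lo = a_hi * b_lo;
  const std::uint64_t lo_hi = a_lo * b_hi;
  const std::uint64_t hi_hi = a_hi * b_hi;

  // Middle 32-bit column: carry out of lo_lo plus the low halves of the
  // two cross terms. The sum is below 3 * 2^32, so it cannot overflow.
  const std::uint64_t mid = (lo_lo >> 32) + (hi_lo & mask) + (lo_hi & mask);

  value128_sketch r;
  r.low = (mid << 32) | (lo_lo & mask);
  r.high = hi_hi + (hi_lo >> 32) + (lo_hi >> 32) + (mid >> 32);
  return r;
}
```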

Deviation from strtod

The readme states

It parses floating-point numbers expecting a locale-independent format equivalent to what is used by std::strtod in the default ("C") locale.

I noticed some difference between strtod and fast_float:

  • fast_float does not ignore leading whitespace as defined for strtod.
  • fast_float does not accept a leading 0x or +
  • For 1e999, strtod sets errno = ERANGE. fast_float returns inf and std::errc{} (i.e., no error). The other two libraries that I checked (absl::from_chars and boost::lexical_cast) also return an error.

Maybe the documentation could be extended to reflect the differences between strtod and fast_float?
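
The strtod side of the differences listed above can be reproduced with the small sketch below; it only exercises the standard library. `strtod_result` and `call_strtod` are hypothetical names for this example.

```cpp
#include <cerrno>
#include <cstdlib>

// Hypothetical wrapper that records whether strtod reported ERANGE.
struct strtod_result {
  double value;
  bool range_error;
};

inline strtod_result call_strtod(const char* text) {
  errno = 0;
  const double v = std::strtod(text, nullptr);
  strtod_result r;
  r.value = v;
  r.range_error = (errno == ERANGE);
  return r;
}
```

With this wrapper, "  1.5" (leading whitespace), "+2" (leading plus) and "0x10" (hexadecimal prefix) are all accepted by strtod, while "1e999" comes back with the range error flag set.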

Big Endian support

I realized that from_chars with exponential notation (e.g. 1e30) always returns 0 for float.

This is because this line always copies the first four bytes of the uint64_t when handling float. However, on a big-endian platform, the value is stored in the last four bytes of the word.
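
One byte-order-independent way to do such a copy is to narrow the 64-bit word to 32 bits by value first and only then copy bytes. The sketch below shows the idea; `float_from_bits` is a hypothetical helper, not the fix that was applied in the library, and it assumes float is a 32-bit IEEE-754 type with the same byte order as uint32_t.

```cpp
#include <cstdint>
#include <cstring>

static_assert(sizeof(float) == sizeof(std::uint32_t),
              "this sketch assumes a 32-bit float");

// Hypothetical helper: reinterprets the low 32 bits of `word` as a float.
inline float float_from_bits(std::uint64_t word) {
  // The cast is a value conversion, so it picks the low 32 bits on both
  // little-endian and big-endian machines.
  const std::uint32_t low = static_cast<std::uint32_t>(word);
  float result;
  std::memcpy(&result, &low, sizeof(result));
  return result;
}
```

Copying `sizeof(float)` bytes directly from the address of the uint64_t would instead pick the high half on a big-endian machine, which matches the symptom described above.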

This issue comes from apache/arrow#8674

To String

Perhaps fast conversion to string could also be added.

The processing of long numbers could be a bit faster

Following https://github.com/lemire/fast_float/pull/15, we have honest performance in the more-than-19-digits scenario, but anyone reading the code paths will see that there is obvious (not highly technical) room for speed gains.

This is not a priority because beating speed records in that unusual scenario is not very important, and we are still typically going to beat standard libraries. We just want to avoid really bad performance.

The parse_decimal is subject to optimizations:

  1. We do not need to reparse the exponent, we could recover it from our pass in parse_number_string.
  2. We could stop after 19 digits and try to bail out using the fast slow path from https://github.com/lemire/fast_float/pull/15

If someone is looking for some fun work...

feature request: to_chars() alternative?

Thanks for your work -- it ticks all the boxes! C++11, non terminated strings, and zero allocations - just what I was looking for in my library to address a really nasty issue caused by trying to stick to standard facilities while avoiding the performance-killing allocation cookie monsters from the STL.

But I am also looking for a matching to_chars() version/alternative that writes into a given buffer+size. I do not care about the roundtrip guarantee dictated by the standard.

Are you considering adding such a thing? Or are you aware of any implementation providing this function with similar quality and design choices?

I looked at ryu which does not tick all the boxes and has a large lookup table, but could work. I've also found fp which seems better but is C++17 and maybe a bit too fresh.

sign conversion warnings; fails to compile with -Werror

In file included from /mnt/d/proj/extern/fast_float/include/fast_float/ascii_number.h:9,
                 from /mnt/d/proj/extern/fast_float/include/fast_float/parse_number.h:3,
                 from /mnt/d/proj/extern/fast_float/include/fast_float/fast_float.h:44,
                 from /mnt/d/proj/c4core/bm/float/read.cpp:83:
/mnt/d/proj/extern/fast_float/include/fast_float/float_common.h: In member function ‘int32_t fast_float::decimal::to_truncated_exponent()’:
/mnt/d/proj/extern/fast_float/include/fast_float/float_common.h:178:12: error: conversion to ‘unsigned int’ from ‘int32_t’ {aka ‘int’} may change the sign of the result [-Werror=sign-conversion]
  178 |     return decimal_point - max_digit_without_overflow;
      |            ^~~~~~~~~~~~~
/mnt/d/proj/extern/fast_float/include/fast_float/float_common.h:178:26: error: conversion to ‘int32_t’ {aka ‘int’} from ‘unsigned int’ may change the sign of the result [-Werror=sign-conversion]
  178 |     return decimal_point - max_digit_without_overflow;
      |            ~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~
In file included from /mnt/d/proj/extern/fast_float/include/fast_float/parse_number.h:3,
                 from /mnt/d/proj/extern/fast_float/include/fast_float/fast_float.h:44,
                 from /mnt/d/proj/c4core/bm/float/read.cpp:94:
/mnt/d/proj/extern/fast_float/include/fast_float/ascii_number.h: In function ‘fast_float::parsed_number_string fast_float::parse_number_string(const char*, const char*, fast_float::chars_format)’:
/mnt/d/proj/extern/fast_float/include/fast_float/ascii_number.h:88:13: error: conversion to ‘uint64_t’ {aka ‘long unsigned int’} from ‘int’ may change the sign of the result [-Werror=sign-conversion]
   88 |         (*p - '0'); // might overflow, we will handle the overflow later
      |         ~~~~^~~~~~
/mnt/d/proj/extern/fast_float/include/fast_float/ascii_number.h: In function ‘fast_float::decimal fast_float::parse_decimal(const char*, const char*)’:
/mnt/d/proj/extern/fast_float/include/fast_float/ascii_number.h:248:34: error: conversion to ‘uint32_t’ {aka ‘unsigned int’} from ‘int32_t’ {aka ‘int’} may change the sign of the result [-Werror=sign-conversion]
  248 |   answer.decimal_point += answer.num_digits;
      |                                  ^~~~~~~~~~
/mnt/d/proj/extern/fast_float/include/fast_float/ascii_number.h:248:24: error: conversion to ‘int32_t’ {aka ‘int’} from ‘uint32_t’ {aka ‘unsigned int’} may change the sign of the result [-Werror=sign-conversion]
  248 |   answer.decimal_point += answer.num_digits;
      |   ~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~
In file included from /mnt/d/proj/extern/fast_float/include/fast_float/parse_number.h:3,
                 from /mnt/d/proj/extern/fast_float/include/fast_float/fast_float.h:44,
                 from /mnt/d/proj/c4core/bm/float/read.cpp:83:
/mnt/d/proj/extern/fast_float/include/fast_float/ascii_number.h: In function ‘fast_float::parsed_number_string fast_float::parse_number_string(const char*, const char*, fast_float::chars_format)’:
/mnt/d/proj/extern/fast_float/include/fast_float/ascii_number.h:88:13: error: conversion to ‘uint64_t’ {aka ‘long unsigned int’} from ‘int’ may change the sign of the result [-Werror=sign-conversion]
   88 |         (*p - '0'); // might overflow, we will handle the overflow later
      |         ~~~~^~~~~~
/mnt/d/proj/extern/fast_float/include/fast_float/ascii_number.h: In function ‘fast_float::decimal fast_float::parse_decimal(const char*, const char*)’:
/mnt/d/proj/extern/fast_float/include/fast_float/ascii_number.h:248:34: error: conversion to ‘uint32_t’ {aka ‘unsigned int’} from ‘int32_t’ {aka ‘int’} may change the sign of the result [-Werror=sign-conversion]
  248 |   answer.decimal_point += answer.num_digits;
      |                                  ^~~~~~~~~~
/mnt/d/proj/extern/fast_float/include/fast_float/ascii_number.h:248:24: error: conversion to ‘int32_t’ {aka ‘int’} from ‘uint32_t’ {aka ‘unsigned int’} may change the sign of the result [-Werror=sign-conversion]
  248 |   answer.decimal_point += answer.num_digits;
      |   ~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~
In file included from /mnt/d/proj/extern/fast_float/include/fast_float/parse_number.h:4,
                 from /mnt/d/proj/extern/fast_float/include/fast_float/fast_float.h:44,
                 from /mnt/d/proj/c4core/bm/float/read.cpp:94:
/mnt/d/proj/extern/fast_float/include/fast_float/decimal_to_binary.h: In function ‘unsigned int fast_float::{anonymous}::power(int)’:
/mnt/d/proj/extern/fast_float/include/fast_float/decimal_to_binary.h:55:43: error: conversion to ‘unsigned int’ from ‘int’ may change the sign of the result [-Werror=sign-conversion]
   55 |     return (((152170 + 65536) * q) >> 16) + 63;
      |            ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~
In file included from /mnt/d/proj/extern/fast_float/include/fast_float/parse_number.h:4,
                 from /mnt/d/proj/extern/fast_float/include/fast_float/fast_float.h:44,
                 from /mnt/d/proj/c4core/bm/float/read.cpp:83:
/mnt/d/proj/extern/fast_float/include/fast_float/decimal_to_binary.h: In function ‘unsigned int fast_float::{anonymous}::power(int)’:
/mnt/d/proj/extern/fast_float/include/fast_float/decimal_to_binary.h:55:43: error: conversion to ‘unsigned int’ from ‘int’ may change the sign of the result [-Werror=sign-conversion]
   55 |     return (((152170 + 65536) * q) >> 16) + 63;
      |            ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~
In file included from /mnt/d/proj/extern/fast_float/include/fast_float/fast_float.h:44,
                 from /mnt/d/proj/c4core/bm/float/read.cpp:94:
/mnt/d/proj/extern/fast_float/include/fast_float/parse_number.h: In instantiation of ‘fast_float::from_chars_result fast_float::from_chars(const char*, const char*, T&, fast_float::chars_format) [with T = double]’:
/mnt/d/proj/c4core/bm/float/read.cpp:99:50:   required from here
/mnt/d/proj/extern/fast_float/include/fast_float/parse_number.h:79:49: error: conversion to ‘uint8_t’ {aka ‘unsigned char’} from ‘char’ may change the sign of the result [-Werror=sign-conversion]
   79 |   while ((first != last) && fast_float::is_space(*first)) {
      |                             ~~~~~~~~~~~~~~~~~~~~^~~~~~~~
In file included from /mnt/d/proj/extern/fast_float/include/fast_float/fast_float.h:44,
                 from /mnt/d/proj/c4core/bm/float/read.cpp:83:
/mnt/d/proj/extern/fast_float/include/fast_float/parse_number.h: In instantiation of ‘fast_float::from_chars_result fast_float::from_chars(const char*, const char*, T&, fast_float::chars_format) [with T = float]’:
/mnt/d/proj/c4core/bm/float/read.cpp:88:50:   required from here
/mnt/d/proj/extern/fast_float/include/fast_float/parse_number.h:79:49: error: conversion to ‘uint8_t’ {aka ‘unsigned char’} from ‘char’ may change the sign of the result [-Werror=sign-conversion]
   79 |   while ((first != last) && fast_float::is_space(*first)) {
      |                             ~~~~~~~~~~~~~~~~~~~~^~~~~~~~
In file included from /mnt/d/proj/extern/fast_float/include/fast_float/parse_number.h:4,
                 from /mnt/d/proj/extern/fast_float/include/fast_float/fast_float.h:44,
                 from /mnt/d/proj/c4core/bm/float/read.cpp:94:
/mnt/d/proj/extern/fast_float/include/fast_float/decimal_to_binary.h: In instantiation of ‘fast_float::adjusted_mantissa fast_float::compute_float(int64_t, uint64_t) [with binary = fast_float::binary_format<double>; int64_t = long int; uint64_t = long unsigned int]’:
/mnt/d/proj/extern/fast_float/include/fast_float/parse_number.h:101:131:   required from ‘fast_float::from_chars_result fast_float::from_chars(const char*, const char*, T&, fast_float::chars_format) [with T = double]’
/mnt/d/proj/c4core/bm/float/read.cpp:99:50:   required from here
/mnt/d/proj/extern/fast_float/include/fast_float/decimal_to_binary.h:105:35: error: conversion to ‘unsigned int’ from ‘int’ may change the sign of the result [-Werror=sign-conversion]
  105 |   answer.power2 = power(int(q)) - lz - binary::minimum_exponent() + 1;
      |                                   ^~
/mnt/d/proj/extern/fast_float/include/fast_float/decimal_to_binary.h:105:64: error: unsigned conversion from ‘int’ to ‘unsigned int’ changes value from ‘-1023’ to ‘4294966273’ [-Werror=sign-conversion]
  105 |   answer.power2 = power(int(q)) - lz - binary::minimum_exponent() + 1;
      |                                        ~~~~~~~~~~~~~~~~~~~~~~~~^~
/mnt/d/proj/extern/fast_float/include/fast_float/decimal_to_binary.h:105:67: error: conversion to ‘int’ from ‘unsigned int’ may change the sign of the result [-Werror=sign-conversion]
  105 |   answer.power2 = power(int(q)) - lz - binary::minimum_exponent() + 1;
      |                   ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~
In file included from /mnt/d/proj/extern/fast_float/include/fast_float/parse_number.h:4,
                 from /mnt/d/proj/extern/fast_float/include/fast_float/fast_float.h:44,
                 from /mnt/d/proj/c4core/bm/float/read.cpp:83:
/mnt/d/proj/extern/fast_float/include/fast_float/decimal_to_binary.h: In instantiation of ‘fast_float::adjusted_mantissa fast_float::compute_float(int64_t, uint64_t) [with binary = fast_float::binary_format<float>; int64_t = long int; uint64_t = long unsigned int]’:
/mnt/d/proj/extern/fast_float/include/fast_float/parse_number.h:101:131:   required from ‘fast_float::from_chars_result fast_float::from_chars(const char*, const char*, T&, fast_float::chars_format) [with T = float]’
/mnt/d/proj/c4core/bm/float/read.cpp:88:50:   required from here
/mnt/d/proj/extern/fast_float/include/fast_float/decimal_to_binary.h:105:35: error: conversion to ‘unsigned int’ from ‘int’ may change the sign of the result [-Werror=sign-conversion]
  105 |   answer.power2 = power(int(q)) - lz - binary::minimum_exponent() + 1;
      |                                   ^~
/mnt/d/proj/extern/fast_float/include/fast_float/decimal_to_binary.h:141:23: error: unsigned conversion from ‘int’ to ‘uint64_t’ {aka ‘long unsigned int’} changes value from ‘-2’ to ‘18446744073709551614’ [-Werror=sign-conversion]
  141 |       answer.mantissa &= ~1;          // flip it so that we do not round up
      |       ~~~~~~~~~~~~~~~~^~~~~
/mnt/d/proj/extern/fast_float/include/fast_float/decimal_to_binary.h:105:64: error: unsigned conversion from ‘int’ to ‘unsigned int’ changes value from ‘-127’ to ‘4294967169’ [-Werror=sign-conversion]
  105 |   answer.power2 = power(int(q)) - lz - binary::minimum_exponent() + 1;
      |                                        ~~~~~~~~~~~~~~~~~~~~~~~~^~
/mnt/d/proj/extern/fast_float/include/fast_float/decimal_to_binary.h:105:67: error: conversion to ‘int’ from ‘unsigned int’ may change the sign of the result [-Werror=sign-conversion]
  105 |   answer.power2 = power(int(q)) - lz - binary::minimum_exponent() + 1;
      |                   ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~
/mnt/d/proj/extern/fast_float/include/fast_float/decimal_to_binary.h:141:23: error: unsigned conversion from ‘int’ to ‘uint64_t’ {aka ‘long unsigned int’} changes value from ‘-2’ to ‘18446744073709551614’ [-Werror=sign-conversion]
  141 |       answer.mantissa &= ~1;          // flip it so that we do not round up
      |       ~~~~~~~~~~~~~~~~^~~~~
cc1plus: all warnings being treated as errors
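
Warnings of this kind usually go away once the operand types are made explicit. The sketch below illustrates two of the patterns flagged above (the `~1` mask and the digit subtraction) with explicit unsigned types; the function names are hypothetical, and this is not the patch that was applied to the library.

```cpp
#include <cstdint>

// `~1` is an int with value -2; building the mask from a uint64_t keeps
// the whole expression unsigned and avoids -Wsign-conversion.
inline std::uint64_t clear_lowest_bit(std::uint64_t mantissa) {
  return mantissa & ~std::uint64_t(1);
}

// `c - '0'` has type int; the explicit cast documents that the caller
// guarantees c is an ASCII digit, so the result is non-negative.
inline std::uint64_t digit_value(char c) {
  return static_cast<std::uint64_t>(c - '0');
}
```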

Feature request: fast_fixed_point

Hi, in many of my applications I cannot use double / float and instead use a fixed-point integer representation, for instance storing the number in an int64_t whose value is 1 billion times the original value, allowing for 9 decimal digits.
Example:
int64_t val;
fast_fixed_point("42.123456789", val, 9)
Then val would be set to 42123456789.

I think it should be pretty easy to adapt the current code to support this use case.
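
To pin down the requested semantics, here is a naive reference sketch. It reuses the name `fast_fixed_point` from the example above, but the bool return value, the rejection of excess fractional digits, and the lack of overflow checking are all assumptions made for illustration; it is not optimized and not based on fast_float internals.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Naive sketch of the requested function (assumed signature).
// Returns false on malformed input or when the input has more than
// `decimals` fractional digits. No overflow checking is performed.
inline bool fast_fixed_point(const std::string& text, std::int64_t& value,
                             int decimals) {
  std::size_t i = 0;
  bool negative = false;
  if (i < text.size() && text[i] == '-') {
    negative = true;
    ++i;
  }
  std::int64_t acc = 0;
  bool any_digit = false;
  while (i < text.size() && text[i] >= '0' && text[i] <= '9') {
    acc = acc * 10 + (text[i] - '0');
    any_digit = true;
    ++i;
  }
  int fraction_digits = 0;
  if (i < text.size() && text[i] == '.') {
    ++i;
    while (i < text.size() && text[i] >= '0' && text[i] <= '9') {
      if (fraction_digits == decimals) {
        return false;  // more precision than the scale can hold
      }
      acc = acc * 10 + (text[i] - '0');
      any_digit = true;
      ++fraction_digits;
      ++i;
    }
  }
  if (!any_digit || i != text.size()) {
    return false;  // empty number or trailing garbage
  }
  // Pad missing fractional digits so the result is scaled by 10^decimals.
  for (; fraction_digits < decimals; ++fraction_digits) {
    acc *= 10;
  }
  value = negative ? -acc : acc;
  return true;
}
```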

Non-compliance to licenses

When reviewing the licensing terms for the use of your (awesome) library at our organization, we found some issues that need to be fixed to ensure compliance with the licensing terms in terms of format.

MIT license

When comparing the text of the LICENSE-MIT file to the text hosted on https://opensource.org/licenses/MIT, an important line is missing: the copyright with year and copyright holder.

Apache 2 license

The text has been modified to replace the copyright notice and change the [] to {} at line 181/182 for some reason. The text of the Apache 2 license should remain untouched, as these instructions pertain to the addition of the Apache 2 license to any work, and is not the copyright notice itself.

That adapted notice:

   Copyright 2020 The fast_float authors 

   Licensed under the Apache License, Version 2.0 (the "License");
   you may not use this file except in compliance with the License.
   You may obtain a copy of the License at

       http://www.apache.org/licenses/LICENSE-2.0

   Unless required by applicable law or agreed to in writing, software
   distributed under the License is distributed on an "AS IS" BASIS,
   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
   See the License for the specific language governing permissions and
   limitations under the License.

should be added to the generated header as instructed to make it compliant.

Dual licensing

The notion that the file is dual-licensed under the MIT terms and Apache 2 should be mentioned in the header file as well to avoid confusion. The MIT license is currently raw without mention of it being the MIT license, and is missing the copyright line as well.

unneeded includes

  • ascii_number.h, decimal_to_binary.h include <cstdio> but they don't use any IO functions
  • parse_number.h includes <cassert> but does not use assert (static_assert does not count)
  • ascii_number includes <iterator> which seems unused as well (edit: iterator is actually used)

Warns maybe-uninitialized on g++-6

A quick one:

$ cmake -DCMAKE_CXX_COMPILER=g++-6 ..

causes this:

In file included from /opt/jpmag/proj/extern/fast_float/include/fast_float/ascii_number.h:9:0,
                 from /opt/jpmag/proj/extern/fast_float/include/fast_float/parse_number.h:3,
                 from /opt/jpmag/proj/extern/fast_float/include/fast_float/fast_float.h:44,
                 from /opt/jpmag/proj/extern/fast_float/tests/random_string.cpp:1:
/opt/jpmag/proj/extern/fast_float/include/fast_float/float_common.h: In function ‘fast_float::adjusted_mantissa fast_float::parse_long_mantissa(const char*, const char*) [with binary = fast_float::binary_format<double>]’:
/opt/jpmag/proj/extern/fast_float/include/fast_float/float_common.h:134:35: error: ‘answer.fast_float::adjusted_mantissa::mantissa’ may be used uninitialized in this function [-Werror=maybe-uninitialized]
     return mantissa == o.mantissa && power2 == o.power2;
            ~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~
In file included from /opt/jpmag/proj/extern/fast_float/include/fast_float/parse_number.h:4:0,
                 from /opt/jpmag/proj/extern/fast_float/include/fast_float/fast_float.h:44,
                 from /opt/jpmag/proj/extern/fast_float/tests/random_string.cpp:1:
/opt/jpmag/proj/extern/fast_float/include/fast_float/decimal_to_binary.h:73:21: note: ‘answer.fast_float::adjusted_mantissa::mantissa’ was declared here
   adjusted_mantissa answer;
                     ^~~~~~
make[1]: *** [CMakeFiles/Makefile2:216: tests/CMakeFiles/long_exhaustive32_64.dir/all] Error 2

fast_int

It would be nice to have a fast string-to-integer/unsigned parser with similar performance. I suspect it is just a matter of wrapping internal functions with a from_chars-like API. But, without knowing the code well, it is not as easy to cook together.
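
As a starting point for discussion, a from_chars-like interface for unsigned integers could look like the sketch below. It is a plain digit loop with overflow detection, not a wrapper around fast_float internals; `int_parse_result` and `parse_u64` are hypothetical names.

```cpp
#include <cstdint>
#include <system_error>

// Hypothetical result type, shaped like from_chars_result.
struct int_parse_result {
  const char* ptr;
  std::errc ec;
};

// Parses decimal digits from [first, last) into `value`.
// On error, `value` is left unmodified.
inline int_parse_result parse_u64(const char* first, const char* last,
                                  std::uint64_t& value) {
  const char* p = first;
  std::uint64_t acc = 0;
  bool overflow = false;
  while (p != last && *p >= '0' && *p <= '9') {
    const std::uint64_t digit = static_cast<std::uint64_t>(*p - '0');
    // acc * 10 + digit fits iff acc <= (UINT64_MAX - digit) / 10.
    if (acc > (UINT64_MAX - digit) / 10) {
      overflow = true;  // keep consuming digits, but stop accumulating
    } else {
      acc = acc * 10 + digit;
    }
    ++p;
  }
  int_parse_result r;
  r.ptr = p;
  if (p == first) {
    r.ec = std::errc::invalid_argument;
  } else if (overflow) {
    r.ec = std::errc::result_out_of_range;
  } else {
    value = acc;
    r.ec = std::errc();
  }
  return r;
}
```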

Add Fallback for `FASTFLOAT_XXBIT`

The main motivation for this is RISCV, Sparc, and MIPS architectures, which are still in active use, as well as esoteric, proprietary architectures where we cannot possibly enumerate all compiler architecture defines.

Issue

Currently, the size of the architecture is determined via:

#if (defined(__x86_64) || defined(__x86_64__) || defined(_M_X64)   \
       || defined(__amd64) || defined(__aarch64__) || defined(_M_ARM64) \
       || defined(__MINGW64__)                                          \
       || defined(__s390x__)                                            \
       || (defined(__ppc64__) || defined(__PPC64__) || defined(__ppc64le__) || defined(__PPC64LE__)) \
       || defined(__EMSCRIPTEN__))
#define FASTFLOAT_64BIT
#elif (defined(__i386) || defined(__i386__) || defined(_M_IX86)   \
     || defined(__arm__)                                        \
     || defined(__MINGW32__))
#define FASTFLOAT_32BIT
#else
#error Unknown platform (not 32-bit, not 64-bit?)
#endif

This unfortunately hard-codes supported architectures, which means that architectures like the MIPS family, 32-bit PowerPC, and other architectures are not supported (such as RISCV). Other than adding hard-coded support for other architectures, there is a suitable alternative for most cases (there is no way to determine the exact, preferred register size, but size_t is generally a good barometer).

Solution

We can probe SIZE_MAX to attempt to determine the system architecture, which is guaranteed to be defined on all C++11 systems (unlike UINTPTR_MAX or INTPTR_MAX, which may or may not be defined).

A simple check as follows fixes the issue:

  // Need to check incrementally, since SIZE_MAX is a size_t, avoid overflow.
  // We can never tell the register width, but the SIZE_MAX is a good approximation.
  // UINTPTR_MAX and INTPTR_MAX are optional, so avoid them for max portability.
  #if SIZE_MAX == 0xffff
    #error Unknown platform (16-bit, unsupported)
  #elif SIZE_MAX == 0xffffffff
    #define FASTFLOAT_32BIT
  #elif SIZE_MAX == 0xffffffffffffffff
    #define FASTFLOAT_64BIT
  #else
    #error Unknown platform (not 32-bit, not 64-bit?)
  #endif

We need to check in order, from 16-bit to 64-bit, since SIZE_MAX is of type size_t, and therefore on a 16-bit system we'll get an overflow (which should never happen, since who uses 16-bit systems anymore?).

Consider Optimizing Decimal Operations

Feature Request

Although the code for parsing decimals is quite nice already, a fair amount of work I've done on lexical (my Rust float parser) shows that when the slow path is invoked, we can get considerably faster performance in a few cases, and a minimal version of this can be quite easy to write, readable, and correct.

This is due to the use of big-integer arithmetic and a few algorithms optimized for big-integer math when parsing decimal strings, which minimizes the number of operations relative to the decimal class in fast_float. There is more code involved; however, this code relies on fewer magic numbers and uses simple big-integer algorithms instead. The only additional static storage required is ~32 64-bit integers, which is considerably smaller than what the decimal implementation needs.

This performance gap is quite noticeable with floats like "8.988465674311580536566680e307", where the time goes from ~500ns/iter (lexical) to ~14.465us/iter (fast_float). For denormal floats or those with negative exponents like "8.442911973260991817129021e-309", the difference is smaller: ~58.073ns/iter (lexical) versus ~69.264ns/iter (fast_float). The approach also scales well with input size, to 768 digits (767, the max for a double-precision float, plus 1 extra) and beyond.

The actual implementation is quite simple: it uses a fairly minimal subset of big-integer arithmetic, which can be implemented in <700 lines of code, and the parsing algorithms are quite easy to understand and implement. If this is of interest, I'd be happy to submit a PR.

Positive Exponent Implementation

Here is a Rust version of the code required to implement this; obviously, this would be translated to C++:

/// Generate the significant digits with a positive exponent relative to mantissa.
pub fn positive_digit_comp<F: RawFloat>(
    mut bigmant: Bigint,
    exponent: i32,
) -> ExtendedFloat80 {
    // Simple, we just need to multiply by the power of the radix.
    // Now, we can calculate the mantissa and the exponent from this.
    // The binary exponent is the binary exponent for the mantissa
    // shifted to the hidden bit.
    bigmant.pow(10, exponent as u32);

    // Get the exact representation of the float from the big integer.
    // `hi64` checks **all** the remaining bits after the mantissa,
    // so it will check if **any** truncated digits exist.
    let (mant, is_truncated) = bigmant.hi64();
    let exp = bigmant.bit_length() as i32 - 64 + F::EXPONENT_BIAS;
    let mut fp = ExtendedFloat80 {
        mant,
        exp,
    };

    // Shift the digits into position and determine if we need to round-up.
    shared::round::<F, _>(&mut fp, |f, s| {
        shared::round_nearest_tie_even(f, s, |is_odd, is_halfway, is_above| {
            is_above || (is_halfway && is_truncated) || (is_odd && is_halfway)
        });
    });
    fp
}

Here, the exponent is relative to the significant digits, and ExtendedFloat80 is just AdjustedMantissa, that is, an 80-bit extended-precision float with a biased exponent. Our bigmant is just the significant digits parsed as a big integer. hi64 is a function that gets the high 64 bits from the big integer (for the significant digits) and checks if any bits below are truncated (for rounding), and EXPONENT_BIAS is 1075 for a 64-bit float.
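
To make the role of hi64 concrete, here is a minimal C++ sketch of the idea. It assumes the big integer is stored as a vector of 64-bit limbs, least-significant limb first, with a nonzero top limb; example_hi64 is a hypothetical name, and this is not the lexical or fast_float implementation:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Returns the top 64 bits of the big integer, shifted so the most-significant
// bit is set, plus a flag telling whether any nonzero bits were dropped.
std::pair<uint64_t, bool> example_hi64(const std::vector<uint64_t>& limbs) {
  assert(!limbs.empty() && limbs.back() != 0);
  const std::size_t n = limbs.size();
  uint64_t top = limbs[n - 1];
  int shift = 0;
  // Normalize: shift left until the highest bit is set.
  while ((top & (uint64_t(1) << 63)) == 0) {
    top <<= 1;
    ++shift;
  }
  bool truncated = false;
  if (n >= 2) {
    const uint64_t next = limbs[n - 2];
    if (shift != 0) {
      top |= next >> (64 - shift);        // fill the vacated low bits
      truncated = (next << shift) != 0;   // leftover bits of `next`
    } else {
      truncated = next != 0;
    }
  }
  // Any nonzero limb further down also counts as truncated.
  for (std::size_t i = 0; i + 2 < n; ++i) {
    truncated = truncated || (limbs[i] != 0);
  }
  return std::make_pair(top, truncated);
}
```

The truncated flag is what lets the rounding closure distinguish an exact halfway value from one that is slightly above halfway.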

Parsing the significant digits into the big integer is quite simple: first, we get the max power of the radix (in this case, 10) that can be stored in a limb, that is, the largest N where 10^N <= 2^BITS - 1. For 32-bit limbs, N is 9; for 64-bit limbs, N is 19. Next, we get the max native integer for this power (10^9 for 32-bit limbs, 10^19 for 64-bit limbs). Then, we parse as many digits as we can into a native limb and add the limb to the big integer, using effectively the following logic:

let mut counter: usize = 0;
let mut count: usize = 0;
let mut value: Limb = 0;
let step = ...;
let mut result = Bigint::new();

// Check if we've reached our max native value.
for &digit in digits {
    // Add our temporary values.
    value *= 10;
    value += digit - 0x30;

    counter += 1;
    count += 1;
    if counter == step {
        result.mul_small(max_native);
        result.add_small(value);
        counter = 0;
        value = 0;
    }

    // Check if we've exhausted our max digits.
    if count == max_digits {
        // Add temporary...
        ...
    }
}
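
For readers who prefer C++, the loop above can be sketched roughly as follows. This is a simplified illustration that assumes 32-bit limbs stored least-significant first and omits the max_digits truncation; all example_* names are hypothetical:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

using Limb = uint32_t;  // 32-bit limbs keep the carry arithmetic portable

// x *= y, propagating the carry through every limb.
void example_mul_small(std::vector<Limb>& x, Limb y) {
  uint64_t carry = 0;
  for (Limb& limb : x) {
    const uint64_t z = uint64_t(limb) * y + carry;
    limb = Limb(z);
    carry = z >> 32;
  }
  if (carry != 0) x.push_back(Limb(carry));
}

// x += y, stopping as soon as the carry dies out.
void example_add_small(std::vector<Limb>& x, Limb y) {
  uint64_t carry = y;
  for (std::size_t i = 0; i < x.size() && carry != 0; ++i) {
    const uint64_t z = uint64_t(x[i]) + carry;
    x[i] = Limb(z);
    carry = z >> 32;
  }
  if (carry != 0) x.push_back(Limb(carry));
}

// Parse ASCII digits into a big integer, nine digits per native limb.
std::vector<Limb> example_parse_digits(const std::string& digits) {
  const std::size_t step = 9;  // 10^9 <= 2^32 - 1
  std::vector<Limb> result;
  Limb value = 0;   // digits accumulated in the current chunk
  Limb scale = 1;   // 10^(number of digits in the current chunk)
  std::size_t counter = 0;
  for (char c : digits) {
    assert(c >= '0' && c <= '9');
    value = value * 10 + Limb(c - '0');
    scale *= 10;
    if (++counter == step) {
      example_mul_small(result, scale);
      example_add_small(result, value);
      counter = 0;
      value = 0;
      scale = 1;
    }
  }
  if (counter != 0) {  // flush the final, partial chunk
    example_mul_small(result, scale);
    example_add_small(result, value);
  }
  return result;
}
```

Batching nine digits per limb means one big-integer multiply-add per chunk instead of one per digit.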

In total, we need operations for the following:

  1. Scalar add and mul, with the ability to detect overflow or the carry.
  2. Addition and multiplication of a scalar to a big integer.
  3. Grade-school multiplication of two big integers (asymptotically faster algorithms aren't applicable, since the inputs are too small).
  4. SHL operation for the big integer, both shifting bits and limbs.
  5. An efficient big-integer power algorithm.

The entirety of the big integer algorithms is <700 lines of code, although my version is extensively documented. The only one here that's remotely interesting is the power algorithm, which uses a single pre-computed large power of 5 and a small number of pre-computed small powers. This is effectively just:

pub const SMALL_INT_POW5: [u64; 28] = [
    1,
    5,
    25,
    125,
    625,
    3125,
    15625,
    78125,
    390625,
    1953125,
    9765625,
    48828125,
    244140625,
    1220703125,
    6103515625,
    30517578125,
    152587890625,
    762939453125,
    3814697265625,
    19073486328125,
    95367431640625,
    476837158203125,
    2384185791015625,
    11920928955078125,
    59604644775390625,
    298023223876953125,
    1490116119384765625,
    7450580596923828125,
];

/// Pre-computed large power-of-5 for 32-bit limbs.
#[cfg(not(all(target_pointer_width = "64", not(target_arch = "sparc"))))]
pub const LARGE_POW5: [u32; 10] = [
    4279965485, 329373468, 4020270615, 2137533757, 4287402176, 1057042919, 1071430142, 2440757623,
    381945767, 46164893,
];

/// Pre-computed large power-of-5 for 64-bit limbs.
#[cfg(all(target_pointer_width = "64", not(target_arch = "sparc")))]
pub const LARGE_POW5: [u64; 5] = [
    1414648277510068013,
    9180637584431281687,
    4539964771860779200,
    10482974169319127550,
    198276706040285095,
];

/// Step for large power-of-5 for 32-bit limbs.
pub const LARGE_POW5_STEP: u32 = 135;

pub fn pow(x: &mut Vec, base: u32, mut exp: u32) {
    let large = &LARGE_POW5;
    let step = LARGE_POW5_STEP;
    while exp >= step {
        large_mul(x, large);
        exp -= step;
    }

    // Now use our pre-computed small powers iteratively.
    let small_step = if LIMB_BITS == 32 {
        13
    } else {
        27
    };
    let max_native = (base as Limb).pow(small_step);
    while exp >= small_step {
        small_mul(x, max_native);
        exp -= small_step;
    }
    if exp != 0 {
        let small_power = SMALL_INT_POW5[exp as usize];
        small_mul(x, small_power as Limb);
    }
}

This is effectively self-explanatory: use the large power of 5 and the large, grade-school multiplication algorithm to minimize the number of multiplications, and then use a small pre-computed table for the remainder. The real power implementation splits this into a power-of-5 multiplication and a left-shift (for the power of 2).
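
The split into a power-of-5 multiplication and a left-shift can be sketched in C++ as below. This is a simplified illustration under the assumption of 32-bit limbs stored least-significant first; it uses only the small-power table and skips the large pre-computed power, and the example_* names are hypothetical:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using Limb = uint32_t;

// x *= y for a single-limb multiplier.
void example_small_mul(std::vector<Limb>& x, Limb y) {
  uint64_t carry = 0;
  for (Limb& limb : x) {
    const uint64_t z = uint64_t(limb) * y + carry;
    limb = Limb(z);
    carry = z >> 32;
  }
  if (carry != 0) x.push_back(Limb(carry));
}

// x <<= n: shift by whole limbs plus the remaining bits.
void example_shl(std::vector<Limb>& x, uint32_t n) {
  if (x.empty()) return;
  const uint32_t limbs = n / 32;
  const uint32_t bits = n % 32;
  if (bits != 0) {
    Limb carry = 0;
    for (Limb& limb : x) {
      const Limb next = limb >> (32 - bits);
      limb = (limb << bits) | carry;
      carry = next;
    }
    if (carry != 0) x.push_back(carry);
  }
  x.insert(x.begin(), std::size_t(limbs), Limb(0));
}

// x *= 10^exp, computed as (x * 5^exp) << exp.
void example_pow10(std::vector<Limb>& x, uint32_t exp) {
  static const Limb small_pow5[14] = {
      1, 5, 25, 125, 625, 3125, 15625, 78125, 390625, 1953125,
      9765625, 48828125, 244140625, 1220703125};
  const uint32_t small_step = 13;  // 5^13 is the largest power in one limb
  uint32_t remaining = exp;
  while (remaining >= small_step) {
    example_small_mul(x, small_pow5[small_step]);
    remaining -= small_step;
  }
  if (remaining != 0) example_small_mul(x, small_pow5[remaining]);
  example_shl(x, exp);
}
```

Handling the power of 2 as a shift keeps every multiplication to odd factors, which is why only powers of 5 need to be tabulated.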

Negative Exponent Implementation

The negative exponent implementation is likewise simple, but a little different: we need our big mantissa and exponent from before, but we also need an extended float prior to rounding. In order to make this work with the existing Lemire algorithm, I simply add i16::MIN (-2^15) rather than set the exponent to -1, so the real extended float can be passed over. This requires only trivial modifications to existing code, which has no impact on performance for faster algorithms.

First, we create a big integer representing b+h, so we can determine if we round to b+u or to b. This is very simple:

/// Calculate `b` from a representation of `b` as a float.
#[inline]
pub fn b<F: RawFloat>(float: F) -> ExtendedFloat80 {
    ExtendedFloat80 {
        mant: float.mantissa().as_u64(),
        exp: float.exponent(),
    }
}

/// Calculate `b+h` from a representation of `b` as a float.
#[inline]
pub fn bh<F: RawFloat>(float: F) -> ExtendedFloat80 {
    let fp = b(float);
    ExtendedFloat80 {
        mant: (fp.mant << 1) + 1,
        exp: fp.exp - 1,
    }
}
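
A C++ counterpart of b and bh for double could look roughly like the sketch below. It assumes a finite, non-negative IEEE 754 binary64 input and represents the value as mant * 2^exp; ExampleExtended, example_b, and example_bh are hypothetical names:

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// value == mant * 2^exp (unbiased exponent, integer significand).
struct ExampleExtended {
  uint64_t mant;
  int32_t exp;
};

// Decompose a finite, non-negative double into significand and exponent.
ExampleExtended example_b(double value) {
  assert(value >= 0.0);
  uint64_t bits;
  std::memcpy(&bits, &value, sizeof(bits));
  const uint64_t fraction = bits & ((uint64_t(1) << 52) - 1);
  const int32_t biased = int32_t((bits >> 52) & 0x7FF);
  ExampleExtended fp;
  if (biased == 0) {
    // Subnormal (or zero): no hidden bit, fixed minimum exponent.
    fp.mant = fraction;
    fp.exp = -1074;
  } else {
    // Normal: add the hidden bit, remove the bias plus the 52 fraction bits.
    fp.mant = fraction | (uint64_t(1) << 52);
    fp.exp = biased - 1075;
  }
  return fp;
}

// b+h: the halfway point between b and the next representable float,
// obtained by gaining one bit of precision and setting it.
ExampleExtended example_bh(double value) {
  const ExampleExtended fp = example_b(value);
  ExampleExtended half;
  half.mant = (fp.mant << 1) + 1;
  half.exp = fp.exp - 1;
  return half;
}
```

For 1.0 this yields (2^53 + 1) * 2^-53, which is exactly halfway between 1.0 and the next double.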

Next, we then calculate both the numerator and denominator of a ratio representing the float:

/// Generate the significant digits with a negative exponent relative to mantissa.
pub fn negative_digit_comp<F: RawFloat>(
    bigmant: Bigint,
    mut fp: ExtendedFloat80,
    exponent: i32,
) -> ExtendedFloat80 {
    // Ensure our preconditions are valid:
    //  1. The significant digits are not shifted into place.
    debug_assert!(fp.mant & (1 << 63) != 0);

    // Get the significant digits and radix exponent for the real digits.
    let mut real_digits = bigmant;
    let real_exp = exponent;
    debug_assert!(real_exp < 0);

    // Round down our extended-precision float and calculate `b`.
    let mut b = fp;
    shared::round::<F, _>(&mut b, shared::round_down);
    let b = extended_to_float::<F>(b);

    // Get the significant digits and the binary exponent for `b+h`.
    let theor = bh(b);
    let mut theor_digits = Bigint::from_u64(theor.mant);
    let theor_exp = theor.exp;

    // We need to scale the real digits and `b+h` digits to be the same
    // order. We currently have `real_exp`, in `radix`, that needs to be
    // shifted to `theor_digits` (since it is negative), and `theor_exp`
    // to either `theor_digits` or `real_digits` as a power of 2 (since it
    // may be positive or negative). Try to remove as many powers of 2
    // as possible. All values are relative to `theor_digits`, that is,
    // reflect the power you need to multiply `theor_digits` by.
    let binary_exp = theor_exp - real_exp;
    let halfradix_exp = -real_exp;

    if halfradix_exp != 0 {
        theor_digits.pow(5, halfradix_exp as u32);
    }
    if binary_exp > 0 {
        theor_digits.pow(2, binary_exp as u32);
    } else if binary_exp < 0 {
        real_digits.pow(2, (-binary_exp) as u32);
    }
    ...
}

Finally, we compare these two approximations, and determine if we round-up or down:

pub fn negative_digit_comp<F: RawFloat>(
    bigmant: Bigint,
    mut fp: ExtendedFloat80,
    exponent: i32,
) -> ExtendedFloat80 {
    ...

    // Compare our theoretical and real digits and round nearest, tie even.
    let ord = real_digits.data.cmp(&theor_digits.data);
    shared::round::<F, _>(&mut fp, |f, s| {
        shared::round_nearest_tie_even(f, s, |is_odd, _, _| {
            // Can ignore `is_halfway` and `is_above`, since those were
            // calculated using less significant digits.
            match ord {
                cmp::Ordering::Greater => true,
                cmp::Ordering::Less => false,
                cmp::Ordering::Equal if is_odd => true,
                cmp::Ordering::Equal => false,
            }
        });
    });
    fp
}
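
The final comparison and rounding decision can be sketched in C++ as follows. This assumes both big integers are stored as limb vectors, least-significant limb first, with no leading zero limbs; ExampleOrdering and the example_* functions are hypothetical names:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

enum class ExampleOrdering { Less, Equal, Greater };

// Compare two normalized big integers, most-significant limb first.
ExampleOrdering example_compare(const std::vector<uint32_t>& a,
                                const std::vector<uint32_t>& b) {
  assert(a.empty() || a.back() != 0);
  assert(b.empty() || b.back() != 0);
  if (a.size() != b.size()) {
    return a.size() > b.size() ? ExampleOrdering::Greater
                               : ExampleOrdering::Less;
  }
  for (std::size_t i = a.size(); i-- > 0;) {
    if (a[i] != b[i]) {
      return a[i] > b[i] ? ExampleOrdering::Greater : ExampleOrdering::Less;
    }
  }
  return ExampleOrdering::Equal;
}

// Round-nearest, ties-to-even: `ord` is real digits versus b+h.
bool example_round_up(ExampleOrdering ord, bool is_odd) {
  switch (ord) {
    case ExampleOrdering::Greater: return true;    // above halfway
    case ExampleOrdering::Less:    return false;   // below halfway
    case ExampleOrdering::Equal:   return is_odd;  // exact tie: go to even
  }
  return false;  // unreachable, keeps compilers quiet
}
```

Because the comparison is exact, the tie case is decided purely by the parity of the rounded-down significand.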

Specifics

First, we use 64-bit limbs on platforms where native 128-bit multiplication is supported, or where 64-bit multiplication can extract both the high and the low bits efficiently. In practice, this means practically every 64-bit architecture except for SPARCv8 and SPARCv9 uses 64-bit limbs. This can be detected by compiling the following code with gcc main.c -c -S -O3 -masm=intel.

#include <stdint.h>

struct i128 {
    uint64_t hi;
    uint64_t lo;
};

// Full 64x64 -> 128-bit multiply; unsigned avoids signed overflow.
struct i128 square(uint64_t x, uint64_t y) {
    unsigned __int128 prod = (unsigned __int128)x * (unsigned __int128)y;
    struct i128 z;
    z.hi = (uint64_t)(prod >> 64);
    z.lo = (uint64_t)prod;
    return z;
}

If the compiled code has call __multi3, then the 128-bit multiplication is emulated by GCC. Otherwise, there is native platform support. Architectures with 128-bit support include:

  • x86_64 (Supported via MUL).
  • mips64 (Supported via DMULTU, from which HI and LO can be read).
  • s390x (Supported via MLGR).

And architectures where 64-bit limbs are more efficient than 32-bit limbs are:

  • aarch64 (Requires UMULH and MUL to capture high and low bits).
  • powerpc64 (Requires MULHDU and MULLD to capture high and low bits).
  • riscv64 (Requires MUL and MULHU to capture high and low bits).
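
As a rough C++ sketch of what the limb multiplication needs, the snippet below computes the full 128-bit product of two 64-bit values, using the compiler's 128-bit type when the __SIZEOF_INT128__ macro (a GCC/Clang convention) is defined and a portable four-multiply fallback otherwise. ExampleU128 and the example_* functions are hypothetical names:

```cpp
#include <cassert>
#include <cstdint>

struct ExampleU128 {
  uint64_t hi;
  uint64_t lo;
};

// Portable fallback: split each operand into 32-bit halves.
ExampleU128 example_mul_portable(uint64_t x, uint64_t y) {
  const uint64_t x_lo = x & 0xFFFFFFFFu, x_hi = x >> 32;
  const uint64_t y_lo = y & 0xFFFFFFFFu, y_hi = y >> 32;
  const uint64_t ll = x_lo * y_lo;
  const uint64_t lh = x_lo * y_hi;
  const uint64_t hl = x_hi * y_lo;
  const uint64_t hh = x_hi * y_hi;
  // Sum of three values below 2^32, so this cannot overflow 64 bits.
  const uint64_t mid = (ll >> 32) + (lh & 0xFFFFFFFFu) + (hl & 0xFFFFFFFFu);
  ExampleU128 z;
  z.lo = (mid << 32) | (ll & 0xFFFFFFFFu);
  z.hi = hh + (lh >> 32) + (hl >> 32) + (mid >> 32);
  return z;
}

// Prefer the compiler's 128-bit type when it is available.
ExampleU128 example_mul(uint64_t x, uint64_t y) {
#if defined(__SIZEOF_INT128__)
  const unsigned __int128 prod = (unsigned __int128)x * y;
  ExampleU128 z;
  z.hi = uint64_t(prod >> 64);
  z.lo = uint64_t(prod);
  return z;
#else
  return example_mul_portable(x, y);
#endif
}
```

On targets with a native high-multiply instruction, the first branch typically compiles to one or two instructions, which is the property the architecture lists above describe.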

Caveats

This might be slower on 32-bit architectures or those without native 64-bit multiplication support. However, most modern architectures have native 64-bit or 128-bit multiplication.

Proof of Concept Code

A working implementation for the big integer primitives can be found here. The slow path algorithms can be found here, for positive_digit_comp and negative_digit_comp. Note that this exact implementation hasn't been fully tested, but it is a minimal fork of existing lexical code, so comparable code has been battle-tested over nearly 3 years of use in production by millions of users.

Benchmarks

The benchmarks were run on the following values, using the fast-float-rust implementation integrated into the Rust standard library:

// Example large, near-halfway value.
const LARGE: &str = "8.988465674311580536566680e307";
// Example long, large, near-halfway value.
const LARGE_LONG: &str = "8.9884656743115805365666807213050294962762414131308158973971342756154045415486693752413698006024096935349884403114202125541629105369684531108613657287705365884742938136589844238179474556051429647415148697857438797685859063890851407391008830874765563025951597582513936655578157348020066364210154316532161708032e307";
// Example denormal, near-halfway value.
const DENORMAL: &str = "8.442911973260991817129021e-309";
// Example of a long, denormal, near-halfway value.
const DENORMAL_LONG: &str = "2.4703282292062327208828439643411068618252990130716238221279284125033775363510437593264991818081799618989828234772285886546332835517796989819938739800539093906315035659515570226392290858392449105184435931802849936536152500319370457678249219365623669863658480757001585769269903706311928279558551332927834338409351978015531246597263579574622766465272827220056374006485499977096599470454020828166226237857393450736339007967761930577506740176324673600968951340535537458516661134223766678604162159680461914467291840300530057530849048765391711386591646239524912623653881879636239373280423891018672348497668235089863388587925628302755995657524455507255189313690836254779186948667994968324049705821028513185451396213837722826145437693412532098591327667236328124999e-324";

These are meant to represent floats that use positive_digit_comp or negative_digit_comp.

The microbenchmark results are as follows:

core/large              time:   [13.443 us 13.464 us 13.487 us]
core/large_long         time:   [4.7838 us 4.8724 us 4.9711 us]
core/denormal           time:   [65.280 ns 65.744 ns 66.180 ns]
core/denormal_long      time:   [36.856 us 36.905 us 36.960 us]

lexical/large           time:   [371.46 ns 374.19 ns 378.37 ns]
lexical/large_long      time:   [1.1639 us 1.1745 us 1.1889 us]
lexical/denormal        time:   [55.191 ns 55.497 ns 55.901 ns]
lexical/denormal_long   time:   [799.36 ns 818.20 ns 836.18 ns]

As you can see, the big integer algorithm outperforms the decimal implementation in every case, and performs exceptionally well in a few notable edge cases. This was tested on x86_64, so benchmarks on other architectures may be required (specifically, 32-bit architectures and ARM64, which may make slightly less efficient use of 64-bit limbs).

License

I own all the copyright to the aforementioned code, and am happy to provide it, in a PR, under any license terms, including public domain. So, no licensing issues exist.

Optimizations for Big-Endian Systems

Issue

Currently, a few optimizations are made only for little-endian systems, similar to aldanor/fast-float-rust#26.

#if FASTFLOAT_IS_BIG_ENDIAN == 0
    // Fast approach only tested under little endian systems
    if ((p + 8 <= pend) && is_made_of_eight_digits_fast(p)) {
      i = i * 100000000 + parse_eight_digits_unrolled(p); // in rare cases, this will overflow, but that's ok
      p += 8;
      if ((p + 8 <= pend) && is_made_of_eight_digits_fast(p)) {
        i = i * 100000000 + parse_eight_digits_unrolled(p); // in rare cases, this will overflow, but that's ok
        p += 8;
      }
    }
#endif

Solution

However, this is trivial to fix since we can load the number to a little-endian byte-order even on big-endian systems. All the code here assumes little-endian order for the number parsed, but this requires at most a byteswap on big-endian systems. A trivial fix is the following:

uint64_t byteswap(uint64_t val) {
  return (val & 0xFF00000000000000) >> 56
    | (val & 0x00FF000000000000) >> 40
    | (val & 0x0000FF0000000000) >> 24
    | (val & 0x000000FF00000000) >> 8
    | (val & 0x00000000FF000000) << 8
    | (val & 0x0000000000FF0000) << 24
    | (val & 0x000000000000FF00) << 40
    | (val & 0x00000000000000FF) << 56;
}

This compiles down to a single bswap instruction:

byteswap(unsigned long):
        mov     rax, rdi
        bswap   rax
        ret

We can now wrap this byteswap into a read and write function:

fastfloat_really_inline uint64_t read_u64(const char *chars) {
  uint64_t val;
  ::memcpy(&val, chars, sizeof(uint64_t));
#if FASTFLOAT_IS_BIG_ENDIAN == 1
  // Need to read as-if the number was in little-endian order.
  val = byteswap(val);
#endif
  return val;
}

fastfloat_really_inline void write_u64(uint8_t *chars, uint64_t val) {
#if FASTFLOAT_IS_BIG_ENDIAN == 1
  // Need to write as-if the number was in little-endian order.
  val = byteswap(val);
#endif
  ::memcpy(chars, &val, sizeof(uint64_t));
}

For example, given the bytes "12345678", on little-endian systems, this is loaded to the uint64_t 0x3837363534333231 via memcpy. The byteswap in read_u64 ensures the same value is also produced on big-endian systems.
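
To check the expected value independently of the host byte order, the load can also be written byte by byte. This is an alternative technique to the memcpy plus byteswap shown above, given only as an illustration; example_read_u64_le is a hypothetical name:

```cpp
#include <cassert>
#include <cstdint>

// Assemble eight bytes into a uint64_t as if read in little-endian order.
// Works identically on little-endian and big-endian hosts.
uint64_t example_read_u64_le(const char* chars) {
  assert(chars != nullptr);
  uint64_t val = 0;
  for (int i = 0; i < 8; ++i) {
    // Byte i of the input becomes bits [8*i, 8*i + 7] of the result.
    val |= uint64_t(static_cast<unsigned char>(chars[i])) << (8 * i);
  }
  return val;
}
```

Calling example_read_u64_le("12345678") gives 0x3837363534333231: the first character '1' (0x31) lands in the lowest byte.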

The following 2 functions, which are the only functions required for parsing the bytes to digits, do not depend on the byte order, but assume the value was loaded in little-endian order:

// credit  @aqrit
fastfloat_really_inline uint32_t  parse_eight_digits_unrolled(uint64_t val) {
  const uint64_t mask = 0x000000FF000000FF;
  const uint64_t mul1 = 0x000F424000000064; // 100 + (1000000ULL << 32)
  const uint64_t mul2 = 0x0000271000000001; // 1 + (10000ULL << 32)
  val -= 0x3030303030303030;
  val = (val * 10) + (val >> 8); // val = (val * 2561) >> 8;
  val = (((val & mask) * mul1) + (((val >> 16) & mask) * mul2)) >> 32;
  return uint32_t(val);
}
// credit @aqrit
fastfloat_really_inline bool is_made_of_eight_digits_fast(uint64_t val)  noexcept  {
  return !((((val + 0x4646464646464646) | (val - 0x3030303030303030)) &
     0x8080808080808080));
}

Therefore, parse_eight_digits_unrolled(0x3837363534333231) and is_made_of_eight_digits_fast(0x3837363534333231) will give the same output on any architecture.
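
As a worked check of that claim, the two routines can be copied into a standalone file and fed the constant directly. The bodies below mirror the functions quoted above; the example_ prefix is only there to mark them as copies for illustration:

```cpp
#include <cassert>
#include <cstdint>

// Copy of parse_eight_digits_unrolled (credit @aqrit), operating on a value
// that was loaded in little-endian order.
inline uint32_t example_parse_eight_digits(uint64_t val) {
  const uint64_t mask = 0x000000FF000000FF;
  const uint64_t mul1 = 0x000F424000000064;  // 100 + (1000000ULL << 32)
  const uint64_t mul2 = 0x0000271000000001;  // 1 + (10000ULL << 32)
  val -= 0x3030303030303030;                 // ASCII -> digit values
  val = (val * 10) + (val >> 8);             // combine adjacent digit pairs
  val = (((val & mask) * mul1) + (((val >> 16) & mask) * mul2)) >> 32;
  return uint32_t(val);
}

// Copy of is_made_of_eight_digits_fast (credit @aqrit).
inline bool example_is_eight_digits(uint64_t val) {
  return !((((val + 0x4646464646464646) | (val - 0x3030303030303030)) &
            0x8080808080808080));
}

// Convenience wrapper: parse only if all eight bytes are ASCII digits.
inline bool example_try_parse(uint64_t val, uint32_t& out) {
  if (!example_is_eight_digits(val)) return false;
  out = example_parse_eight_digits(val);
  assert(out <= 99999999u);
  return true;
}
```

With the input 0x3837363534333231 (the bytes "12345678"), the pairwise step yields 12, 34, 56, and 78 in the selected bytes, and the two multiplications combine them into 12345678.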

The last part is this:

-#if FASTFLOAT_IS_BIG_ENDIAN == 0
     // We expect that this loop will often take the bulk of the running time
     // because when a value has lots of digits, these digits often
     while ((p + 8 <= pend) && (answer.num_digits + 8 < max_digits)) {
-      uint64_t val;
-      ::memcpy(&val, p, sizeof(uint64_t));
+      uint64_t val = read_u64(p);
       if(! is_made_of_eight_digits_fast(val)) { break; }
       // We have eight digits, process them in one go!
       val -= 0x3030303030303030;
-      ::memcpy(answer.digits + answer.num_digits, &val, sizeof(uint64_t));
+      write_u64(answer.digits + answer.num_digits, val);
       answer.num_digits += 8;
       p += 8;
     }
-#endif

Since we load the bytes in little-endian order, and then write them out from little-endian order, this will produce the same result no matter the byte-order. Although this technically doesn't need a byteswap (it would just have a different value for val), this simplifies the resulting logic.

Diff

The entire diff is the following:

diff --git a/include/fast_float/ascii_number.h b/include/fast_float/ascii_number.h
index 8ca345c..cc8ff4f 100644
--- a/include/fast_float/ascii_number.h
+++ b/include/fast_float/ascii_number.h
@@ -14,6 +14,34 @@ namespace fast_float {
 // able to optimize it well.
 fastfloat_really_inline bool is_integer(char c)  noexcept  { return c >= '0' && c <= '9'; }

+fastfloat_really_inline uint64_t byteswap(uint64_t val) {
+  return (val & 0xFF00000000000000) >> 56
+    | (val & 0x00FF000000000000) >> 40
+    | (val & 0x0000FF0000000000) >> 24
+    | (val & 0x000000FF00000000) >> 8
+    | (val & 0x00000000FF000000) << 8
+    | (val & 0x0000000000FF0000) << 24
+    | (val & 0x000000000000FF00) << 40
+    | (val & 0x00000000000000FF) << 56;
+}
+
+fastfloat_really_inline uint64_t read_u64(const char *chars) {
+  uint64_t val;
+  ::memcpy(&val, chars, sizeof(uint64_t));
+#if FASTFLOAT_IS_BIG_ENDIAN == 1
+  // Need to read as-if the number was in little-endian order.
+  val = byteswap(val);
+#endif
+  return val;
+}
+
+fastfloat_really_inline void write_u64(uint8_t *chars, uint64_t val) {
+#if FASTFLOAT_IS_BIG_ENDIAN == 1
+  // Need to read as-if the number was in little-endian order.
+  val = byteswap(val);
+#endif
+  ::memcpy(chars, &val, sizeof(uint64_t));
+}

 // credit  @aqrit
 fastfloat_really_inline uint32_t  parse_eight_digits_unrolled(uint64_t val) {
@@ -27,21 +55,17 @@ fastfloat_really_inline uint32_t  parse_eight_digits_unrolled(uint64_t val) {
 }

 fastfloat_really_inline uint32_t parse_eight_digits_unrolled(const char *chars)  noexcept  {
-  uint64_t val;
-  ::memcpy(&val, chars, sizeof(uint64_t));
-  return parse_eight_digits_unrolled(val);
+  return parse_eight_digits_unrolled(read_u64(chars));
 }

 // credit @aqrit
 fastfloat_really_inline bool is_made_of_eight_digits_fast(uint64_t val)  noexcept  {
   return !((((val + 0x4646464646464646) | (val - 0x3030303030303030)) &
     0x8080808080808080));
 }

 fastfloat_really_inline bool is_made_of_eight_digits_fast(const char *chars)  noexcept  {
-  uint64_t val;
-  ::memcpy(&val, chars, 8);
-  return is_made_of_eight_digits_fast(val);
+  return is_made_of_eight_digits_fast(read_u64(chars));
 }

 struct parsed_number_string {
@@ -87,17 +111,15 @@ parsed_number_string parse_number_string(const char *p, const char *pend, chars_
   int64_t exponent = 0;
   if ((p != pend) && (*p == '.')) {
     ++p;
-#if FASTFLOAT_IS_BIG_ENDIAN == 0
-    // Fast approach only tested under little endian systems
+  // Fast approach only tested under little endian systems
+  if ((p + 8 <= pend) && is_made_of_eight_digits_fast(p)) {
+    i = i * 100000000 + parse_eight_digits_unrolled(p); // in rare cases, this will overflow, but that's ok
+    p += 8;
     if ((p + 8 <= pend) && is_made_of_eight_digits_fast(p)) {
       i = i * 100000000 + parse_eight_digits_unrolled(p); // in rare cases, this will overflow, but that's ok
       p += 8;
-      if ((p + 8 <= pend) && is_made_of_eight_digits_fast(p)) {
-        i = i * 100000000 + parse_eight_digits_unrolled(p); // in rare cases, this will overflow, but that's ok
-        p += 8;
-      }
     }
-#endif
+  }
     while ((p != pend) && is_integer(*p)) {
       uint8_t digit = uint8_t(*p - '0');
       ++p;
@@ -225,20 +247,17 @@ fastfloat_really_inline decimal parse_decimal(const char *p, const char *pend) n
        ++p;
       }
     }
-#if FASTFLOAT_IS_BIG_ENDIAN == 0
     // We expect that this loop will often take the bulk of the running time
     // because when a value has lots of digits, these digits often
     while ((p + 8 <= pend) && (answer.num_digits + 8 < max_digits)) {
-      uint64_t val;
-      ::memcpy(&val, p, sizeof(uint64_t));
+      uint64_t val = read_u64(p);
       if(! is_made_of_eight_digits_fast(val)) { break; }
       // We have eight digits, process them in one go!
       val -= 0x3030303030303030;
-      ::memcpy(answer.digits + answer.num_digits, &val, sizeof(uint64_t));
+      write_u64(answer.digits + answer.num_digits, val);
       answer.num_digits += 8;
       p += 8;
     }
-#endif
     while ((p != pend) && is_integer(*p)) {
       if (answer.num_digits < max_digits) {
         answer.digits[answer.num_digits] = uint8_t(*p - '0');

Update include in README example

I'm pretty sure the include in the example in your README is incorrect. It has #include "fast_float/parse_number.h", when it should be #include "fast_float/fast_float.h". Your benchmarking code uses the second include:
https://github.com/lemire/simple_fastfloat_benchmark/blob/a91ca26fa8b991593eb1c601d3232b6dfed72ff8/benchmarks/benchmark.cpp#L4
Also, fast_float.h is not included by any other file, and it is where from_chars is declared.
A patch for this fix is below. Thanks so much for the library, it's really useful!

diff --git a/README.md b/README.md
index 13239a5..24c307b 100644
--- a/README.md
+++ b/README.md
@@ -44,7 +44,7 @@ It will parse infinity and nan values.
 Example:
 
 ``` C++
-#include "fast_float/parse_number.h"
+#include "fast_float/fast_float.h"
 #include <iostream>
  
 int main() {

Take the `fast_float/` directory prefix out of same-dir includes

This patch lets me #include the fast_float.h file without having to pass an -I includeDir flag to my C++ compiler, as if it were a single-file header-only library.

It also makes fast_float.h more consistent with the other same-dir includes in e.g. decimal_to_binary.h and parse_number.h.

diff --git a/include/fast_float/fast_float.h b/include/fast_float/fast_float.h
index 6bf35b7..214c251 100644
--- a/include/fast_float/fast_float.h
+++ b/include/fast_float/fast_float.h
@@ -41,5 +41,5 @@ from_chars_result from_chars(const char *first, const char *last,
                              T &value, chars_format fmt = chars_format::general)  noexcept;
 
 }
-#include "fast_float/parse_number.h"
-#endif // FASTFLOAT_FAST_FLOAT_H
\ No newline at end of file
+#include "parse_number.h"
+#endif // FASTFLOAT_FAST_FLOAT_H

Compilation warning/error with gcc 6.3.0

We just got this after bumping our vendored copy to the latest git master (052975d):

In file included from /arrow/cpp/src/arrow/vendored/fast_float/parse_number.h:6:0,
                 from /arrow/cpp/src/arrow/vendored/fast_float/fast_float.h:65,
                 from /arrow/cpp/src/arrow/util/value_parsing.cc:23:
/arrow/cpp/src/arrow/vendored/fast_float/digit_comparison.h: In instantiation of ‘arrow_vendored::fast_float::adjusted_mantissa arrow_vendored::fast_float::to_extended(T) [with T = double]’:
/arrow/cpp/src/arrow/vendored/fast_float/digit_comparison.h:92:37:   required from ‘arrow_vendored::fast_float::adjusted_mantissa arrow_vendored::fast_float::to_extended_halfway(T) [with T = double]’
/arrow/cpp/src/arrow/vendored/fast_float/digit_comparison.h:354:48:   required from ‘arrow_vendored::fast_float::adjusted_mantissa arrow_vendored::fast_float::negative_digit_comp(arrow_vendored::fast_float::bigint&, arrow_vendored::fast_float::adjusted_mantissa, int32_t) [with T = double; int32_t = int]’
/arrow/cpp/src/arrow/vendored/fast_float/digit_comparison.h:418:34:   required from ‘arrow_vendored::fast_float::adjusted_mantissa arrow_vendored::fast_float::digit_comp(arrow_vendored::fast_float::parsed_number_string&, arrow_vendored::fast_float::adjusted_mantissa) [with T = double]’
/arrow/cpp/src/arrow/vendored/fast_float/parse_number.h:107:41:   required from ‘arrow_vendored::fast_float::from_chars_result arrow_vendored::fast_float::from_chars_advanced(const char*, const char*, T&, arrow_vendored::fast_float::parse_options) [with T = double]’
/arrow/cpp/src/arrow/util/value_parsing.cc:40:85:   required from here
/arrow/cpp/src/arrow/vendored/fast_float/digit_comparison.h:62:50: error: right shift count >= width of type [-Werror=shift-count-overflow]
       am.power2 = int32_t((bits & exponent_mask) >> binary_format<T>::mantissa_explicit_bits());
                           ~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
