
Comments (10)

HowardHinnant commented on May 25, 2024

Unfortunately the API of date is pretty much set in stone as it has now been standardized in C++20.

That being said, there are ways to optimize the performance within the current API. Let's start with this crude benchmark for time_zone::to_sys:

#include "date/tz.h"
#include <chrono>
#include <cstdint>
#include <iostream>

using D = std::chrono::system_clock::duration;
auto tz = date::current_zone();

// tp has external linkage, so the store to it inside the benchmark loop
// cannot be optimized away.
extern date::sys_time<D> tp;

date::sys_time<D>
convert(date::local_time<D> tp_loc)
{
    return tz->to_sys(tp_loc);
}

int
main()
{
    using namespace std::chrono;
    auto tp_start = system_clock::now();
    auto tp_end = tp_start + 1s;
    auto tp_loc = tz->to_local(tp_start);
    std::uint64_t count = 0;
    while (system_clock::now() < tp_end)
    {
        tp = convert(tp_loc);
        tp_loc += 1s;
        ++count;
    }
    auto rate = count / ((tp_end - tp_start) / 1.s);
    std::cout << rate << " conversions / sec\n";
}

date::sys_time<D> tp;

For me, compiled with clang -O3, this yields about 707,000 conversions/sec. This is significantly slower than what you report, but is a good reference point anyway.

If you can make the assumption that every local time has a unique mapping to UTC (i.e. exists and is not ambiguous), then a fairly simple change can be made to convert which will dramatically improve the performance:

date::sys_time<D>
convert(date::local_time<D> tp_loc)
{
    thread_local date::local_info info = tz->get_info(tp_loc);
    date::sys_time<D> tp{tp_loc.time_since_epoch() - info.first.offset};
    if (tp < info.first.begin || tp >= info.first.end)
    {
        info = tz->get_info(tp_loc);
        tp = date::sys_time<D>{tp_loc.time_since_epoch() - info.first.offset};
    }
    return tp;
}

This stores a thread_local local_info initialized by the first local time sent to this function. info.first contains two fields, begin and end, which are the UTC time points over which this information is known to be accurate, and it also includes the UTC offset. So the conversion is made with that offset, and if the result is not in the range [begin, end), then the local_info is updated and the conversion is repeated.

For me this increases the performance to about 74.5 million conversions/sec (a speed up of about 100X).

There are a couple of caveats:

  1. If successive calls to convert are not "temporally close" on average, then the performance degrades and will ultimately become worse than the un-optimized convert. This optimization hinges on the guess that the next call to convert will be able to use the same local_info as the last call.
  2. If the local time does not have a unique mapping to UTC (because of a UTC offset change happening in the time zone), then this optimization will silently get the wrong answer. The un-optimized convert would have thrown an exception in this situation. If desired, more complicated logic can be added to protect against this (one possible shape of such a guard is sketched below this list), but more complicated logic will also be slower.
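
Not part of the original suggestion, but purely for illustration, here is one way such a guard might look (the name convert_guarded and the 48-hour margin are made up for this sketch): take the fast cached path only when the computed UTC time is comfortably inside the cached sys_info window, where the mapping is necessarily unique, and otherwise fall back to the fully checked to_sys(), which throws for nonexistent or ambiguous local times.

date::sys_time<D>
convert_guarded(date::local_time<D> tp_loc)
{
    using namespace std::chrono;
    // Margin larger than any historical UTC offset change (even date-line
    // shifts are only a bit over 24 hours), so times inside the shrunken
    // window are guaranteed to have a unique mapping.
    constexpr auto margin = hours{48};
    thread_local date::sys_info info = tz->get_info(tp_loc).first;
    date::sys_time<D> tp{tp_loc.time_since_epoch() - info.offset};
    if (info.begin + margin <= tp && tp < info.end - margin)
        return tp;                          // fast path: unique by construction
    info = tz->get_info(tp_loc).first;      // refresh the cache, then do the
    return tz->to_sys(tp_loc);              // fully checked (throwing) conversion
}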

This optimization, if it is acceptable in your application, has the potential to be even faster than the string_view optimization you propose (100X vs 60X).

zeodtr commented on May 25, 2024

@HowardHinnant Thank you for your detailed response.
I apologize for not being specific about my test: it used 40 threads, so it could be way faster.
Regarding the optimization: since my system is a general-purpose OLAP DBMS, it can't make such assumptions.
Here's another idea: given that to_sys() doesn't strictly require calling get_info() or load_sys_info(), one could introduce another set of internal functions, perhaps 'casually' named get_info_casual() and load_sys_info_casual(), which do not copy abbrev into sys_info.
In my opinion, abbrev is seldom useful beyond display purposes, and considering its high retrieval cost, it would be beneficial if 'casual' versions were available in the standard library.
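
Purely as an illustration of this idea (the type below is hypothetical and exists neither in the date library nor the standard), the result of such a 'casual' lookup could mirror sys_info but hold a non-owning view of the abbreviation:

#include "date/tz.h"

#include <chrono>
#include <string_view>

// Hypothetical: like date::sys_info, but abbrev is a view into the tzdb's own
// storage, so no std::string is copied on every lookup. The view is only
// valid for as long as the loaded tzdb stays alive.
struct sys_info_casual
{
    date::sys_seconds    begin;
    date::sys_seconds    end;
    std::chrono::seconds offset;
    std::chrono::minutes save;
    std::string_view     abbrev;
};
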
Thank you.

HowardHinnant commented on May 25, 2024

There's something else going on here that I'm not completely understanding. abbrev is typically a 3 or 4 character string, and every implementation of std::string I'm aware of has a short-string buffer much larger than that, which makes copying abbrev very fast. There should be no allocation/deallocation, just a memcpy of 4 to 5 chars.
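
For illustration (not part of the original reply), the short-string buffer is easy to observe on a mainstream implementation:

#include <iostream>
#include <string>

int main()
{
    std::string s;
    // Typically 15 on libstdc++ (C++11 ABI) and 22 on libc++; the old
    // gcc4 reference-counted string reports 0 here.
    std::cout << "inline capacity: " << s.capacity() << '\n';

    std::string abbrev = "CEST";
    std::string copy = abbrev;   // fits in the buffer: a small memcpy, no heap allocation
}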

Would you mind telling me what compiler/version you're using? That might be important.

zeodtr commented on May 25, 2024

The compiler version info is as follows:

g++ (GCC) 11.2.1 20220127 (Red Hat 11.2.1-9)
Copyright (C) 2021 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

Given the size of the code base for the OLAP DBMS and the complexity of the query execution path, there could potentially be another factor contributing to this symptom. If that's the case, I apologize for any oversight.

zeodtr commented on May 25, 2024

The OS info is as follows:

CentOS Linux release 7.9.2009 (Core)

The OS might be too outdated to support the C++11 ABI.
References:
https://stackoverflow.com/questions/46068529/no-small-string-optimization-with-gcc
https://bugzilla.redhat.com/show_bug.cgi?id=1546704
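
As an aside (an illustrative check, not from the original comment), libstdc++ exposes which string ABI a translation unit is built with via the _GLIBCXX_USE_CXX11_ABI macro:

#include <iostream>

int main()
{
#if defined(_GLIBCXX_USE_CXX11_ABI) && _GLIBCXX_USE_CXX11_ABI
    std::cout << "C++11 ABI: std::string with a short-string buffer\n";
#else
    std::cout << "old gcc4 ABI (reference-counted std::string), or not libstdc++\n";
#endif
}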

HowardHinnant commented on May 25, 2024

Ah! Do you know if you're using the old reference-counted string? Not sure, but that is probably --with-default-libstdcxx-abi=gcc4-compatible. Still, copying a ref-counted string is fast too, unless maybe there is a lot of multi-thread contention on the ref-count.

Are you using the date lib flag -DUSE_OS_TZDB=1? (https://howardhinnant.github.io/date/tz.html#Installation). This tells the date lib to use the binary IANA database that ships on your OS. This configuration is known to have higher performance than using a downloaded text IANA database (-DUSE_OS_TZDB=0).
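
One quick, illustrative way to confirm which mode a build ended up with (USE_OS_TZDB is treated as 0 by the preprocessor if it was never defined):

#include "date/tz.h"
#include <iostream>

int main()
{
#if USE_OS_TZDB
    std::cout << "using the binary IANA database shipped with the OS\n";
#else
    std::cout << "using the downloaded text IANA database\n";
#endif
}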

HowardHinnant commented on May 25, 2024

Aside: Once you hit g++ 14 (which hasn't yet shipped), you'll be able to migrate off of this lib and use C++20 chrono. So whatever we do here, staying compatible with C++20 means you've got a path forward in the future.
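
For reference (a sketch, not a claim about what g++ 14 will ship beyond the standard API), the same conversion in pure C++20 <chrono> looks essentially identical, just with std::chrono in place of date:

#include <chrono>

using D = std::chrono::system_clock::duration;

std::chrono::sys_time<D>
convert(std::chrono::local_time<D> tp_loc)
{
    // std::chrono::current_zone() and time_zone::to_sys() are the C++20
    // equivalents of the date library calls used above.
    static const std::chrono::time_zone* tz = std::chrono::current_zone();
    return tz->to_sys(tp_loc);
}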

zeodtr commented on May 25, 2024

I use -DUSE_OS_TZDB.
I think I'll continue with the std::string_view solution for the time being. It doesn't require altering the source code that uses sys_info, which keeps it practically compatible with the standard; it avoids changes, such as compile options, that could affect other parts of the code base; and I can maintain that portion of the source code myself.

By the way, I ran my test on a CentOS 8 system a few minutes ago. It was fast even without the std::string_view workaround.

Thank you for your great and kind responses!

HowardHinnant commented on May 25, 2024

Interesting! So can we conclude that the move from a ref-counted string to a short-string-optimized string is the performance difference?

zeodtr commented on May 25, 2024

Yes, I think the 'short string optimization' is the key.
