
Comments (10)

HowardHinnant commented on May 25, 2024

Unfortunately the API of date is pretty much set in stone as it has now been standardized in C++20.

That being said, there are ways to optimize the performance within the current API. Let's start with this crude benchmark for time_zone::to_sys:

#include "date/tz.h"
#include <chrono>
#include <cstdint>
#include <iostream>

using D = std::chrono::system_clock::duration;
auto tz = date::current_zone();

// tp has external linkage, so the store to it inside the benchmark loop
// cannot be optimized away.
extern date::sys_time<D> tp;

date::sys_time<D>
convert(date::local_time<D> tp_loc)
{
    return tz->to_sys(tp_loc);
}

int
main()
{
    using namespace std::chrono;
    auto tp_start = system_clock::now();
    auto tp_end = tp_start + 1s;
    auto tp_loc = tz->to_local(tp_start);
    std::uint64_t count = 0;
    while (system_clock::now() < tp_end)
    {
        tp = convert(tp_loc);
        tp_loc += 1s;
        ++count;
    }
    auto rate = count / ((tp_end - tp_start) / 1.s);
    std::cout << rate << " conversions / sec\n";
}

date::sys_time<D> tp;

For me, compiled with clang -O3, this yields about 707,000 conversions/sec. This is significantly slower than what you report, but is a good reference point anyway.

If you can make the assumption that every local time has a unique mapping to UTC (i.e. exists and is not ambiguous), then a fairly simple change can be made to convert which will dramatically improve the performance:

date::sys_time<D>
convert(date::local_time<D> tp_loc)
{
    thread_local date::local_info info = tz->get_info(tp_loc);
    date::sys_time<D> tp{tp_loc.time_since_epoch() - info.first.offset};
    if (tp < info.first.begin || tp >= info.first.end)
    {
        info = tz->get_info(tp_loc);
        tp = date::sys_time<D>{tp_loc.time_since_epoch() - info.first.offset};
    }
    return tp;
}

This stores a thread_local local_info initialized by the first local time sent to this function. info.first contains two fields, begin and end, which are the UTC time points over which this information is known to be accurate, and it also includes the UTC offset. So the conversion is made with that offset, and if the result is not in the range [begin, end), then the local_info is updated and the conversion is repeated.

For me this increases the performance to about 74.5 million conversions/sec (a speed up of about 100X).

There are a couple of caveats:

  1. If successive calls to convert are not "temporally close" on average, then the performance degrades and will ultimately become worse than the un-optimized convert. This optimization hinges on the guess that the next call to convert will be able to use the same local_info as the last call.
  2. If the local time does not have a unique mapping to UTC (because of a UTC offset change happening in the time zone), then this optimization will silently get the wrong answer. The un-optimized convert would have thrown an exception in this situation. If desired, more complicated logic can be added to protect against this (one possible shape of such a guard is sketched below this list), but more complicated logic will also be slower.
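
Not part of the original suggestion, but purely for illustration, here is one way such a guard might look (the name convert_guarded and the 48-hour margin are made up for this sketch): take the fast cached path only when the computed UTC time is comfortably inside the cached sys_info window, where the mapping is necessarily unique, and otherwise fall back to the fully checked to_sys(), which throws for nonexistent or ambiguous local times.

date::sys_time<D>
convert_guarded(date::local_time<D> tp_loc)
{
    using namespace std::chrono;
    // Margin larger than any historical UTC offset change (even date-line
    // shifts are only a bit over 24 hours), so times inside the shrunken
    // window are guaranteed to have a unique mapping.
    constexpr auto margin = hours{48};
    thread_local date::sys_info info = tz->get_info(tp_loc).first;
    date::sys_time<D> tp{tp_loc.time_since_epoch() - info.offset};
    if (info.begin + margin <= tp && tp < info.end - margin)
        return tp;                          // fast path: unique by construction
    info = tz->get_info(tp_loc).first;      // refresh the cache, then do the
    return tz->to_sys(tp_loc);              // fully checked (throwing) conversion
}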

This optimization, if it is acceptable in your application, has the potential to be even faster than the string_view optimization you propose (100X vs 60X).

zeodtr commented on May 25, 2024

@HowardHinnant Thank you for your detailed response.
I apologize for not being specific about my test: it used 40 threads, so it could be way faster.
Regarding the optimization: since my system is a general-purpose OLAP DBMS, it can't make such assumptions.
Here's another idea: given that to_sys() doesn't strictly require calling get_info() or load_sys_info(), one could introduce another set of internal functions, perhaps 'casually' named get_info_casual() and load_sys_info_casual(), which do not copy abbrev into sys_info.
In my opinion, abbrev is seldom useful beyond display purposes, and considering its high retrieval cost, it would be beneficial if 'casual' versions were available in the standard library.
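
Purely as an illustration of this idea (the type below is hypothetical and exists neither in the date library nor the standard), the result of such a 'casual' lookup could mirror sys_info but hold a non-owning view of the abbreviation:

#include "date/tz.h"

#include <chrono>
#include <string_view>

// Hypothetical: like date::sys_info, but abbrev is a view into the tzdb's own
// storage, so no std::string is copied on every lookup. The view is only
// valid for as long as the loaded tzdb stays alive.
struct sys_info_casual
{
    date::sys_seconds    begin;
    date::sys_seconds    end;
    std::chrono::seconds offset;
    std::chrono::minutes save;
    std::string_view     abbrev;
};
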
Thank you.

HowardHinnant commented on May 25, 2024

There's something else going on here that I'm not completely understanding. abbrev is typically a 3 or 4 character string, and every implementation of std::string I'm aware of has a short-string buffer much larger than that, which makes copying abbrev very fast. There should be no allocation/deallocation, just a memcpy of 4 to 5 chars.
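
For illustration (not part of the original reply), the short-string buffer is easy to observe on a mainstream implementation:

#include <iostream>
#include <string>

int main()
{
    std::string s;
    // Typically 15 on libstdc++ (C++11 ABI) and 22 on libc++; the old
    // gcc4 reference-counted string reports 0 here.
    std::cout << "inline capacity: " << s.capacity() << '\n';

    std::string abbrev = "CEST";
    std::string copy = abbrev;   // fits in the buffer: a small memcpy, no heap allocation
}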

Would you mind telling me what compiler/version you're using? That might be important.

zeodtr commented on May 25, 2024

The compiler version info is as follows:

g++ (GCC) 11.2.1 20220127 (Red Hat 11.2.1-9)
Copyright (C) 2021 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

Given the size of the code base for the OLAP DBMS and the complexity of the query execution path, there could potentially be another factor contributing to this symptom. If that's the case, I apologize for any oversight.

zeodtr commented on May 25, 2024

The OS info is as follows:

CentOS Linux release 7.9.2009 (Core)

The OS might be too outdated to support the C++11 ABI.
References:
https://stackoverflow.com/questions/46068529/no-small-string-optimization-with-gcc
https://bugzilla.redhat.com/show_bug.cgi?id=1546704
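
As an aside (an illustrative check, not from the original comment), libstdc++ exposes which string ABI a translation unit is built with via the _GLIBCXX_USE_CXX11_ABI macro:

#include <iostream>

int main()
{
#if defined(_GLIBCXX_USE_CXX11_ABI) && _GLIBCXX_USE_CXX11_ABI
    std::cout << "C++11 ABI: std::string with a short-string buffer\n";
#else
    std::cout << "old gcc4 ABI (reference-counted std::string), or not libstdc++\n";
#endif
}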

HowardHinnant commented on May 25, 2024

Ah! Do you know if you're using the old reference-counted string? Not sure, but that is probably --with-default-libstdcxx-abi=gcc4-compatible. Still, copying a ref-counted string is fast too, unless maybe there is a lot of multi-thread contention on the ref-count.

Are you using the date lib flag -DUSE_OS_TZDB=1? (https://howardhinnant.github.io/date/tz.html#Installation). This tells the date lib to use the binary IANA database that ships on your OS. This configuration is known to have higher performance than using a downloaded text IANA database (-DUSE_OS_TZDB=0).
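
One quick, illustrative way to confirm which mode a build ended up with (USE_OS_TZDB is treated as 0 by the preprocessor if it was never defined):

#include "date/tz.h"
#include <iostream>

int main()
{
#if USE_OS_TZDB
    std::cout << "using the binary IANA database shipped with the OS\n";
#else
    std::cout << "using the downloaded text IANA database\n";
#endif
}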

HowardHinnant commented on May 25, 2024

Aside: Once you hit g++ 14 (which hasn't yet shipped), you'll be able to migrate off of this lib and use C++20 chrono. So whatever we do here, staying compatible with C++20 means you've got a path forward in the future.
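
For reference (a sketch, not a claim about what g++ 14 will ship beyond the standard API), the same conversion in pure C++20 <chrono> looks essentially identical, just with std::chrono in place of date:

#include <chrono>

using D = std::chrono::system_clock::duration;

std::chrono::sys_time<D>
convert(std::chrono::local_time<D> tp_loc)
{
    // std::chrono::current_zone() and time_zone::to_sys() are the C++20
    // equivalents of the date library calls used above.
    static const std::chrono::time_zone* tz = std::chrono::current_zone();
    return tz->to_sys(tp_loc);
}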

zeodtr commented on May 25, 2024

I use -DUSE_OS_TZDB.
I think I'll continue with the std::string_view solution for the time being. It doesn't require altering the source code that uses sys_info, which keeps it practically compatible with the standard; it avoids changes, such as compile options, that could affect other parts of the code base; and I can maintain that portion of the source code myself.

By the way, I ran my test on a CentOS 8 system a few minutes ago. It was fast even without the std::string_view workaround.

Thank you for your great and kind responses!

HowardHinnant commented on May 25, 2024

Interesting! So can we conclude that the move from a ref-counted string to a short-string-optimized string is the performance difference?

zeodtr commented on May 25, 2024

Yes, I think the 'short string optimization' is the key.
