
Comments (9)

den-crane avatar den-crane commented on June 25, 2024 2

#63577 (comment)

#62852 (comment)

https://github.com/ClickHouse/ClickHouse/pull/62852/files#diff-1437cd250b6dfcc59c09582025ad6c0b37a8d0bfd35da3bfe5f426e8d2e8893dR220

from clickhouse.

Algunenano avatar Algunenano commented on June 25, 2024 1

That's what the input_rows_count argument is for


Algunenano avatar Algunenano commented on June 25, 2024

Fix proposal:

initialize the library once (something like singleton in CH process, initialized on demand). Under global mutex to prevent parallel initialization.
generate ULIDs under the same mutex. ULID generation involves state change, so it's not thread safe.

I'd expect this to have the opposite result: instead of N threads generating ULIDs in parallel, only one thread could generate at a time.

I did not do benchmarks, but I'm sure that even with a lock, a massive insert() will be faster than the current implementation, which calls SYS_getrandom() every time plus some math around it.

Please do, and let's analyze the results in a multi-threaded environment. Do you really expect a call to random() and a small loop to be that expensive per block?
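To make the two designs under discussion concrete, here is a minimal Python model. It is only a sketch: `ToyGenerator`, `generate_block_shared`, `generate_block_local` and `run` are hypothetical names, the generator is a plain LCG rather than the ulid-c algorithm, and Python threads will not reproduce ClickHouse's real contention. It shows the structural difference: a shared generator serializes every call behind one mutex, while a per-block generator pays one seeding step per block and never contends.

```python
import os
import threading
import time


class ToyGenerator:
    """Hypothetical stand-in for a stateful ULID generator (not the ulid-c API)."""

    def __init__(self):
        # In this model, seeding is the only step that asks the OS for entropy.
        self.state = int.from_bytes(os.urandom(8), "big")

    def next(self):
        # Mutates internal state, so concurrent calls on one instance need a lock.
        self.state = (self.state * 6364136223846793005 + 1442695040888963407) % 2**64
        return (time.time_ns() // 1_000_000, self.state)


_shared = None
_shared_lock = threading.Lock()


def generate_block_shared(rows):
    """Proposal: one lazily initialized generator, every call under a global mutex."""
    global _shared
    out = []
    for _ in range(rows):
        with _shared_lock:
            if _shared is None:
                _shared = ToyGenerator()
            out.append(_shared.next())
    return out


def generate_block_local(rows):
    """Alternative: one generator per block, so threads never wait on each other."""
    gen = ToyGenerator()
    return [gen.next() for _ in range(rows)]


def run(worker, threads=4, rows=1000):
    """Run `worker(rows)` on several threads and collect one block per thread."""
    results = [None] * threads

    def task(i):
        results[i] = worker(rows)

    pool = [threading.Thread(target=task, args=(i,)) for i in range(threads)]
    for t in pool:
        t.start()
    for t in pool:
        t.join()
    return results
```

Timing `run(generate_block_shared)` against `run(generate_block_local)` in a compiled implementation is the benchmark being asked for here; this model says nothing about which one wins.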


Algunenano avatar Algunenano commented on June 25, 2024

Here is the perf output of running generateULID on a single thread (Select generateULID() from numbers(1000000000) format Null).

+   97.51%    13.09%  QueryPullP  clickhouse  [.] ulid_generate
+   75.83%    75.82%  QueryPullP  clickhouse  [.] ulid_encode
     0.07%     0.07%  QueryPullP  clickhouse  [.] ulid_generator_init

0.07% of the time is spent in ulid_generator_init. 97.51% in ulid_generate.
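For context on what the encoding step has to produce: the textual form of a ULID is 26 Crockford base32 characters covering 128 bits (a 48-bit millisecond timestamp followed by 80 bits of randomness). The sketch below is a Python illustration of that format, not the ulid-c implementation; `encode_ulid` is a hypothetical helper name.

```python
import os
import time

# Crockford base32 alphabet: digits and uppercase letters without I, L, O, U.
CROCKFORD = "0123456789ABCDEFGHJKMNPQRSTVWXYZ"


def encode_ulid(timestamp_ms, randomness):
    """Encode a 48-bit timestamp and 80 random bits as 26 base32 characters."""
    if not 0 <= timestamp_ms < 2**48:
        raise ValueError("timestamp must fit in 48 bits")
    if len(randomness) != 10:
        raise ValueError("randomness must be exactly 10 bytes")
    value = (timestamp_ms << 80) | int.from_bytes(randomness, "big")
    chars = []
    for _ in range(26):
        # Take 5 bits at a time, least significant first.
        chars.append(CROCKFORD[value & 31])
        value >>= 5
    return "".join(reversed(chars))


ulid = encode_ulid(time.time_ns() // 1_000_000, os.urandom(10))
```

Because the timestamp occupies the most significant bits, ULIDs generated in later milliseconds sort after earlier ones as plain strings.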


socketpair avatar socketpair commented on June 25, 2024

Thanks. I confirm this.

CREATE TABLE qwe
(
    `id` FixedString(26) DEFAULT generateULID(),
    `x` Int32
)
ENGINE = MergeTree
ORDER BY id;

insert into qwe(x) select number from numbers(10000);

strace -ff -p $(pidof clickhouse-server) -e getrandom shows only a few calls to getrandom(). This means ulid_generator_init() is not called every time a ULID is generated.


socketpair avatar socketpair commented on June 25, 2024

@Algunenano although:

This calls getrandom() 8 times.

select generateULID(1), generateULID(2),generateULID(3), generateULID(4), generateULID(5), generateULID(6), generateULID(7), generateULID(8);

This calls getrandom() 4 times.

select groupArray(generateULID(number)) from numbers(1000) format Null

That's strange. But anyway, this peculiarity does not affect my use case.


den-crane avatar den-crane commented on June 25, 2024

@socketpair arrayMap( x-> generateULID(x), range(1000))

https://github.com/ClickHouse/ulid-c/blob/c433b6783cf918b8f996dacd014cb2b68c7de419/src/ulid.c#L261-L263


socketpair avatar socketpair commented on June 25, 2024

@den-crane I did not catch what you want to emphasize, but getting the coarse time on x86_64 Linux is exceptionally fast. And in fact, its performance is enough (for me).

On macOS or BSD it may be different, but that's not my case.

Also, select arrayMap( x-> generateULID(x), range(1000)); calls getrandom() 5 times.


Algunenano avatar Algunenano commented on June 25, 2024

This calls getrandom() 8 times.
select generateULID(1), generateULID(2),generateULID(3), generateULID(4), generateULID(5), generateULID(6), generateULID(7), generateULID(8);

The explanation is that in this case the argument is there to prevent the calls from being eliminated as common subexpressions, so they are treated as 8 different functions.

This calls getrandom() 4 times.
select groupArray(generateULID(number)) from numbers(1000) format Null

I'm curious about why. I would expect one call to getrandom() to happen in the dry-run check (when CH internally checks the function output types and so on) and one per block (1 call to ulid_generator_init, 1000 calls to ulid_generate). So in this case just 2 getrandom() calls. I'm probably missing something.

One way to check the impact of using blocks is to run something like select groupArray(generateULID(number)) from numbers(1000) format Null SETTINGS max_block_size=1. This will generate blocks of just 1 row, each of which calls execute() (so 1 call to ulid_generator_init + 1 call to ulid_generate per block).
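The expectation described above can be written down as a small Python model. It assumes one ulid_generator_init call per block and one ulid_generate call per row, and that the source emits blocks of at most max_block_size rows; it deliberately ignores the dry-run call and anything else ClickHouse does internally. `simulate` and `expected_calls` are hypothetical helper names.

```python
import math


def simulate(rows, max_block_size):
    """Count (init_calls, generate_calls) when each block re-initializes the generator."""
    if rows < 0 or max_block_size < 1:
        raise ValueError("rows must be >= 0 and max_block_size >= 1")
    init_calls = 0
    generate_calls = 0
    for start in range(0, rows, max_block_size):
        block_rows = min(max_block_size, rows - start)
        init_calls += 1              # one init per execute() call in this model
        generate_calls += block_rows  # one generate per row of the block
    return init_calls, generate_calls


def expected_calls(rows, max_block_size):
    """Closed form of the same model: one init per block, one generate per row."""
    return math.ceil(rows / max_block_size), rows


print(simulate(1000, 1000))  # (1, 1000)
print(simulate(1000, 1))     # (1000, 1000)
```

Under this model a single block of 1000 rows needs one init call, so the 4 getrandom() calls observed earlier in the thread remain unexplained by it; comparing the strace count at max_block_size=1 against the model's 1000 would show whether the per-block assumption holds.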

