from clickhouse.
That's what the input_rows_count argument is for.
Fix proposal:
- Initialize the library once (something like a singleton in the CH process, initialized on demand), under a global mutex to prevent parallel initialization.
- Generate ULIDs under the same mutex. ULID generation changes shared state, so it is not thread-safe.
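A minimal C++ sketch of this proposal, for illustration only. Everything in it (the ulid_sketch namespace, State, next_ulid_locked, generate_batch, and the std::mt19937_64 PRNG seeded from std::random_device) is a hypothetical stand-in, not ClickHouse code and not the API of the ULID library ClickHouse links. One global std::mutex guards both the on-demand initialization and generation, and the lock is taken once per batch rather than once per ULID. It assumes the standard ULID layout (48-bit millisecond timestamp plus 80 random bits, 26 Crockford base32 characters), skips the monotonic-within-a-millisecond mode, and a once-seeded Mersenne Twister is not a cryptographic source.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <mutex>
#include <optional>
#include <random>
#include <string>
#include <vector>

// Hypothetical sketch of the proposed fix; not ClickHouse or ULID library code.
namespace ulid_sketch {

// Crockford base32 alphabet used by the 26-character ULID text form.
constexpr char kAlphabet[] = "0123456789ABCDEFGHJKMNPQRSTVWXYZ";

// Hypothetical generator state: a PRNG seeded once from OS entropy.
struct State {
    std::mt19937_64 rng;
};

std::mutex g_mutex;            // the single global lock from the proposal
std::optional<State> g_state;  // created on demand, only while g_mutex is held
std::size_t g_init_calls = 0;  // how many times the state was seeded

// Build one ULID: 48-bit ms timestamp (10 chars) + 80 random bits (16 chars).
// Caller must hold g_mutex, because drawing from the PRNG mutates shared state.
std::string next_ulid_locked(State& st) {
    using namespace std::chrono;
    std::uint64_t ms = static_cast<std::uint64_t>(
        duration_cast<milliseconds>(system_clock::now().time_since_epoch()).count());
    ms &= (std::uint64_t{1} << 48) - 1;

    std::string s(26, '0');
    for (int i = 9; i >= 0; --i) {  // timestamp, most significant char first
        s[i] = kAlphabet[ms & 31];
        ms >>= 5;
    }
    std::uint64_t r1 = st.rng();
    std::uint64_t r2 = st.rng();
    for (int i = 0; i < 12; ++i) { s[10 + i] = kAlphabet[r1 & 31]; r1 >>= 5; }  // 60 bits
    for (int i = 0; i < 4; ++i) { s[22 + i] = kAlphabet[r2 & 31]; r2 >>= 5; }   // 20 bits
    return s;
}

// Generate n ULIDs, taking the global lock once for the whole batch.
std::vector<std::string> generate_batch(std::size_t n) {
    std::lock_guard<std::mutex> lock(g_mutex);
    if (!g_state) {  // first caller seeds the state; later callers reuse it
        std::random_device rd;
        std::uint64_t seed = (static_cast<std::uint64_t>(rd()) << 32) ^ rd();
        g_state.emplace();
        g_state->rng.seed(seed);
        ++g_init_calls;
    }
    std::vector<std::string> out;
    out.reserve(n);
    for (std::size_t i = 0; i < n; ++i) out.push_back(next_ulid_locked(*g_state));
    return out;
}

}  // namespace ulid_sketch
```

Calling generate_batch(1000) and then generate_batch(3) seeds the state only once (g_init_calls stays at 1). Locking per batch instead of per row keeps the number of lock acquisitions proportional to the number of blocks, but generation itself is still serialized across threads.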
I'd expect this to have the opposite effect. Instead of having N threads generating ULIDs, you would have only one at a time.
I haven't run benchmarks, but I'm sure that even with a lock, a massive insert() will be faster than the current implementation, which calls SYS_getrandom() every time plus some math around it.
Please do, and let's analyze the results in a multi-threaded environment. Do you really expect a call to random() and a small loop to be that expensive per block?
Here is the perf profile of running generateULID on a single thread (Select generateULID() from numbers(1000000000) format Null):
+ 97.51% 13.09% QueryPullP clickhouse [.] ulid_generate
+ 75.83% 75.82% QueryPullP clickhouse [.] ulid_encode
0.07% 0.07% QueryPullP clickhouse [.] ulid_generator_init
Only 0.07% of the time is spent in ulid_generator_init, versus 97.51% in ulid_generate.
Thanks. I confirm this.
CREATE TABLE qwe
(
`id` FixedString(26) DEFAULT generateULID(),
`x` Int32
)
ENGINE = MergeTree
ORDER BY id;
insert into qwe(x) select number from numbers(10000);
strace -ff -p $(pidof clickhouse-server) -e getrandom
shows only a few calls to getrandom(). This means ulid_generator_init() is not called every time a ULID is generated.
@Algunenano, although:
This calls getrandom() 8 times:
select generateULID(1), generateULID(2),generateULID(3), generateULID(4), generateULID(5), generateULID(6), generateULID(7), generateULID(8);
This calls getrandom() 4 times:
select groupArray(generateULID(number)) from numbers(1000) format Null
That's strange. But anyway, this peculiarity is not a problem in my case.
@socketpair arrayMap( x-> generateULID(x), range(1000))
@den-crane I didn't catch what you wanted to emphasize, but getting coarse time on the x86_64 Linux platform is exceptionally fast, and in fact its performance is enough for me. On macOS or BSD... that's not my case.
Also,
select arrayMap( x-> generateULID(x), range(1000));
calls getrandom() 5 times.
> This calls getrandom() 8 times.
> select generateULID(1), generateULID(2),generateULID(3), generateULID(4), generateULID(5), generateULID(6), generateULID(7), generateULID(8);
The explanation is that in this case the argument is used to prevent the calls from being eliminated as common subexpressions, so they are treated as 8 different functions.
> This calls getrandom() 4 times.
> select groupArray(generateULID(number)) from numbers(1000) format Null
I'm curious about why. I would expect one call to getrandom() to happen in the dry-run check (when CH internally checks the function output types and so on) and one per block (1 call to ulid_generator_init, 1000 calls to ulid_generate). So in this case just 2 getrandom() calls. I'm probably missing something.
One way to check the impact of blocks is to run something like select groupArray(generateULID(number)) from numbers(1000) format Null SETTINGS max_block_size=1. This generates blocks of just 1 row, each calling execute() (so 1 call to ulid_generator_init + 1 call to ulid_generate per block).
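The counting argument above can be made explicit with a small C++ model. It is a hypothetical sketch, not ClickHouse code: CallCounts, execute_block and run_query are illustrative names. It assumes one ulid_generator_init per execute() and one ulid_generate per row, as described in the comment, and it treats the dry run as one extra execute() over zero rows, which is the comment's expectation rather than verified behavior.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Simplified, hypothetical model of the call pattern discussed above; it only
// counts calls and does not reproduce ClickHouse's real execution pipeline.
struct CallCounts {
    std::size_t init_calls = 0;      // stands in for ulid_generator_init (one getrandom each)
    std::size_t generate_calls = 0;  // stands in for ulid_generate
};

// One execute() per block: initialize once, then generate one ULID per row.
void execute_block(std::size_t input_rows_count, CallCounts& c) {
    ++c.init_calls;
    c.generate_calls += input_rows_count;
}

// Split total_rows into blocks of at most max_block_size rows. The optional
// dry run is modeled as an execute() over zero rows (an assumption).
CallCounts run_query(std::size_t total_rows, std::size_t max_block_size, bool dry_run) {
    assert(max_block_size > 0);  // a zero block size would never make progress
    CallCounts c;
    if (dry_run) execute_block(0, c);
    for (std::size_t done = 0; done < total_rows;) {
        std::size_t rows = std::min(max_block_size, total_rows - done);
        execute_block(rows, c);
        done += rows;
    }
    return c;
}
```

For 1000 rows in a single large block, the model predicts 2 init calls; with max_block_size=1 it predicts 1001. Comparing these predictions with the strace counts is one way to tell whether the extra getrandom() calls come from block splitting or from somewhere else.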