Comments (22)
For anyone needing Sharding + Replication, I adjusted the schema and it works perfectly:
https://gist.github.com/coelho/c3b7bbb2c95caa61115d93692f9e4ae2
@lmangani Feel free to use as reference to add support :)
from qryn.
@smmstf How do you create the cluster? Can you share a sample of the relevant config.xml part?
from qryn.
Here's the configuration I use. I have 1 shard with 3 replicas:
<?xml version="1.0"?>
<yandex>
    <remote_servers>
        <cloki_cluster>
            <shard>
                <replica>
                    <default_database>test</default_database>
                    <user>xxxxx</user>
                    <password>xxxxx</password>
                    <host>172.30.102.243</host>
                    <port>9000</port>
                    <!-- <secure>1</secure> -->
                </replica>
                <replica>
                    <default_database>test</default_database>
                    <user>xxxxx</user>
                    <password>xxxx</password>
                    <host>172.30.103.104</host>
                    <port>9000</port>
                    <!-- <secure>1</secure> -->
                </replica>
                <replica>
                    <default_database>test</default_database>
                    <user>xxxxx</user>
                    <password>xxxx</password>
                    <host>172.30.103.105</host>
                    <port>9000</port>
                    <!-- <secure>1</secure> -->
                </replica>
            </shard>
        </cloki_cluster>
    </remote_servers>
</yandex>
I think my problem is in the creation of the Replicated tables :)
from qryn.
@smmstf For Replicated* tables to start working, you need ClickHouse Keeper to be configured. Do you have it? https://clickhouse.com/docs/en/guides/sre/keeper/clickhouse-keeper/
from qryn.
Yes, I already have it configured. Here's my config file:
<?xml version="1.0" ?>
<yandex>
    <keeper_server>
        <!-- <tcp_port_secure>2181</tcp_port_secure> -->
        <tcp_port>2181</tcp_port>
        <server_id>1</server_id>
        <log_storage_path>/var/lib/clickhouse/coordination/log</log_storage_path>
        <snapshot_storage_path>/var/lib/clickhouse/coordination/snapshots</snapshot_storage_path>
        <coordination_settings>
            <operation_timeout_ms>10000</operation_timeout_ms>
            <session_timeout_ms>30000</session_timeout_ms>
            <raft_logs_level>trace</raft_logs_level>
            <rotate_log_storage_interval>10000</rotate_log_storage_interval>
        </coordination_settings>
        <raft_configuration>
            <server>
                <id>1</id>
                <hostname>172.30.102.243</hostname>
                <port>9444</port>
            </server>
            <server>
                <id>2</id>
                <hostname>172.30.103.104</hostname>
                <port>9444</port>
            </server>
            <server>
                <id>3</id>
                <hostname>172.30.103.105</hostname>
                <port>9444</port>
            </server>
        </raft_configuration>
    </keeper_server>
    <zookeeper>
        <node>
            <host>172.30.102.243</host>
            <port>2181</port>
            <!-- <secure>1</secure> -->
        </node>
        <node>
            <host>172.30.103.104</host>
            <port>2181</port>
            <!-- <secure>1</secure> -->
        </node>
        <node>
            <host>172.30.103.105</host>
            <port>2181</port>
            <!-- <secure>1</secure> -->
        </node>
    </zookeeper>
    <!--
    <distributed_ddl>
        <path>/clickhouse/cloki_cluster/task_queue/ddl</path>
    </distributed_ddl>
    -->
</yandex>
And it works correctly
from qryn.
Ok. I reproduced the bug.
Configuring a ClickHouse / Keeper cluster is unbearably complicated and contains completely non-obvious steps, so you should recheck everything carefully. Here are the steps I spent a lot of time figuring out.
1. (Pretty obvious) All servers must be able to communicate freely over ports 2181 and 9444. You can check that both ports are reachable with telnet.
2. If you use hostnames instead of IPs, like this (I managed to configure it using hostnames only):
<zookeeper>
    <node>
        <host>clickhouse_1</host>
        <port>2181</port>
        <secure>0</secure>
    </node>
    <node>
        <host>clickhouse_2</host>
        <port>2181</port>
        <secure>0</secure>
    </node>
</zookeeper>
then you should provide a hostname.hostname alias for every node. I provided clickhouse_1.clickhouse_1 and clickhouse_2.clickhouse_2. I have no idea why they are needed, but anyway.
3. <secure>0</secure> is mandatory for each ClickHouse Keeper node. Otherwise you have to provide an RSA key pair: https://clickhouse.com/docs/en/operations/ssl-zookeeper/
4. Macros and default paths. The following config worked for me:
<macros>
    <shard>01</shard>
    <replica>example01-01-1</replica>
</macros>
<default_replica_path>/clickhouse/tables/{shard}/{database}/{table}</default_replica_path>
<default_replica_name>{replica}</default_replica_name>
The <replica>example01-01-1</replica> part should vary between servers.
5. A pretty simple check, run in the clickhouse client, to make sure all the keepers are configured and working:
SET insert_quorum = 3;
INSERT INTO samples_v3 (fingerprint, timestamp_ns, string) VALUES (1, 1656583893000000000, 'str');
6. And of course, if ClickHouse Keeper doesn't work, it may write some errors to /var/log/clickhouse-server/clickhouse-server.log. But that's optional: the errors may be silently absent. :)
7. ClickHouse version. I tested with this one: https://hub.docker.com/layers/clickhouse-server/clickhouse/clickhouse-server/22.6.2.12/images/sha256-23f45eddbd72befa0942660b31f76b3b1aa45b66512dc19ab173c76615528ae0?context=explore Make sure you have upgraded to the latest version.
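For step 1, here is a minimal sketch of the same reachability check without telnet. It only tests that a TCP connection can be opened, nothing more. The helper names `port_open` and `check_nodes` are made up for illustration; the ports 2181 and 9444 are the ones used in the configs in this thread.

```python
import socket


def port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, unreachable host, DNS failure and timeout.
        return False


def check_nodes(hosts, ports=(2181, 9444), timeout=2.0):
    """Check every host/port pair and return {(host, port): reachable}."""
    return {
        (host, port): port_open(host, port, timeout)
        for host in hosts
        for port in ports
    }
```

Run on each server, something like `check_nodes(["172.30.102.243", "172.30.103.104", "172.30.103.105"])` should return True for every pair. A False entry points at a firewall, a wrong port, or a keeper that is not listening.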
ClickHouse Keeper and the Replicated* table engine features are very rare. Let's hope they become a bit less magical in the future.
from qryn.
@coelho Doesn't it show errors about subqueries on distributed tables, or about distributed table queries inside a CTE? Did you change anything in the ClickHouse config.xml file?
from qryn.
@akvlad Completely forgot to include that. You're right - I edited my gist.
+ // NOTE: You also need to set "distributed_product_mode" to "global" in your profile.
+ // https://clickhouse.com/docs/en/operations/settings/settings-profiles/
from qryn.
@coelho thanks for the gist. I think it can be included into our wiki as a recipe for clustering.
from qryn.
@coelho I created a ClickHouse cluster with 3 shards and 2 replicas, and followed "https://clickhouse.com/docs/en/operations/settings/settings-profiles/". The database and tables are created, but qryn doesn't work properly and returns a 500 error.
from qryn.
@ktpktr0 Can you show log output + test the tables in your Clickhouse?
for ex. for each table: SELECT count() FROM time_series
from qryn.
vector_aggregate log:
2022-08-01T13:49:44.710447Z INFO vector::app: Log level is enabled. level="vector=info,codec=info,vrl=info,file_source=info,tower_limit=trace,rdkafka=info,buffers=info,kube=info"
2022-08-01T13:49:44.710517Z INFO vector::app: Loading configs. paths=["/etc/vector/vector_aggregate1.toml"]
2022-08-01T13:49:44.723816Z INFO vector::topology::running: Running healthchecks.
2022-08-01T13:49:44.724084Z INFO vector: Vector has started. debug="false" version="0.23.0" arch="x86_64" build_id="38c2435 2022-07-11"
2022-08-01T13:49:44.724097Z INFO vector::app: API is disabled, enable by setting `api.enabled` to `true` and use commands like `vector top`.
2022-08-01T13:49:44.750187Z ERROR vector::topology::builder: msg="Healthcheck: Failed Reason." error=A non-successful status returned: 500 Internal Server Error component_kind="sink" component_type="loki" component_id=out component_name=out
qryn log:
{"level":30,"time":1659361774106,"pid":55684,"hostname":"k8s-master2","name":"qryn","msg":"Initializing DB... cloki"}
(node:55684) [FST_MODULE_DEP_FASTIFY-WEBSOCKET] FastifyWarning.fastify-websocket: fastify-websocket has been deprecated. Use @fastify/[email protected] instead.
(Use `node --trace-warnings ...` to show where the warning was created)
(node:55684) [FST_MODULE_DEP_FASTIFY-CORS] FastifyWarning.fastify-cors: fastify-cors has been deprecated. Use @fastify/[email protected] instead.
(node:55684) [FST_MODULE_DEP_FASTIFY-STATIC] FastifyWarning.fastify-static: fastify-static has been deprecated. Use @fastify/[email protected] instead.
{"level":30,"time":1659361774385,"pid":55684,"hostname":"k8s-master2","name":"qryn","msg":"Server listening at http://0.0.0.0:3100"}
{"level":30,"time":1659361774385,"pid":55684,"hostname":"k8s-master2","name":"qryn","msg":"Qryn API up"}
{"level":30,"time":1659361774385,"pid":55684,"hostname":"k8s-master2","name":"qryn","msg":"Qryn API listening on http://0.0.0.0:3100"}
{"level":30,"time":1659361774396,"pid":55684,"hostname":"k8s-master2","name":"qryn","msg":"xxh ready"}
{"level":30,"time":1659361774407,"pid":55684,"hostname":"k8s-master2","name":"qryn","msg":"Checking clickhouse capabilities"}
{"level":30,"time":1659361774410,"pid":55684,"hostname":"k8s-master2","name":"qryn","msg":"LIVE VIEW: supported"}
{"level":30,"time":1659361774416,"pid":55684,"hostname":"k8s-master2","name":"qryn","msg":"checking old samples support: samples_v2"}
{"level":30,"time":1659361774422,"pid":55684,"hostname":"k8s-master2","name":"qryn","msg":"checking old samples support: samples"}
{"level":30,"time":1659361774486,"pid":55684,"hostname":"k8s-master2","name":"qryn","msg":"xxh ready"}
{"level":30,"time":1659361783575,"pid":55684,"hostname":"k8s-master2","name":"qryn","reqId":"req-1","req":{"method":"GET","url":"/ready","hostname":"cloki","remoteAddress":"192.168.10.160","remotePort":34850},"msg":"incoming request"}
{"level":50,"time":1659361783585,"pid":55684,"hostname":"k8s-master2","name":"qryn","reqId":"req-1","err":"Clickhouse DB not ready\nError: Clickhouse DB not ready\n at Object.handler (/usr/lib/node_modules/qryn/lib/handlers/ready.js:15:14)\n at process.processTicksAndRejections (node:internal/process/task_queues:95:5)","msg":"Clickhouse DB not ready"}
{"level":30,"time":1659361783587,"pid":55684,"hostname":"k8s-master2","name":"qryn","reqId":"req-1","res":{"statusCode":500},"responseTime":10.706425994634628,"msg":"request completed"}
clickhouse server log:
0. DB::Exception::Exception(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, int, bool) @ 0xba37dda in /usr/bin/clickhouse
1. DB::readException(DB::ReadBuffer&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, bool) @ 0xbab199e in /usr/bin/clickhouse
2. DB::Connection::receiveException() const @ 0x17854161 in /usr/bin/clickhouse
3. DB::Connection::receiveHello() @ 0x17853b8c in /usr/bin/clickhouse
4. DB::Connection::connect(DB::ConnectionTimeouts const&) @ 0x1785247a in /usr/bin/clickhouse
5. DB::Connection::forceConnected(DB::ConnectionTimeouts const&) @ 0x17854e44 in /usr/bin/clickhouse
6. DB::ConnectionEstablisher::run(PoolWithFailoverBase<DB::IConnectionPool>::TryResult&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >&) @ 0x1787508b in /usr/bin/clickhouse
7. ? @ 0x1786fad6 in /usr/bin/clickhouse
8. PoolWithFailoverBase<DB::IConnectionPool>::getMany(unsigned long, unsigned long, unsigned long, unsigned long, bool, std::__1::function<PoolWithFailoverBase<DB::IConnectionPool>::TryResult (DB::IConnectionPool&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >&)> const&, std::__1::function<unsigned long (unsigned long)> const&) @ 0x1786dd13 in /usr/bin/clickhouse
9. PoolWithFailoverBase<DB::IConnectionPool>::get(unsigned long, bool, std::__1::function<PoolWithFailoverBase<DB::IConnectionPool>::TryResult (DB::IConnectionPool&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >&)> const&, std::__1::function<unsigned long (unsigned long)> const&) @ 0x1786c45f in /usr/bin/clickhouse
10. DB::ConnectionPoolWithFailover::get(DB::ConnectionTimeouts const&, DB::Settings const*, bool) @ 0x1786c261 in /usr/bin/clickhouse
11. DB::StorageDistributedDirectoryMonitor::processFile(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&) @ 0x1747a84e in /usr/bin/clickhouse
12. DB::StorageDistributedDirectoryMonitor::run() @ 0x17475c28 in /usr/bin/clickhouse
13. DB::BackgroundSchedulePoolTaskInfo::execute() @ 0x15e29f7d in /usr/bin/clickhouse
14. DB::BackgroundSchedulePool::threadFunction() @ 0x15e2d076 in /usr/bin/clickhouse
15. ? @ 0x15e2df2e in /usr/bin/clickhouse
16. ThreadPoolImpl<std::__1::thread>::worker(std::__1::__list_iterator<std::__1::thread, void*>) @ 0xbb046a8 in /usr/bin/clickhouse
17. ? @ 0xbb07a3d in /usr/bin/clickhouse
18. start_thread @ 0x81cf in /usr/lib64/libpthread-2.28.so
19. __GI___clone @ 0x39d83 in /usr/lib64/libc-2.28.so
(version 22.7.1.2484 (official build))
2022.08.01 21:52:37.141560 [ 10999 ] {} <Debug> DNSResolver: Updating DNS cache
2022.08.01 21:52:37.141614 [ 10999 ] {} <Debug> DNSResolver: Updated DNS cache
from qryn.
@ktpktr0 The ClickHouse log is incomplete, but it sounds like your ClickHouse cluster / replication just isn't set up properly.
You should validate your ClickHouse setup with manual SQL queries (e.g. confirm that the Distributed/Replicated tables are working properly).
from qryn.
I posted my cluster configuration here: ClickHouse/ClickHouse#39767
from qryn.
@ktpktr0 You need to manually run SQL queries on the tables to debug the error.
- You need to make sure the tables were created on all replicas + shards. ON CLUSTER is used for this: https://clickhouse.com/docs/en/sql-reference/distributed-ddl/
- You will need to adjust my schema to work with your config. I am using macros like {cluster} that you do not have.
Anyway - This is likely not the place to debug a Clickhouse install.
from qryn.
I tried the statement from #188, and it works normally.
from qryn.
@ktpktr0 #188 is a broken table schema for the current qryn; you'll end up with incomplete data.
from qryn.
I am new to ClickHouse and qryn. I want to run each component in cluster mode in a production environment.
from qryn.
@coelho Can you share the configuration of Clickhouse? Thank you
from qryn.
Thanks @coelho for all the contributed help. Converging the topic here: https://github.com/metrico/qryn/wiki/qryn-tables-replication-support
from qryn.
@coelho apologies if this is not the best place to ask, but I am curious why you chose to use the fingerprint as the sharding key instead of something like rand()? While experimenting with the schema I noticed my shards were getting unbalanced because of one type of logs that all have identical labels.
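To make the imbalance concrete, here is a rough simulation, not anything taken from the gist. It assumes equal shard weights, so a row goes to shard `key % num_shards`, and it assumes a made-up workload where 9000 of 10000 rows share one fingerprint. All function names are hypothetical.

```python
import random
from collections import Counter


def shard_by_key(keys, num_shards):
    """Count rows per shard when each row is routed to key % num_shards."""
    return Counter(key % num_shards for key in keys)


def shard_by_rand(num_rows, num_shards, seed=42):
    """Count rows per shard when each row is routed to a random shard."""
    rng = random.Random(seed)
    return Counter(rng.randrange(num_shards) for _ in range(num_rows))


def make_skewed_fingerprints(hot_rows, other_rows, seed=42):
    """Build fingerprints where one 'hot' stream dominates (assumed workload)."""
    rng = random.Random(seed)
    hot_fingerprint = rng.getrandbits(64)
    rows = [hot_fingerprint] * hot_rows
    rows += [rng.getrandbits(64) for _ in range(other_rows)]
    return rows


if __name__ == "__main__":
    fingerprints = make_skewed_fingerprints(hot_rows=9000, other_rows=1000)
    print("fingerprint key:", dict(shard_by_key(fingerprints, 3)))
    print("rand() key:     ", dict(shard_by_rand(len(fingerprints), 3)))
```

With the fingerprint as key, every row of the hot stream lands on the same shard, so one shard holds at least 9000 rows. A random key spreads rows roughly evenly, at the cost of scattering one stream across all shards. This only illustrates the skew; it does not say which key is right for qryn's queries.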
from qryn.
@raiford thanks for this interesting point, please feel free to submit any suggestion or open an issue dedicated to this topic, ideally providing everyone with the best options by default
from qryn.