Cluster/Replication about qryn HOT 22 CLOSED

metrico avatar metrico commented on May 9, 2024
Cluster/Replication

from qryn.

Comments (22)

coelho avatar coelho commented on May 9, 2024 1

For anyone needing Sharding + Replication, I adjusted the schema and it works perfectly:
https://gist.github.com/coelho/c3b7bbb2c95caa61115d93692f9e4ae2

@lmangani Feel free to use as reference to add support :)

akvlad avatar akvlad commented on May 9, 2024

@smmstf How did you create the cluster? Can you share the relevant part of your config.xml?

smmstf avatar smmstf commented on May 9, 2024

Here's the configuration I use. I have 1 shard with 3 replicas:

<?xml version="1.0"?>
<yandex>
    <remote_servers>
        <cloki_cluster>
            <shard>
                <replica>
                    <default_database>test</default_database>
                    <user>xxxxx</user>
                    <password>xxxxx</password>
                    <host>172.30.102.243</host>
                    <port>9000</port>
                    <!-- <secure>1</secure> -->
                </replica>
                <replica>
                    <default_database>test</default_database>
                    <user>xxxxx</user>
                    <password>xxxx</password>
                    <host>172.30.103.104</host>
                    <port>9000</port>
                    <!-- <secure>1</secure> -->
                </replica>
                <replica>
                    <default_database>test</default_database>
                    <user>xxxxx</user>
                    <password>xxxx</password>
                    <host>172.30.103.105</host>
                    <port>9000</port>
                    <!-- <secure>1</secure> -->
                </replica>
            </shard>
        </cloki_cluster>
    </remote_servers>
</yandex>

I think my problem is in the creation of the Replicated tables :)

akvlad avatar akvlad commented on May 9, 2024

@smmstf For Replicated* tables to work, you need ClickHouse Keeper to be configured. Do you have it? https://clickhouse.com/docs/en/guides/sre/keeper/clickhouse-keeper/

smmstf avatar smmstf commented on May 9, 2024

Yes, I already have it configured. Here's my config file:

<?xml version="1.0" ?>
<yandex>
    <keeper_server>
        <!-- <tcp_port_secure>2181</tcp_port_secure> -->
        <tcp_port>2181</tcp_port>
        <server_id>1</server_id>
        <log_storage_path>/var/lib/clickhouse/coordination/log</log_storage_path>
        <snapshot_storage_path>/var/lib/clickhouse/coordination/snapshots</snapshot_storage_path>

        <coordination_settings>
            <operation_timeout_ms>10000</operation_timeout_ms>
            <session_timeout_ms>30000</session_timeout_ms>
            <raft_logs_level>trace</raft_logs_level>
            <rotate_log_storage_interval>10000</rotate_log_storage_interval>
        </coordination_settings>

        <raft_configuration>
            <server>
                <id>1</id>
                <hostname>172.30.102.243</hostname>
                <port>9444</port>
            </server>
            <server>
                <id>2</id>
                <hostname>172.30.103.104</hostname>
                <port>9444</port>
            </server>
            <server>
                <id>3</id>
                <hostname>172.30.103.105</hostname>
                <port>9444</port>
            </server>
        </raft_configuration>
    </keeper_server>

    <zookeeper>
        <node>
            <host>172.30.102.243</host>
            <port>2181</port>
            <!-- <secure>1</secure> -->
        </node>
        <node>
            <host>172.30.103.104</host>
            <port>2181</port>
            <!-- <secure>1</secure> -->
        </node>
        <node>
            <host>172.30.103.105</host>
            <port>2181</port>
            <!-- <secure>1</secure> -->
        </node>
    </zookeeper>
    <!--
    <distributed_ddl>
        <path>/clickhouse/cloki_cluster/task_queue/ddl</path>
    </distributed_ddl>
    -->
</yandex>

And it works correctly

akvlad avatar akvlad commented on May 9, 2024

Ok. I reproduced the bug.
The configuration of a ClickHouse / Keeper cluster is unbearably complicated and contains completely non-obvious steps, so you should recheck everything carefully. Here are the steps I spent a lot of time figuring out.

  1. (Pretty obvious) All servers should communicate freely through ports 2181 and 9444. You can check that both ports are reachable with telnet.
  2. If you use hostnames instead of IPs, like this (I have managed to configure with hostnames only):

    <zookeeper>
        <node>
            <host>clickhouse_1</host>
            <port>2181</port>
            <secure>0</secure>
        </node>
        <node>
            <host>clickhouse_2</host>
            <port>2181</port>
            <secure>0</secure>
        </node>
    </zookeeper>

     then you should provide a hostname.hostname alias for all the nodes. I provided clickhouse_1.clickhouse_1 and clickhouse_2.clickhouse_2. I have no idea why they need it, but anyway.
  3. <secure>0</secure> is mandatory for each ClickHouse Keeper node. Otherwise you should provide an RSA key pair: https://clickhouse.com/docs/en/operations/ssl-zookeeper/
  4. Macros and default paths. For me the following config worked:

    <macros>
        <shard>01</shard>
        <replica>example01-01-1</replica>
    </macros>
    <default_replica_path>/clickhouse/tables/{shard}/{database}/{table}</default_replica_path>
    <default_replica_name>{replica}</default_replica_name>

     The <replica>example01-01-1</replica> part should vary between servers.
  5. A pretty simple check, run in clickhouse client, to ensure that all the keepers are configured and work OK:

    SET insert_quorum=3;
    INSERT INTO samples_v3 (fingerprint, timestamp_ns, string) VALUES (1, 1656583893000000000, 'str');

  6. And of course, if ClickHouse Keeper doesn't work, it may write some errors to /var/log/clickhouse-server/clickhouse-server.log. But that's optional. They may be silently absent. :)
  7. ClickHouse version. I have tested with this: https://hub.docker.com/layers/clickhouse-server/clickhouse/clickhouse-server/22.6.2.12/images/sha256-23f45eddbd72befa0942660b31f76b3b1aa45b66512dc19ab173c76615528ae0?context=explore Make sure you have upgraded to the latest version.
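
The reachability check from the first step can also be scripted instead of run by hand with telnet. The following is only a minimal sketch, assuming Python 3 is available on each node; `check_port`, `unreachable` and `KEEPER_PORTS` are hypothetical helper names, not part of ClickHouse or qryn.

```python
import socket

# Ports named in the thread: 2181 (Keeper client port) and 9444 (Raft port).
KEEPER_PORTS = (2181, 9444)


def check_port(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, timed out, unresolvable host, ...
        return False


def unreachable(hosts, ports=KEEPER_PORTS, timeout=2.0):
    """List every (host, port) pair that could not be reached."""
    return [(h, p) for h in hosts for p in ports if not check_port(h, p, timeout)]
```

Calling `unreachable(["172.30.102.243", "172.30.103.104", "172.30.103.105"])` from each node should return an empty list when both ports are open everywhere. A successful TCP connection only shows that the port is reachable, not that Keeper itself is healthy.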

Clickhouse keeper & Replicated* table engine features are very rare. Let's hope they will become a bit less magical in future.
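
Since the `<replica>` macro has to differ on every server, the `<macros>` fragment can be rendered per node rather than edited by hand. This is a rough sketch only: `render_macros` is a hypothetical helper, and the replica name in the usage note is an assumed naming pattern, since the thread shows just one value.

```python
from xml.sax.saxutils import escape


def render_macros(shard, replica):
    """Render a <macros> fragment for one server (values are XML-escaped)."""
    return (
        "<macros>\n"
        f"    <shard>{escape(shard)}</shard>\n"
        f"    <replica>{escape(replica)}</replica>\n"
        "</macros>"
    )
```

For example, `render_macros("01", "example01-01-2")` would produce the fragment for a second replica of shard 01, assuming that naming scheme suits your cluster.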

akvlad avatar akvlad commented on May 9, 2024

@coelho Doesn't it show some errors about subrequests in distributed tables? Or maybe distributed table requests inside CTE? Did you change something in the clickhouse config.xml file?

coelho avatar coelho commented on May 9, 2024

@akvlad Completely forgot to include that. You're right - I edited my gist.

+ // NOTE: You also need to set "distributed_product_mode" to "global" in your profile.
+ // https://clickhouse.com/docs/en/operations/settings/settings-profiles/
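
As an illustration of the note above, the profile override could be generated like this. It is a sketch under assumptions: the profile name `default`, the `<yandex>` root tag (taken from the configs earlier in the thread) and the helper name `profile_override` are not confirmed by the gist.

```python
import xml.etree.ElementTree as ET


def profile_override(profile, settings, root_tag="yandex"):
    """Build a users-config fragment that sets the given settings in one profile."""
    root = ET.Element(root_tag)
    prof = ET.SubElement(ET.SubElement(root, "profiles"), profile)
    for name, value in settings.items():
        ET.SubElement(prof, name).text = value
    return ET.tostring(root, encoding="unicode")


# Assumed usage: the "default" profile with the setting named in the note.
fragment = profile_override("default", {"distributed_product_mode": "global"})
```

The resulting string could be saved as an override file for the users configuration (for example under `users.d/`, if your installation uses that layout) on every node.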

akvlad avatar akvlad commented on May 9, 2024

@coelho thanks for the gist. I think it can be included into our wiki as a recipe for clustering.

ktpktr0 avatar ktpktr0 commented on May 9, 2024

@coelho I created a ClickHouse cluster with 3 shards and 2 replicas, and applied the setting described at "https://clickhouse.com/docs/en/operations/settings/settings-profiles/". The database and tables were created, but qryn does not work properly and returns a 500 error.

coelho avatar coelho commented on May 9, 2024

@ktpktr0 Can you show the log output and test the tables in your ClickHouse?
For example, for each table: SELECT count() FROM time_series

ktpktr0 avatar ktpktr0 commented on May 9, 2024

vector_aggregate log:

2022-08-01T13:49:44.710447Z  INFO vector::app: Log level is enabled. level="vector=info,codec=info,vrl=info,file_source=info,tower_limit=trace,rdkafka=info,buffers=info,kube=info"
2022-08-01T13:49:44.710517Z  INFO vector::app: Loading configs. paths=["/etc/vector/vector_aggregate1.toml"]
2022-08-01T13:49:44.723816Z  INFO vector::topology::running: Running healthchecks.
2022-08-01T13:49:44.724084Z  INFO vector: Vector has started. debug="false" version="0.23.0" arch="x86_64" build_id="38c2435 2022-07-11"
2022-08-01T13:49:44.724097Z  INFO vector::app: API is disabled, enable by setting `api.enabled` to `true` and use commands like `vector top`.
2022-08-01T13:49:44.750187Z ERROR vector::topology::builder: msg="Healthcheck: Failed Reason." error=A non-successful status returned: 500 Internal Server Error component_kind="sink" component_type="loki" component_id=out component_name=out

qryn log:

{"level":30,"time":1659361774106,"pid":55684,"hostname":"k8s-master2","name":"qryn","msg":"Initializing DB... cloki"}
(node:55684) [FST_MODULE_DEP_FASTIFY-WEBSOCKET] FastifyWarning.fastify-websocket: fastify-websocket has been deprecated. Use @fastify/[email protected] instead.
(Use `node --trace-warnings ...` to show where the warning was created)
(node:55684) [FST_MODULE_DEP_FASTIFY-CORS] FastifyWarning.fastify-cors: fastify-cors has been deprecated. Use @fastify/[email protected] instead.
(node:55684) [FST_MODULE_DEP_FASTIFY-STATIC] FastifyWarning.fastify-static: fastify-static has been deprecated. Use @fastify/[email protected] instead.
{"level":30,"time":1659361774385,"pid":55684,"hostname":"k8s-master2","name":"qryn","msg":"Server listening at http://0.0.0.0:3100"}
{"level":30,"time":1659361774385,"pid":55684,"hostname":"k8s-master2","name":"qryn","msg":"Qryn API up"}
{"level":30,"time":1659361774385,"pid":55684,"hostname":"k8s-master2","name":"qryn","msg":"Qryn API listening on http://0.0.0.0:3100"}
{"level":30,"time":1659361774396,"pid":55684,"hostname":"k8s-master2","name":"qryn","msg":"xxh ready"}
{"level":30,"time":1659361774407,"pid":55684,"hostname":"k8s-master2","name":"qryn","msg":"Checking clickhouse capabilities"}
{"level":30,"time":1659361774410,"pid":55684,"hostname":"k8s-master2","name":"qryn","msg":"LIVE VIEW: supported"}
{"level":30,"time":1659361774416,"pid":55684,"hostname":"k8s-master2","name":"qryn","msg":"checking old samples support: samples_v2"}
{"level":30,"time":1659361774422,"pid":55684,"hostname":"k8s-master2","name":"qryn","msg":"checking old samples support: samples"}
{"level":30,"time":1659361774486,"pid":55684,"hostname":"k8s-master2","name":"qryn","msg":"xxh ready"}
{"level":30,"time":1659361783575,"pid":55684,"hostname":"k8s-master2","name":"qryn","reqId":"req-1","req":{"method":"GET","url":"/ready","hostname":"cloki","remoteAddress":"192.168.10.160","remotePort":34850},"msg":"incoming request"}
{"level":50,"time":1659361783585,"pid":55684,"hostname":"k8s-master2","name":"qryn","reqId":"req-1","err":"Clickhouse DB not ready\nError: Clickhouse DB not ready\n    at Object.handler (/usr/lib/node_modules/qryn/lib/handlers/ready.js:15:14)\n    at process.processTicksAndRejections (node:internal/process/task_queues:95:5)","msg":"Clickhouse DB not ready"}
{"level":30,"time":1659361783587,"pid":55684,"hostname":"k8s-master2","name":"qryn","reqId":"req-1","res":{"statusCode":500},"responseTime":10.706425994634628,"msg":"request completed"}

clickhouse server log:

0. DB::Exception::Exception(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, int, bool) @ 0xba37dda in /usr/bin/clickhouse
1. DB::readException(DB::ReadBuffer&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, bool) @ 0xbab199e in /usr/bin/clickhouse
2. DB::Connection::receiveException() const @ 0x17854161 in /usr/bin/clickhouse
3. DB::Connection::receiveHello() @ 0x17853b8c in /usr/bin/clickhouse
4. DB::Connection::connect(DB::ConnectionTimeouts const&) @ 0x1785247a in /usr/bin/clickhouse
5. DB::Connection::forceConnected(DB::ConnectionTimeouts const&) @ 0x17854e44 in /usr/bin/clickhouse
6. DB::ConnectionEstablisher::run(PoolWithFailoverBase<DB::IConnectionPool>::TryResult&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >&) @ 0x1787508b in /usr/bin/clickhouse
7. ? @ 0x1786fad6 in /usr/bin/clickhouse
8. PoolWithFailoverBase<DB::IConnectionPool>::getMany(unsigned long, unsigned long, unsigned long, unsigned long, bool, std::__1::function<PoolWithFailoverBase<DB::IConnectionPool>::TryResult (DB::IConnectionPool&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >&)> const&, std::__1::function<unsigned long (unsigned long)> const&) @ 0x1786dd13 in /usr/bin/clickhouse
9. PoolWithFailoverBase<DB::IConnectionPool>::get(unsigned long, bool, std::__1::function<PoolWithFailoverBase<DB::IConnectionPool>::TryResult (DB::IConnectionPool&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >&)> const&, std::__1::function<unsigned long (unsigned long)> const&) @ 0x1786c45f in /usr/bin/clickhouse
10. DB::ConnectionPoolWithFailover::get(DB::ConnectionTimeouts const&, DB::Settings const*, bool) @ 0x1786c261 in /usr/bin/clickhouse
11. DB::StorageDistributedDirectoryMonitor::processFile(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&) @ 0x1747a84e in /usr/bin/clickhouse
12. DB::StorageDistributedDirectoryMonitor::run() @ 0x17475c28 in /usr/bin/clickhouse
13. DB::BackgroundSchedulePoolTaskInfo::execute() @ 0x15e29f7d in /usr/bin/clickhouse
14. DB::BackgroundSchedulePool::threadFunction() @ 0x15e2d076 in /usr/bin/clickhouse
15. ? @ 0x15e2df2e in /usr/bin/clickhouse
16. ThreadPoolImpl<std::__1::thread>::worker(std::__1::__list_iterator<std::__1::thread, void*>) @ 0xbb046a8 in /usr/bin/clickhouse
17. ? @ 0xbb07a3d in /usr/bin/clickhouse
18. start_thread @ 0x81cf in /usr/lib64/libpthread-2.28.so
19. __GI___clone @ 0x39d83 in /usr/lib64/libc-2.28.so
 (version 22.7.1.2484 (official build))
2022.08.01 21:52:37.141560 [ 10999 ] {} <Debug> DNSResolver: Updating DNS cache
2022.08.01 21:52:37.141614 [ 10999 ] {} <Debug> DNSResolver: Updated DNS cache

coelho avatar coelho commented on May 9, 2024

@ktpktr0 The ClickHouse log is incomplete, but it sounds like your ClickHouse cluster / replication just isn't set up properly.
You should validate your ClickHouse with manual SQL queries (e.g. confirm that the Distributed/Replicated tables are working properly).
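
One way to run such manual checks without a native client is ClickHouse's HTTP interface. The sketch below only builds the request URLs. It assumes the default HTTP port 8123, the `cloki` database seen in the qryn log and no authentication; `CHECKS` and `check_urls` are hypothetical names, and the list of queries is a suggestion rather than an official checklist.

```python
from urllib.parse import urlencode

# Read-only sanity queries: cluster layout, replica state, and one qryn table.
CHECKS = (
    "SELECT cluster, shard_num, replica_num, host_name FROM system.clusters",
    "SELECT database, table, is_readonly, absolute_delay FROM system.replicas",
    "SELECT count() FROM time_series",
)


def check_urls(host, port=8123, database="cloki"):
    """Return one HTTP-interface URL per sanity query for the given node."""
    base = f"http://{host}:{port}/?"
    return [base + urlencode({"database": database, "query": q}) for q in CHECKS]
```

Each URL can then be fetched with any HTTP client against every node. Comparing the answers across replicas is a quick way to spot a node that is missing from the cluster definition or stuck in read-only mode.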

ktpktr0 avatar ktpktr0 commented on May 9, 2024

I posted my cluster configuration here: ClickHouse/ClickHouse#39767

coelho avatar coelho commented on May 9, 2024

@ktpktr0 You need to manually run SQL queries on the tables to debug the error.

Anyway - This is likely not the place to debug a Clickhouse install.

ktpktr0 avatar ktpktr0 commented on May 9, 2024

I tried the statements from #188, and they work normally.

coelho avatar coelho commented on May 9, 2024

@ktpktr0 The table schema in #188 is broken with the current qryn: you'll end up with incomplete data.

ktpktr0 avatar ktpktr0 commented on May 9, 2024

I am new to ClickHouse and qryn. I want to run each component in cluster mode in a production environment.

ktpktr0 avatar ktpktr0 commented on May 9, 2024

@coelho Can you share the configuration of Clickhouse? Thank you

lmangani avatar lmangani commented on May 9, 2024

Thanks @coelho for all the contributed help. Converging the topic here: https://github.com/metrico/qryn/wiki/qryn-tables-replication-support

raiford avatar raiford commented on May 9, 2024

@coelho apologies if this is not the best place to ask, but I am curious why you chose to use the fingerprint as the sharding key instead of something like rand()? While experimenting with the schema I noticed my shards were getting unbalanced because of one type of logs that all have identical labels.
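
The imbalance described here can be reproduced with a small simulation. This is a sketch under assumptions: shard selection is modelled as a plain modulo over the sharding key, the workload (one hot stream plus 100 small ones) is invented for illustration, and `shard_counts` and `sample_workload` are hypothetical names.

```python
import random
from collections import Counter


def shard_counts(fingerprints, n_shards, by_fingerprint, seed=0):
    """Count rows per shard when sharding by fingerprint or by a random key."""
    rng = random.Random(seed)
    counts = Counter()
    for fp in fingerprints:
        shard = fp % n_shards if by_fingerprint else rng.randrange(n_shards)
        counts[shard] += 1
    return [counts[s] for s in range(n_shards)]


def sample_workload():
    """9000 rows from one hot stream plus 10 rows each from 100 other streams."""
    hot = [42] * 9000
    rest = [fp for fp in range(1000, 1100) for _ in range(10)]
    return hot + rest
```

With three shards, `shard_counts(sample_workload(), 3, by_fingerprint=True)` gives `[9330, 340, 330]`, because every row of the hot stream lands on the same shard. With `by_fingerprint=False`, the rows spread roughly evenly. Keying on the fingerprint does keep all rows of one series on a single shard, which may be the reason for the choice, but the thread does not say.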

For anyone needing Sharding + Replication, I adjusted the schema and it works perfectly: https://gist.github.com/coelho/c3b7bbb2c95caa61115d93692f9e4ae2

@lmangani Feel free to use as reference to add support :)

lmangani avatar lmangani commented on May 9, 2024

@raiford thanks for this interesting point, please feel free to submit any suggestion or open an issue dedicated to this topic, ideally providing everyone with the best options by default
