Comments (13)

kbr-scylla commented on May 31, 2024

@rohitraj-carousellgroup

please do the following:

from scylladb.

kbr-scylla commented on May 31, 2024

@rohitraj-carousellgroup please provide the output of

select * from system.raft_state

executed on every node in the cluster
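Because system tables are local to each node, the query has to be sent to every node separately. A minimal sketch of one way to script that, assuming cqlsh is installed on the machine running the script and that each node accepts CQL connections without extra credentials (the host addresses are placeholders, not taken from this thread):

```python
import subprocess

# The statement requested above.
QUERY = "select * from system.raft_state"


def build_cqlsh_command(host, query=QUERY):
    """Return the argv list that runs one CQL statement against one node."""
    return ["cqlsh", host, "-e", query]


def collect_raft_state(hosts, run=subprocess.run):
    """Run the query on every host and return {host: output text}.

    `run` is injectable so the function can be exercised without a cluster.
    """
    results = {}
    for host in hosts:
        completed = run(
            build_cqlsh_command(host),
            capture_output=True,
            text=True,
            check=False,
        )
        # Keep stderr when the command fails, so unreachable nodes are visible.
        if completed.returncode == 0:
            results[host] = completed.stdout
        else:
            results[host] = completed.stderr
    return results
```

Calling collect_raft_state(["10.0.0.1", "10.0.0.2"]) with the real node addresses and printing each entry gives one output block per node, which can then be pasted into the issue.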

kbr-scylla commented on May 31, 2024

The manual Raft recovery procedure will not affect query availability, assuming that:

  • you're using RF >= 3
  • you're using CL <= QUORUM
  • you never shut down more than 1 replica at a time (so all your restarts should be rolling restarts)

Usually the procedure is performed when you have a dead node, and you have to shut down other nodes too, so you have 2 nodes down at a time. But in your case there are no dead nodes, so you will only have 1 node down at a time if you do rolling restarts.
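The arithmetic behind these conditions can be sketched as follows. This is a simplified single-datacenter model that only counts live replicas against the number a consistency level requires. The function names are illustrative and not part of any Scylla API.

```python
def quorum(rf):
    """Number of replicas that form a majority for replication factor rf."""
    return rf // 2 + 1


def required_replicas(cl, rf):
    """Replicas that must respond for a given consistency level."""
    table = {"ONE": 1, "TWO": 2, "QUORUM": quorum(rf), "ALL": rf}
    return table[cl]


def survives(rf, cl, replicas_down):
    """True if queries at `cl` can still succeed with `replicas_down` replicas offline."""
    return rf - replicas_down >= required_replicas(cl, rf)


# RF = 3, CL = QUORUM, one replica down during a rolling restart: 2 live >= 2 required.
print(survives(3, "QUORUM", 1))
# Same settings with two replicas down: 1 live < 2 required.
print(survives(3, "QUORUM", 2))
```

With RF = 3 and QUORUM, losing one replica leaves exactly the two replicas a quorum needs, which is why the rolling restart is safe and a second simultaneous outage is not.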

mykaul commented on May 31, 2024

@kbr-scylla - is that related to Raft?

kbr-scylla commented on May 31, 2024

No idea.
Please provide the full log from node4 as a file.

mykaul commented on May 31, 2024

@rohitraj-carousellgroup - can you please add the logs?

rohitraj-carousellgroup commented on May 31, 2024

I will share the logs in some time.

rohitraj-carousellgroup commented on May 31, 2024

I am sharing the logs for node 4:
logfile.txt

rohitraj-carousellgroup commented on May 31, 2024

We need help with one more thing.
We were able to add the cluster in our Scylla DB after the upgrade.
Scylla version: 5.2.15
Currently our setup has 7 nodes:
3 nodes in rack a
3 nodes in rack b
1 node in rack c
Replication factor: 3
We want to decommission the node that is currently in rack c.
While running the nodetool decommission command on the node in rack c, we get this error:

Apr 22 14:19:02 scylla-data-03 scylla[381072]:  [shard 0] storage_service - stream_ranges successful
Apr 22 14:19:02 scylla-data-03 scylla[381072]:  [shard 0] storage_service - DECOMMISSIONING: unbootstrap done
Apr 22 14:19:02 scylla-data-03 scylla[381072]:  [shard 0] storage_service - decommission[a581af49-0d71-4a74-8e88-5d17d6238053]: becoming a group 0 non-voter
Apr 22 14:19:02 scylla-data-03 scylla[381072]:  [shard 0] raft_group0 - becoming a non-voter (my id = 5dff8f6d-6761-46b6-b209-76ec17acb5cf)...
Apr 22 14:19:02 scylla-data-03 scylla[381072]:  [shard 0] storage_service - decommission[a581af49-0d71-4a74-8e88-5d17d6238053]: Operation failed, sync_nodes={10.xx.xx.xx, 10.xx.xx.xx, 10.xx.xx.xx, 10.xx.xx.xx, 10.xx.xx.xx, 10.xx.xx.xx, 10.xx.xx.xx}: std::invalid_argument (The configuration must have at least one voter)
Apr 22 14:19:02 scylla-data-03 scylla[381072]:  [shard 0] storage_service - decommission[a581af49-0d71-4a74-8e88-5d17d6238053]: Stopped heartbeat_updater
Apr 22 14:19:02 scylla-data-03 scylla[381072]:  [shard 0] storage_service - decommission[a581af49-0d71-4a74-8e88-5d17d6238053]: Started decommission_abort[a581af49-0d71-4a74-8e88-5d17d6238053]: ignore_nodes=[], leaving_nodes=[10.240.100.2], replace_nodes={}, bootstrap_nodes={}, repair_tables=[]

rohitraj-carousellgroup commented on May 31, 2024

node 1
 group_id                             | disposition | server_id                            | can_vote
--------------------------------------+-------------+--------------------------------------+----------
 12d18280-1b09-11ee-a25b-3b6836c2eadb | CURRENT     | 555735d1-9ede-4b5c-b100-c2bffb8cafc2 | False
 12d18280-1b09-11ee-a25b-3b6836c2eadb | CURRENT     | 724cc1e2-b777-4d06-94de-83766e27cb19 | True
 12d18280-1b09-11ee-a25b-3b6836c2eadb | CURRENT     | a6181654-0163-44f6-ad07-74d87b80a828 | True
 12d18280-1b09-11ee-a25b-3b6836c2eadb | CURRENT     | af5a7f50-f74d-4467-a141-d8f5d46eca6f | True

node 2
 group_id                             | disposition | server_id                            | can_vote
--------------------------------------+-------------+--------------------------------------+----------
 12e88cf0-1b09-11ee-95b1-29c55cddcfff | CURRENT     | 1525fe2c-73b6-4809-8489-e66eb2919628 | True
 12e88cf0-1b09-11ee-95b1-29c55cddcfff | CURRENT     | 7a57bec2-e2e8-4172-9a40-7b9ac5d9b64e | True
 12e88cf0-1b09-11ee-95b1-29c55cddcfff | CURRENT     | 985d1aa2-71bc-4592-aef6-ec65cb21ff7f | True

node 3
 group_id                             | disposition | server_id                            | can_vote
--------------------------------------+-------------+--------------------------------------+----------
 130367f0-1b09-11ee-9994-f71da009916d | CURRENT     | 5dff8f6d-6761-46b6-b209-76ec17acb5cf | True

node 4
 group_id                             | disposition | server_id                            | can_vote
--------------------------------------+-------------+--------------------------------------+----------
 12d18280-1b09-11ee-a25b-3b6836c2eadb | CURRENT     | 555735d1-9ede-4b5c-b100-c2bffb8cafc2 | False
 12d18280-1b09-11ee-a25b-3b6836c2eadb | CURRENT     | 724cc1e2-b777-4d06-94de-83766e27cb19 | True
 12d18280-1b09-11ee-a25b-3b6836c2eadb | CURRENT     | a6181654-0163-44f6-ad07-74d87b80a828 | True
 12d18280-1b09-11ee-a25b-3b6836c2eadb | CURRENT     | af5a7f50-f74d-4467-a141-d8f5d46eca6f | True

node 5
 group_id                             | disposition | server_id                            | can_vote
--------------------------------------+-------------+--------------------------------------+----------
 12d18280-1b09-11ee-a25b-3b6836c2eadb | CURRENT     | 555735d1-9ede-4b5c-b100-c2bffb8cafc2 | False
 12d18280-1b09-11ee-a25b-3b6836c2eadb | CURRENT     | 724cc1e2-b777-4d06-94de-83766e27cb19 | True
 12d18280-1b09-11ee-a25b-3b6836c2eadb | CURRENT     | a6181654-0163-44f6-ad07-74d87b80a828 | True
 12d18280-1b09-11ee-a25b-3b6836c2eadb | CURRENT     | af5a7f50-f74d-4467-a141-d8f5d46eca6f | True

node 6
 group_id                             | disposition | server_id                            | can_vote
--------------------------------------+-------------+--------------------------------------+----------
 12e88cf0-1b09-11ee-95b1-29c55cddcfff | CURRENT     | 1525fe2c-73b6-4809-8489-e66eb2919628 | True
 12e88cf0-1b09-11ee-95b1-29c55cddcfff | CURRENT     | 7a57bec2-e2e8-4172-9a40-7b9ac5d9b64e | True
 12e88cf0-1b09-11ee-95b1-29c55cddcfff | CURRENT     | 985d1aa2-71bc-4592-aef6-ec65cb21ff7f | True

node 7
 group_id                             | disposition | server_id                            | can_vote
--------------------------------------+-------------+--------------------------------------+----------
 12e88cf0-1b09-11ee-95b1-29c55cddcfff | CURRENT     | 1525fe2c-73b6-4809-8489-e66eb2919628 | True
 12e88cf0-1b09-11ee-95b1-29c55cddcfff | CURRENT     | 7a57bec2-e2e8-4172-9a40-7b9ac5d9b64e | True
 12e88cf0-1b09-11ee-95b1-29c55cddcfff | CURRENT     | 985d1aa2-71bc-4592-aef6-ec65cb21ff7f | True

We are trying to decommission node 3

kbr-scylla commented on May 31, 2024

@rohitraj-carousellgroup you've got 3 separate clusters in there, connected into one. You can see that from group_id: 3 different values appear across the nodes. There should be only one group_id in a correctly set up cluster.
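The group_id observation can be checked mechanically. The sketch below uses the rows posted in this thread, groups the nodes by the group_id each one reports, and counts how many voters a group would keep if one server became a non-voter. The helper names are illustrative. Linking the zero-voter result to the "must have at least one voter" error in the decommission log is an inference from the posted data, not something confirmed by the Scylla source.

```python
from collections import defaultdict

# Group IDs and (group_id, server_id, can_vote) rows copied from the posted outputs.
G_A = "12d18280-1b09-11ee-a25b-3b6836c2eadb"
G_B = "12e88cf0-1b09-11ee-95b1-29c55cddcfff"
G_C = "130367f0-1b09-11ee-9994-f71da009916d"

ROWS_A = [
    (G_A, "555735d1-9ede-4b5c-b100-c2bffb8cafc2", False),
    (G_A, "724cc1e2-b777-4d06-94de-83766e27cb19", True),
    (G_A, "a6181654-0163-44f6-ad07-74d87b80a828", True),
    (G_A, "af5a7f50-f74d-4467-a141-d8f5d46eca6f", True),
]
ROWS_B = [
    (G_B, "1525fe2c-73b6-4809-8489-e66eb2919628", True),
    (G_B, "7a57bec2-e2e8-4172-9a40-7b9ac5d9b64e", True),
    (G_B, "985d1aa2-71bc-4592-aef6-ec65cb21ff7f", True),
]
ROWS_C = [(G_C, "5dff8f6d-6761-46b6-b209-76ec17acb5cf", True)]

# Which rows each node reported.
RAFT_STATE = {
    "node 1": ROWS_A, "node 2": ROWS_B, "node 3": ROWS_C, "node 4": ROWS_A,
    "node 5": ROWS_A, "node 6": ROWS_B, "node 7": ROWS_B,
}


def nodes_by_group(raft_state):
    """Map each group_id to the list of nodes that report it."""
    groups = defaultdict(list)
    for node, rows in raft_state.items():
        for group_id in sorted({row[0] for row in rows}):
            groups[group_id].append(node)
    return dict(groups)


def voters_left_after_demotion(rows, server_id):
    """Count voters remaining in a group if server_id became a non-voter."""
    return sum(1 for _, sid, can_vote in rows if can_vote and sid != server_id)


groups = nodes_by_group(RAFT_STATE)
for group_id, nodes in groups.items():
    print(group_id, nodes)
print("healthy" if len(groups) == 1 else f"{len(groups)} separate group 0 instances")

# Node 3 is the only member of its group, so demoting it would leave no voters.
print(voters_left_after_demotion(ROWS_C, "5dff8f6d-6761-46b6-b209-76ec17acb5cf"))
```

On this data the nodes fall into three groups (nodes 1, 4, 5; nodes 2, 6, 7; node 3 alone), and demoting node 3 leaves its group with zero voters.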

I suspect you bootstrapped the nodes using an incorrect seeds configuration, such that the nodes formed 3 independent clusters (perhaps a separate cluster in each rack?). Then you restarted the nodes with a different seeds configuration so that the nodes started gossiping with each other... and now you have this weird configuration with 3 different clusters pretending to be one :)

When bootstrapping nodes, make sure to follow the correct documented procedure.

Similar to #14446

Anyway, the only way I currently see to fix your cluster is to perform the manual Raft recovery procedure: https://opensource.docs.scylladb.com/branch-5.2/architecture/raft.html#raft-manual-recovery-procedure

kbr-scylla commented on May 31, 2024

Anyway, the only way I currently see to fix your cluster is to perform the manual Raft recovery procedure:

In your special case there are no dead nodes, so you can skip step 4.

rohitraj-carousellgroup commented on May 31, 2024

@kbr-scylla, what would be the safest way?

  1. Should we create a fresh cluster, dump the data, and switch the traffic?
  2. Should we perform the manual Raft recovery procedure, and will it leave live read/write queries in Scylla DB unaffected?
