Comments (13)
Please do the following:
- shut down the node that you tried to add (node 4) and remove its work directories (data, commitlog, etc.; the entire workdir)
- follow the procedure in https://opensource.docs.scylladb.com/branch-5.2/operating-scylla/procedures/cluster-management/handling-membership-change-failures.html to get rid of remains of node 4 from the existing cluster
- upgrade your cluster to the latest 5.2 patch version (I think it's 5.2.15)
- try bootstrapping the new node again (and make sure it also runs the latest 5.2 patch version)
from scylladb.
@rohitraj-carousellgroup please provide the output of
select * from system.raft_state
executed on every node in the cluster
from scylladb.
Manual Raft recovery procedure will not affect availability of queries, assuming that:
- you're using RF >= 3
- you're using CL <= QUORUM
- you never shut down more than 1 replica at a time (so all your restarts should be rolling restarts)
Usually the procedure is performed when you have a dead node and you have to shut down other nodes too, so you have 2 nodes down at a time. But in your case there are no dead nodes, so you will only have 1 node down at a time if you do rolling restarts.
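The arithmetic behind these conditions can be sketched as follows. This is a minimal illustration, assuming the usual quorum definition of floor(RF / 2) + 1; the function names are hypothetical and are not part of any Scylla tool.

```python
def quorum(rf: int) -> int:
    """Replicas that must respond at CL=QUORUM: floor(rf / 2) + 1."""
    return rf // 2 + 1


def quorum_survives(rf: int, replicas_down: int) -> bool:
    """True if QUORUM requests can still be served with this many replicas down."""
    return rf - replicas_down >= quorum(rf)


# RF=3 as in this thread: one replica down is fine, two are not.
for down in (0, 1, 2):
    print(f"RF=3, {down} replica(s) down -> QUORUM ok: {quorum_survives(3, down)}")
```

With RF=3, QUORUM needs 2 replicas, so a rolling restart that takes down one replica at a time still leaves enough replicas to answer.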
from scylladb.
@kbr-scylla - is that related to Raft?
from scylladb.
No idea.
Please provide full log from node4 as a file.
from scylladb.
@rohitraj-carousellgroup - can you please add logs?
from scylladb.
I will share the logs shortly.
from scylladb.
I am sharing the logs for node-4
logfile.txt
from scylladb.
We need help with one more thing.
We were able to add the node to our Scylla cluster after the upgrade.
Scylla version: 5.2.15
Currently our setup has 7 nodes:
3 nodes in rack a
3 nodes in rack b
1 node in rack c
Replication factor: 3
We want to decommission the node that is currently in rack c.
While running the nodetool decommission command on the rack c node, we get this error:
Apr 22 14:19:02 scylla-data-03 scylla[381072]: [shard 0] storage_service - stream_ranges successful
Apr 22 14:19:02 scylla-data-03 scylla[381072]: [shard 0] storage_service - DECOMMISSIONING: unbootstrap done
Apr 22 14:19:02 scylla-data-03 scylla[381072]: [shard 0] storage_service - decommission[a581af49-0d71-4a74-8e88-5d17d6238053]: becoming a group 0 non-voter
Apr 22 14:19:02 scylla-data-03 scylla[381072]: [shard 0] raft_group0 - becoming a non-voter (my id = 5dff8f6d-6761-46b6-b209-76ec17acb5cf)...
Apr 22 14:19:02 scylla-data-03 scylla[381072]: [shard 0] storage_service - decommission[a581af49-0d71-4a74-8e88-5d17d6238053]: Operation failed, sync_nodes={10.xx.xx.xx, 10.xx.xx.xx, 10.xx.xx.xx, 10.xx.xx.xx, 10.xx.xx.xx, 10.xx.xx.xx, 10.xx.xx.xx}: std::invalid_argument (The configuration must have at least one voter)
Apr 22 14:19:02 scylla-data-03 scylla[381072]: [shard 0] storage_service - decommission[a581af49-0d71-4a74-8e88-5d17d6238053]: Stopped heartbeat_updater
Apr 22 14:19:02 scylla-data-03 scylla[381072]: [shard 0] storage_service - decommission[a581af49-0d71-4a74-8e88-5d17d6238053]: Started decommission_abort[a581af49-0d71-4a74-8e88-5d17d6238053]: ignore_nodes=[], leaving_nodes=[10.240.100.2], replace_nodes={}, bootstrap_nodes={}, repair_tables=[]
from scylladb.
node 1
group_id | disposition | server_id | can_vote
--------------------------------------+-------------+--------------------------------------+----------
12d18280-1b09-11ee-a25b-3b6836c2eadb | CURRENT | 555735d1-9ede-4b5c-b100-c2bffb8cafc2 | False
12d18280-1b09-11ee-a25b-3b6836c2eadb | CURRENT | 724cc1e2-b777-4d06-94de-83766e27cb19 | True
12d18280-1b09-11ee-a25b-3b6836c2eadb | CURRENT | a6181654-0163-44f6-ad07-74d87b80a828 | True
12d18280-1b09-11ee-a25b-3b6836c2eadb | CURRENT | af5a7f50-f74d-4467-a141-d8f5d46eca6f | True
node 2
group_id | disposition | server_id | can_vote
--------------------------------------+-------------+--------------------------------------+----------
12e88cf0-1b09-11ee-95b1-29c55cddcfff | CURRENT | 1525fe2c-73b6-4809-8489-e66eb2919628 | True
12e88cf0-1b09-11ee-95b1-29c55cddcfff | CURRENT | 7a57bec2-e2e8-4172-9a40-7b9ac5d9b64e | True
12e88cf0-1b09-11ee-95b1-29c55cddcfff | CURRENT | 985d1aa2-71bc-4592-aef6-ec65cb21ff7f | True
node 3
group_id | disposition | server_id | can_vote
--------------------------------------+-------------+--------------------------------------+----------
130367f0-1b09-11ee-9994-f71da009916d | CURRENT | 5dff8f6d-6761-46b6-b209-76ec17acb5cf | True
node 4
group_id | disposition | server_id | can_vote
--------------------------------------+-------------+--------------------------------------+----------
12d18280-1b09-11ee-a25b-3b6836c2eadb | CURRENT | 555735d1-9ede-4b5c-b100-c2bffb8cafc2 | False
12d18280-1b09-11ee-a25b-3b6836c2eadb | CURRENT | 724cc1e2-b777-4d06-94de-83766e27cb19 | True
12d18280-1b09-11ee-a25b-3b6836c2eadb | CURRENT | a6181654-0163-44f6-ad07-74d87b80a828 | True
12d18280-1b09-11ee-a25b-3b6836c2eadb | CURRENT | af5a7f50-f74d-4467-a141-d8f5d46eca6f | True
node 5
group_id | disposition | server_id | can_vote
--------------------------------------+-------------+--------------------------------------+----------
12d18280-1b09-11ee-a25b-3b6836c2eadb | CURRENT | 555735d1-9ede-4b5c-b100-c2bffb8cafc2 | False
12d18280-1b09-11ee-a25b-3b6836c2eadb | CURRENT | 724cc1e2-b777-4d06-94de-83766e27cb19 | True
12d18280-1b09-11ee-a25b-3b6836c2eadb | CURRENT | a6181654-0163-44f6-ad07-74d87b80a828 | True
12d18280-1b09-11ee-a25b-3b6836c2eadb | CURRENT | af5a7f50-f74d-4467-a141-d8f5d46eca6f | True
node 6
group_id | disposition | server_id | can_vote
--------------------------------------+-------------+--------------------------------------+----------
12e88cf0-1b09-11ee-95b1-29c55cddcfff | CURRENT | 1525fe2c-73b6-4809-8489-e66eb2919628 | True
12e88cf0-1b09-11ee-95b1-29c55cddcfff | CURRENT | 7a57bec2-e2e8-4172-9a40-7b9ac5d9b64e | True
12e88cf0-1b09-11ee-95b1-29c55cddcfff | CURRENT | 985d1aa2-71bc-4592-aef6-ec65cb21ff7f | True
node 7
group_id | disposition | server_id | can_vote
--------------------------------------+-------------+--------------------------------------+----------
12e88cf0-1b09-11ee-95b1-29c55cddcfff | CURRENT | 1525fe2c-73b6-4809-8489-e66eb2919628 | True
12e88cf0-1b09-11ee-95b1-29c55cddcfff | CURRENT | 7a57bec2-e2e8-4172-9a40-7b9ac5d9b64e | True
12e88cf0-1b09-11ee-95b1-29c55cddcfff | CURRENT | 985d1aa2-71bc-4592-aef6-ec65cb21ff7f | True
We are trying to decommission node 3
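One way to read the error against the tables above, as a rough sketch only: the log shows the decommissioning node with id 5dff8f6d-6761-46b6-b209-76ec17acb5cf, and node 3's table lists that server as the only member of its group. The helper name below is hypothetical and only models the check; it is not Scylla's code.

```python
# Node 3's Raft configuration as printed above: (server_id, can_vote) pairs.
NODE3_ID = "5dff8f6d-6761-46b6-b209-76ec17acb5cf"
node3_config = [(NODE3_ID, True)]


def voters_after_demotion(config, server_id):
    """Return the voters that would remain if server_id became a non-voter."""
    return [sid for sid, can_vote in config if can_vote and sid != server_id]


remaining = voters_after_demotion(node3_config, NODE3_ID)
print("voters left after node 3 becomes a non-voter:", len(remaining))
```

Under that reading, demoting node 3 would leave its configuration with no voters, which matches the message "The configuration must have at least one voter".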
from scylladb.
@rohitraj-carousellgroup you've got 3 separate clusters in there, connected into one. You can see that from group_id: 3 different values appear there. There should be only one group_id in a correctly set up cluster.
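The check described here can be sketched by grouping the nodes by the group_id each one reported. The mapping below is copied from the tables posted above; the function name is hypothetical.

```python
from collections import defaultdict

# group_id reported by each node in the system.raft_state output above.
GROUP_ID_BY_NODE = {
    "node 1": "12d18280-1b09-11ee-a25b-3b6836c2eadb",
    "node 2": "12e88cf0-1b09-11ee-95b1-29c55cddcfff",
    "node 3": "130367f0-1b09-11ee-9994-f71da009916d",
    "node 4": "12d18280-1b09-11ee-a25b-3b6836c2eadb",
    "node 5": "12d18280-1b09-11ee-a25b-3b6836c2eadb",
    "node 6": "12e88cf0-1b09-11ee-95b1-29c55cddcfff",
    "node 7": "12e88cf0-1b09-11ee-95b1-29c55cddcfff",
}


def nodes_by_group(group_id_by_node):
    """Map each group_id to the list of nodes that reported it."""
    groups = defaultdict(list)
    for node, group_id in group_id_by_node.items():
        groups[group_id].append(node)
    return dict(groups)


groups = nodes_by_group(GROUP_ID_BY_NODE)
for group_id, nodes in groups.items():
    print(group_id, nodes)
print("distinct group_id values:", len(groups))
```

A healthy cluster would produce a single group here; this data yields three (nodes 1, 4, 5; nodes 2, 6, 7; and node 3 alone).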
I suspect you bootstrapped the nodes using an incorrect seeds configuration, such that the nodes formed 3 independent clusters (perhaps a separate cluster in each rack?). Then you restarted the nodes with a different seeds configuration so that the nodes started gossiping with each other... and now you have this weird configuration with 3 different clusters pretending to be one :)
When bootstrapping nodes make sure to follow the correct documented procedure.
Similar to #14446
Anyway, the only way I currently see to fix your cluster is to perform the manual Raft recovery procedure: https://opensource.docs.scylladb.com/branch-5.2/architecture/raft.html#raft-manual-recovery-procedure
from scylladb.
Anyway, the only way I currently see to fix your cluster is to perform the manual Raft recovery procedure:
In your special case there are no dead nodes, so you skip step 4.
from scylladb.
@kbr-scylla, what would be the safest way?
- Should we create a fresh cluster, dump the data, and switch the traffic?
- Or should we perform the manual Raft recovery procedure, and will it leave live read/write queries in ScyllaDB unaffected?
from scylladb.