Comments (26)
Thank you for pointing this out. Fixed.
from scylladb.
The issue description above got garbled due to missing `
test_simple_decommission_node_while_query_info
this time. See https://jenkins.scylladb.com/job/scylla-master/job/scylla-ci/6374/testReport/junit/update_cluster_layout_tests/TestUpdateClusterLayout/Tests___dtest___test_simple_decommission_node_while_query_info_2_/
self = <update_cluster_layout_tests.TestUpdateClusterLayout object at 0x7f02a482d810>
rf = 2

    @pytest.mark.required_features("!consistent-topology-changes")
    @pytest.mark.parametrize("rf", [1, 2])
    def test_simple_decommission_node_while_query_info(self, rf):
        """
        Test decommissioning node streams all data
        1. Create a cluster with three nodes with rf, insert data
        2. Decommission node, while node is decommissioning query data
        3. Check that the cluster returns all
        """
        cluster = self.cluster
        consistency = {1: ConsistencyLevel.ONE, 2: ConsistencyLevel.TWO}[rf]
        # Disable hinted handoff and set batch commit log so this doesn't
        # interfere with the test (this must be after the populate)
        cluster.set_configuration_options(values=self.default_config_options(), batch_commitlog=True)
        cluster.populate(3).start()
        node1, node2, node3 = cluster.nodelist()
        session = self.patient_cql_connection(node1)
        create_ks(session, "ks", rf)
        create_cf(session, "cf", read_repair=0.0, columns={"c1": "text", "c2": "text"})
        num_keys = 2000
        insert_c1c2(session, keys=range(num_keys), consistency=consistency)

        def run(stop_run):
            logger.debug("Background SELECT loop starting")
            query = session.prepare("SELECT * FROM cf")
            query.consistency_level = consistency
            while not stop_run.is_set():
                result = list(session.execute(query))
                assert len(result) == num_keys
                time.sleep(0.01)
            logger.debug("Background SELECT loop done")

        executor = ThreadPoolExecutor(max_workers=1)
        stop_run = Event()
        t = executor.submit(run, stop_run)
        logger.debug("Decommissioning node2")
        node2.decommission()
        stop_run.set()
>       t.result()
update_cluster_layout_tests.py:1196:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
/usr/lib64/python3.11/concurrent/futures/_base.py:456: in result
return self.__get_result()
/usr/lib64/python3.11/concurrent/futures/_base.py:401: in __get_result
raise self._exception
/usr/lib64/python3.11/concurrent/futures/thread.py:58: in run
result = self.fn(*self.args, **self.kwargs)
update_cluster_layout_tests.py:1183: in run
result = list(session.execute(query))
cassandra/cluster.py:2726: in cassandra.cluster.Session.execute
???
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
> ???
E cassandra.OperationTimedOut: errors={'127.0.73.1:9042': 'Client request timeout. See Session.execute[_async](timeout)'}, last_host=127.0.73.1:9042
cassandra/cluster.py:5085: OperationTimedOut
History shows this issue, with test_simple_kill_node_while_decommissioning, has been happening since September:
https://70f106c98484448dbc4705050eb3f7e9.us-east-1.aws.found.io:9243/goto/6a77bc60-c5d2-11ee-81c7-3986d18dafd5
test_simple_decommission_node_while_query_info was refactored recently in https://github.com/scylladb/scylla-dtest/pull/3814 by @bhalevy, so it's probably something new introduced as part of that.
Also seen in #16927.
@xemul since you're looking into issues with decommissioning (from the tablets path), can you please look into this issue too?
It's hurting us mostly in CI, but we need to understand the root cause and see how serious it is.
I think it might be related to the change in https://github.com/scylladb/scylla-dtest/pull/3814.
The specific test used to do only 100 requests in the background thread; you changed it to keep going until told to stop.
It is known that during topology changes we can get client-side errors (in c-s we have a default retry of 10 times), so I think we need better retries in this case.
The Python driver doesn't supply very good machinery for this; I'm trying to improve it in:
scylladb/python-driver#298
Meanwhile I think we should add retries in the test code itself.
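Such a test-side retry helper could look roughly like this. It's a minimal sketch: OperationTimedOut is stubbed here as a placeholder for the driver's cassandra.OperationTimedOut, and the attempts/delay values are arbitrary, not from the actual dtest change.

```python
import time

class OperationTimedOut(Exception):
    """Placeholder for cassandra.OperationTimedOut from the Python driver."""

def execute_with_retry(fn, attempts=5, delay=0.5):
    """Run fn(), retrying on client-side timeouts; re-raise after the last try."""
    for attempt in range(attempts):
        try:
            return fn()
        except OperationTimedOut:
            if attempt == attempts - 1:
                raise
            time.sleep(delay)
```

In the test's background loop, `result = list(session.execute(query))` would then become `result = execute_with_retry(lambda: list(session.execute(query)))`.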
Why is it ok to see client-side errors during topology changes?
We need to fix that.
Cc @avelanarius
Raising to P0.
https://github.com/scylladb/scylla-dtest/pull/3985 doesn't help; it still happens, rarely.
Driver complains
E cassandra.OperationTimedOut: errors={'127.0.79.1:9042': 'Client request timeout. See Session.execute[_async](timeout)'}, last_host=127.0.79.1:9042
In node-1 logs I have
20:05:30,608 [shard 0:strm] storage_service - decommission[077e2627-bf0d-41bd-af6c-dee41073f850]: Added node=127.0.79.2 as leaving node, coordinator=127.0.79.2
20:05:43,588 [shard 0:stmt] cql_server - Processing EXECUTE request from 127.0.0.1:52688
20:05:43,612 [shard 0: gms] gossip - Removed endpoint 127.0.79.2
20:06:43,589 [shard 0:stmt] cql_server - Done processing EXECUTE request from 127.0.0.1:52688
20:06:43,589 [shard 0:stmt] cql_server - 127.0.0.1:52688: request resulted in read_timeout_error, stream 92, code 4608, message [Operation timed out for ks.cf - received only 1 responses from 2 CL=TWO.]
So what's holding this query for that long?
> So what's holding this query for that long?
This is what I'm currently investigating.
Apparently one of the replicas the coordinator tried to query was the decommissioned one, and it didn't respond.
node1 logs:
09:55:20,504 [shard 0:stmt] cql_server - Processing EXECUTE 0/386 request from 127.0.0.1:60216
09:55:20,530 [shard 0: gms] seastar - stopped client socket from 127.0.85.1:65066 to 127.0.85.2:7000
09:55:20,530 [shard 0:stmt] seastar - stopped client socket from 127.0.85.1:65530 to 127.0.85.2:7000
09:55:20,530 [shard 0:main] seastar - stopped client socket from 127.0.85.1:52920 to 127.0.85.2:7000
09:55:20,530 [shard 1:stmt] seastar - stopped client socket from 127.0.85.1:49867 to 127.0.85.2:7000
09:55:20,530 [shard 1:stmt] seastar - stopped client socket from 127.0.85.1:53841 to 127.0.85.2:7000
09:55:20,530 [shard 1:strm] seastar - stopped client socket from 127.0.85.1:63599 to 127.0.85.2:7000
09:55:20,530 [shard 0:strm] seastar - stopped client socket from 127.0.85.1:61294 to 127.0.85.2:7000
09:55:30,317 [shard 0: gms] seastar - stopped client socket from 127.0.85.1:59698 to 127.0.85.2:7000
09:55:30,317 [shard 0:main] seastar - stopped client socket from 127.0.85.1:58596 to 127.0.85.2:7000
09:55:30,317 [shard 1:strm] seastar - stopped client socket from 127.0.85.1:61301 to 127.0.85.2:7000
09:55:30,317 [shard 0:strm] seastar - stopped client socket from 127.0.85.1:57678 to 127.0.85.2:7000
09:56:20,505 [shard 0:stmt] cql_server - Done processing EXECUTE 1/389 request from 127.0.0.1:60216
09:56:20,505 [shard 0:stmt] cql_server - 127.0.0.1:60216: request resulted in read_timeout_error, stream 51, code 4608, message [Operation timed out for ks.cf - received only 1 responses from 2 CL=TWO.]
node2 logs:
09:55:08,629 [shard 0:strm] api - decommission
09:55:20,530 [shard 0:strm] seastar - stopped server socket from 127.0.85.1:65066
09:55:20,530 [shard 1:strm] seastar - stopped server socket from 127.0.85.1:63599
09:55:20,530 [shard 1:strm] seastar - stopped server socket from 127.0.85.1:49867
09:55:20,530 [shard 1:strm] seastar - stopped server socket from 127.0.85.1:53841
09:55:20,530 [shard 0:strm] seastar - stopped server socket from 127.0.85.1:52920
09:55:20,530 [shard 0:strm] seastar - stopped server socket from 127.0.85.1:61294
09:55:20,530 [shard 0:strm] seastar - stopped server socket from 127.0.85.1:65530
09:55:30,145 [shard 0:strm] storage_service - Stop transport: starts
09:55:30,146 [shard 0:strm] storage_service - Stop transport: shutdown rpc and cql server done
09:55:30,316 [shard 0:strm] storage_service - Stop transport: stop_gossiping done
09:55:30,317 [shard 0:strm] seastar - stopped server socket from 127.0.85.1:59698
09:55:30,317 [shard 0:strm] seastar - stopped server socket from 127.0.85.1:57678
09:55:30,317 [shard 0:strm] seastar - stopped server socket from 127.0.85.1:58596
09:55:30,317 [shard 1:strm] seastar - stopped server socket from 127.0.85.1:61301
09:55:30,317 [shard 0:strm] storage_service - Stop transport: shutdown messaging_service done
09:55:30,317 [shard 0:strm] storage_service - Stop transport: shutdown stream_manager done
09:55:30,317 [shard 0:strm] storage_service - Stop transport: done
Summary:
09:55:08,629 node2 decommission starts
09:55:20,504 node1 query starts
09:55:20,530 node2 closes some connections from node1
09:55:30,145 node2 starts stopping transport (after unbootstrapping)
09:55:30,317 node2 closes remaining connections from node1
09:56:20,505 node1 times-out processing query
11:22:15,843 [shard 1:stmt] cql_server - Processing EXECUTE 0/390 request from 127.0.0.1:42062
11:22:15,843 [shard 1:stmt] rpc - send READ_DATA to 127.0.67.2:0 // multiple
11:22:15,846 [shard 1:stmt] rpc - send READ_DIGEST to 127.0.67.2:0 // multiple
11:22:15,859 [shard 0: gms] gossip - Node 127.0.67.2 will be removed from gossip at [2024-03-08 11:22:14]: (expire = 1709886134996526720, now = 1709626935859056195, diff = 259199 seconds)
11:22:15,859 [shard 0: gms] storage_service - Removing tokens {...} for 127.0.67.2
11:22:15,863 [shard 1:stmt] rpc - send READ_DATA to 127.0.67.2:0 // multiple
11:22:15,863 [shard 1:stmt] rpc - send READ_DIGEST to 127.0.67.2:0 // multiple
11:22:15,865 [shard 0: gms] gossip - Removed endpoint 127.0.67.2
11:22:15,865 [shard 0: gms] gossip - InetAddress 127.0.67.2 is now DOWN, status = LEFT
11:22:15,865 [shard 0:strm] seastar - stopped client socket from 127.0.67.1:60194 to 127.0.67.2:7000
11:22:15,865 [shard 0:main] seastar - stopped client socket from 127.0.67.1:55812 to 127.0.67.2:7000
11:22:15,865 [shard 0:stmt] seastar - stopped client socket from 127.0.67.1:63380 to 127.0.67.2:7000
11:22:15,865 [shard 0: gms] seastar - stopped client socket from 127.0.67.1:50892 to 127.0.67.2:7000
11:22:15,865 [shard 1:stmt] seastar - stopped client socket from 127.0.67.1:54689 to 127.0.67.2:7000
11:22:15,866 [shard 1:stmt] rpc - message 3 to 127.0.67.2:0 failed with seastar::rpc::closed_error (connection is closed) // multiple
11:22:15,866 [shard 1:stmt] rpc - message 5 to 127.0.67.2:0 failed with seastar::rpc::closed_error (connection is closed) // multiple
^^^ this is the last message about RPC failure to node 67.2
11:22:15,866 [shard 1:strm] seastar - stopped client socket from 127.0.67.1:62397 to 127.0.67.2:7000
11:22:15,866 [shard 1:stmt] seastar - stopped client socket from 127.0.67.1:50003 to 127.0.67.2:7000
11:22:16,858 [shard 1:strm] rpc - send GOSSIP_ECHO to 127.0.67.2:0
+10 sec
11:22:25,003 [shard 0:strm] storage_service - decommission[6e00cf02-ef34-4163-8065-23f0ca571e20]: Started to check if nodes={127.0.67.2} have left the cluster, coordinator=127.0.67.2
11:22:25,003 [shard 0:strm] storage_service - decommission[6e00cf02-ef34-4163-8065-23f0ca571e20]: Finished to check if nodes={127.0.67.2} have left the cluster, coordinator=127.0.67.2
11:22:25,003 [shard 0:strm] storage_service - decommission[6e00cf02-ef34-4163-8065-23f0ca571e20]: Marked ops done from coordinator=127.0.67.2
11:22:25,008 [shard 0:strm] rpc - send RAFT_APPEND_ENTRIES to 127.0.67.2:0
11:22:25,644 [shard 0:strm] seastar - stopped server socket from 127.0.67.2:61160
11:22:25,644 [shard 0:strm] seastar - stopped server socket from 127.0.67.2:60168
11:22:25,644 [shard 0:strm] seastar - stopped server socket from 127.0.67.2:50518
11:22:25,644 [shard 0: gms] seastar - stopped client socket from 127.0.67.1:51680 to 127.0.67.2:7000
11:22:25,644 [shard 0:strm] seastar - stopped server socket from 127.0.67.2:51434
11:22:25,644 [shard 0:strm] seastar - stopped server socket from 127.0.67.2:65474
11:22:25,644 [shard 0:main] seastar - stopped client socket from 127.0.67.1:63482 to 127.0.67.2:7000
11:22:25,644 [shard 0:strm] seastar - stopped client socket from 127.0.67.1:59708 to 127.0.67.2:7000
11:22:25,644 [shard 1:strm] seastar - stopped server socket from 127.0.67.2:61331
11:22:25,644 [shard 1:strm] seastar - stopped server socket from 127.0.67.2:64683
11:22:25,644 [shard 1:strm] seastar - stopped client socket from 127.0.67.1:60679 to 127.0.67.2:7000
11:22:25,644 [shard 1:strm] seastar - stopped server socket from 127.0.67.2:61417
11:22:25,644 [shard 1:strm] seastar - stopped server socket from 127.0.67.2:52147
+50 sec
11:23:15,843 [shard 1:stmt] cql_server - Done processing EXECUTE 1/391 request from 127.0.0.1:42062
11:23:15,843 [shard 1:stmt] cql_server - 127.0.0.1:42062: request resulted in read_timeout_error, stream 94, code 4608, message [Operation timed out for ks.cf - received only 1 responses from 2 CL=TWO.]
11:23:15,844 [shard 1:stmt] cql_server - Advertising disconnection of CQL client 127.0.0.1:42062
Despite node2 failing all READ_DATA/READ_DIGEST requests from node1, it took node1 one more minute to time out its read.
The timeout exception is raised from digest_read_resolver::on_timeout().
Closed connections from the decommissioned node don't fail the request instantly because of:

void on_error(gms::inet_address ep, error_kind kind) override {
    if (waiting_for(ep)) {
        _failed++;
    }
    if (kind == error_kind::DISCONNECT && _block_for == _target_count_for_cl) {
        // if the error is because of a connection disconnect and there is no targets to speculate
        // wait for timeout in hope that the client will issue speculative read
        // FIXME: resolver should have access to all replicas and try another one in this case
        return;
    }
    ...
}

Here _block_for == 2 and _target_count_for_cl == 2.
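To make the condition concrete, here is a toy Python model of that decision; the names mirror the C++ fields above, but this is illustration only, not the actual resolver code.

```python
def fails_fast(kind, block_for, target_count_for_cl):
    # A DISCONNECT error with exactly as many targets as the CL requires does
    # not fail the read immediately: the coordinator waits out the timeout,
    # hoping the client will issue a speculative retry against another replica.
    return not (kind == "DISCONNECT" and block_for == target_count_for_cl)
```

With CL=TWO and only two targets (block_for == target_count_for_cl == 2), a disconnect therefore waits for the full timeout, which is exactly what the logs above show.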
The check comes from 7277ee2, with this description:
After ac27d1c if a read executor has just enough targets to
achieve request's CL and a connection to one of them will be dropped
during execution ReadFailed error will be returned immediately and
client will not have a chance to issue speculative read (retry). The
patch changes the code to not return ReadFailed error immediately, but
wait for timeout instead and give a client chance to issue speculative
read in case read executor does not have additional targets to send
speculative reads to by itself.
@gleb-cloudius, please shed more light on this: what kind of "speculative read (retry)" is (or was) the client supposed to issue? In this test the client just waits until the timeout and fails the query (and eventually fails the whole test).
A range read unconditionally creates a never_speculating_read_executor, passing it 2 replica IPs, one of which belongs to node2. Then the gossiper removes the 2nd endpoint, node2 closes its sockets, and the digest resolver's on_error fires:
storage_proxy - creating range read executor for range (-inf, {-9210079157227413570, end}] in table ks.cf with targets {127.0.4.1, 127.0.4.3}
...
storage_proxy - creating range read executor for range ({-9108879658673895196, end},{-9095897187352045003, end}] in table ks.cf with targets {127.0.4.1, 127.0.4.2}
...
rpc - send READ_DIGEST to 127.0.4.3:0
...
rpc - send READ_DIGEST to 127.0.4.2:0
...
gossip - Removed endpoint 127.0.4.2
gossip - InetAddress 127.0.4.2 is now DOWN, status = LEFT
rpc - message READ_DIGEST to 127.0.4.2:0 failed with seastar::rpc::closed_error (connection is closed)
...
storage_proxy - digest read error 1, block_for 2 failed 1 target_count_for_cl 2
OK, the driver side has configurable (off by default) speculative execution. Turning it on fixes the issue.
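For reference, enabling speculative execution in the Python driver looks roughly like this; the delay and max_attempts values here are arbitrary, not necessarily what the dtest fix uses, and the statement must be marked idempotent for the policy to apply.

```python
from cassandra.cluster import Cluster, ExecutionProfile, EXEC_PROFILE_DEFAULT
from cassandra.policies import ConstantSpeculativeExecutionPolicy

profile = ExecutionProfile(
    # Send an extra attempt to another replica after 0.5s, at most 2 extra tries
    speculative_execution_policy=ConstantSpeculativeExecutionPolicy(0.5, 2),
)
cluster = Cluster(execution_profiles={EXEC_PROFILE_DEFAULT: profile})
session = cluster.connect()

query = session.prepare("SELECT * FROM ks.cf")
query.is_idempotent = True  # speculative execution only fires for idempotent statements
```

This way, when one replica's connection drops during decommission, the driver races a second coordinator instead of waiting out the server-side timeout.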
dtest fix merged