Comments (14)
What was the limit that was set? What was the impact, besides exceeding the limit?
from scylladb.
Second that question. What was the limit set, what was the configured segment size, and what was the measured disk footprint (also, was it measured directly or derived from Scylla counters)?
Remember that the CL footprint is allowed to exceed the set size by one segment size per shard. This is because the configured size, segment size and number of shards might not add up/align. Thus a little salt should be taken when measuring the actual footprint.
from scylladb.
This check was implemented at @eliransin's request:
The commitlog didn't go over the size limit. It is a little more involved: we should make sure that the size of the commitlog directory never exceeds the limit. We can't rely on metrics, since if there is a bug in the metrics, we will never catch an actual bug where the directory is larger than the stats report and the size limit was actually breached.
Regarding your questions:
1) No observed impact; there were no other errors in the test.
- The limit was set as commitlog_total_space + 1 segment:
scylla_commitlog_disk_total_bytes >= ({commitlog_total_space} + ({commitlog_segment_size_in_mb} * 1048576))
{commitlog_total_space}:
set to max_disk_size (value from the API) divided by the number of shards
+ {commitlog_segment_size_in_mb} * 1048576:
adds the size of 1 segment; Scylla actually stops creating new segments not when the next segment would cause us to exceed the value, but when it sees that we've already reached or exceeded the limit.
scylla_commitlog_disk_total_bytes >=:
filters out all good values (where scylla_commitlog_disk_total_bytes <= commitlog_total_space + 1 segment) so that Prometheus returns only the bad ones.
From the logs, the limit is: 8914993152 + (32 * 1048576) = 8948547584, but we got 54324629504 (about 6x bigger).
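The check above boils down to a simple per-shard arithmetic; a minimal sketch with the numbers from this run (variable names are mine, not the actual commit_log_check_thread.py code):

```python
# Per-shard limit check described above (illustrative names only).
SEGMENT_SIZE_IN_MB = 32
commitlog_total_space = 8914993152           # max_disk_size / number of shards
segment_size = SEGMENT_SIZE_IN_MB * 1048576  # 32 MiB in bytes

# Allowed per-shard footprint: configured space plus one extra segment.
limit = commitlog_total_space + segment_size
print(limit)  # 8948547584

measured = 54324629504  # value returned by Prometheus for shard 0
print(measured >= limit)           # True -> the check fires
print(round(measured / limit, 1))  # 6.1, i.e. about 6x the limit
```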
Query to PrometheusDB: http://10.0.0.13:9090/api/v1/query_range?query=scylla_commitlog_disk_total_bytes>=(8914993152%2B(32%2A1048576))&start=1706425203.152964&end=1706425803.152964&step=1
But it returned violations only a few times, and from the graphs you can see that it exceeded the limit only for a moment.
Only shard 0 exceeded the limit:
from scylladb.
2. limit set as
commitlog_total_space + 1 segment
Where does this value come from? What does scylla.yaml say here?
Your example above says the expected value is about 8GB, and the offending one is ~56GB? 8 sounds very low for the type of machines we typically use...
from scylladb.
commitlog_total_space + 1 segment
- this is the value for one shard
In scylla.yaml: commitlog_total_space_in_mb: -1
The node has:
max_disk_size (from the API commitlog/metrics/max_disk_size) = 124809904128
and 14 shards (from seastar-cpu-map.sh)
commitlog_total_space is max_disk_size divided by the number of shards
Calculated values commit log checking thread:
< t:2024-01-27 05:00:03,125 f:commit_log_check_thread.py l:37 c:CommitlogConfigParams p:DEBUG > CommitlogConfigParams
< t:2024-01-27 05:00:03,125 f:commit_log_check_thread.py l:38 c:CommitlogConfigParams p:DEBUG > smp: 14
< t:2024-01-27 05:00:03,125 f:commit_log_check_thread.py l:39 c:CommitlogConfigParams p:DEBUG > max_disk_size: 124809904128
< t:2024-01-27 05:00:03,125 f:commit_log_check_thread.py l:40 c:CommitlogConfigParams p:DEBUG > total_space: 8914993152
< t:2024-01-27 05:00:03,125 f:commit_log_check_thread.py l:41 c:CommitlogConfigParams p:DEBUG > use_hard_size_limit: True
< t:2024-01-27 05:00:03,125 f:commit_log_check_thread.py l:42 c:CommitlogConfigParams p:DEBUG > segment_size_in_mb: 32
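The calculated values in that log can be reproduced with a small sketch (names are illustrative, not the actual test code):

```python
# Reproduce the CommitlogConfigParams values logged above.
max_disk_size = 124809904128  # from the commitlog/metrics/max_disk_size API
smp = 14                      # number of shards, from seastar-cpu-map.sh

# Per-shard commitlog space: the node-wide max divided across shards.
total_space = max_disk_size // smp
print(total_space)  # 8914993152, matching the total_space log line
```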
from scylladb.
How much memory on the node? 128GB?
128GB max size matches this, and at 14 shards, 9GB limit per shard.
The 8 expected / 56 actual above, is this a shard-local measurement?
And this is all based on metrics, right? Did you get an actual measurement of the commitlog dir size as well?
If I read the metrics screenshot above correctly, the spike is at 05.22, which is 04.22 in the logs. At that time, the node was restarting from a crash and replayed more or less the full set of logs from the previous run. This is all done on shard 0. After this, the replayed segments are all added to shard zero's commitlog for either re-use or deletion. For a few moments, they will regardless count toward the shard's disk footprint, until we can determine whether they are to be deleted or not. Hence the spike.
Unless I am mistaken here, there is no bug, just the fact that during replay-boot, we don't adhere to the limits fully, esp. not on shard 0.
from scylladb.
RAM (MiB) 131072
Is this a shard-local measurement? Yes.
As I mentioned, I can't see any observed impact; all nemesis/stress commands passed, and there were no other errors in the test.
And this check was implemented at @eliransin's request to catch possible errors.
From my understanding as well, a hard limit means that it cannot be exceeded.
from scylladb.
It can, during/after replay. For a short bit.
from scylladb.
In this case I can set up a gate for this. But how short? 30 sec, 1 min, 5 min? In this issue, I see 19 sec.
from scylladb.
Should be very short, but what you actually see here will also depend very much on the frequency of metrics polling, and any stalls there. Actual deletion of excess files + reduction of the counters will happen with relatively low priority vs. the other startup work happening here, i.e. compaction etc. will starve it. I would go for a minute or so (excessive) after replay is finished.
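A gate like the one discussed could, for example, only report a violation when the metric stays above the limit continuously for longer than a grace period; a rough sketch (all names are hypothetical, not the actual scylla-cluster-tests code):

```python
def filter_violations(samples, limit, grace_seconds=60):
    """Return (timestamp, value) samples that stay over `limit`
    continuously for longer than `grace_seconds`."""
    violations = []
    breach_start = None
    for ts, value in samples:          # samples sorted by timestamp
        if value >= limit:
            if breach_start is None:
                breach_start = ts      # a new breach window begins
            if ts - breach_start > grace_seconds:
                violations.append((ts, value))
        else:
            breach_start = None        # breach ended; reset the window
    return violations

# A 19-second spike (as seen in this issue) is tolerated:
spike = [(0, 8e9), (10, 60e9), (19, 60e9), (20, 8e9)]
print(filter_violations(spike, limit=8948547584))  # []
```

A persistent breach (longer than the grace period) would still be reported, so real leaks are not masked.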
from scylladb.
Will set 1 min then; there is a fix proposal: scylladb/scylla-cluster-tests#7176
from scylladb.
Should be very short, but what you actually see here will also depend very much on the frequency of metrics polling, and any stalls there. Actual deletion of excess files + reduction of the counters will happen with relatively low priority vs. the other startup work happening here, i.e. compaction etc. will starve it. I would go for a minute or so (excessive) after replay is finished.
Calle, asking the test to "go for a minute or so" is bad UX; we'll get a flaky test. Can we program a barrier so that even during startup there is a cap (either a time limit or a size limit) on the growth of the commit log?
Another concern is that some clients may take this limit at face value (the name says so: a hard limit) and so may run out of disk space in these circumstances.
WDYT?
from scylladb.
No one is "asking" tests to wait. But this is an artificial problem caused by over-trusting/misusing metrics to measure an (in this instance) unrelated value.
The disk footprint is not "growing" here. The files in question exist on disk regardless. They are picked up, played back and discarded. Note that the actual size/number of segments replayed might not be even remotely related to the current sharding/footprint limits. Should these files then be ignored when considering how much footprint we are currently consuming? To what end? To make counters look "better" for this particular test case (this has zero impact on anything real).
The main confusion comes from the full set of segments from previous run being added to shard zero.
The test could probably also get by just looking at the full footprint across all shards, which should be within limits (maybe + one segment extra per shard) until deletion/recycling is done.
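A node-wide variant of the check, as suggested, might look roughly like this (hypothetical sketch with this node's numbers, not the actual test code):

```python
# Node-wide check: sum per-shard footprints against the node-wide budget,
# allowing one extra segment per shard.
max_disk_size = 124809904128
smp = 14
segment_size = 32 * 1048576

node_limit = max_disk_size + smp * segment_size  # 125279666176

def node_within_limit(per_shard_bytes):
    # per_shard_bytes: scylla_commitlog_disk_total_bytes, one value per shard
    return sum(per_shard_bytes) <= node_limit

# With shard 0's replay spike but the other shards mostly empty
# (say ~1 GiB each), the node total stays within the budget:
shards = [54324629504] + [1073741824] * 13
print(node_within_limit(shards))  # True
```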
This also applies to a customer: We will not start using up any significant real additional disk space before replay + deletion of old stuff is done.
And again: the counters are correct (from a certain point of view). It is the tests assumptions that are wrong. Non-clean restart is a very special case.
from scylladb.
The test is fixed. Too bad the hard limit is not hard, @elcallio
from scylladb.