Comments (14)
What was the limit that was set? What was the impact, besides exceeding the limit?
from scylladb.
Second that question. What was the limit set, what was the configured segment size, and what was the measured disk footprint (also, was it measured directly or derived from Scylla counters)?
Remember that the CL footprint is allowed to exceed the set size by one segment size per shard. This is because the configured size, segment size and number of shards might not add up/align. Thus a little salt should be taken when measuring the actual footprint.
from scylladb.
This check was implemented at @eliransin's request:
The commitlog didn't go over the size limit. It is a little more involved: we should make sure that the size of the commitlog directory never exceeds the limit. We can't rely on metrics, since if there is a bug in the metrics, we will never catch an actual bug where the directory is larger than the stats report and the size limit was actually breached.
Regarding your questions:
1) No observed impact; there were no other errors in the test.
- The limit was set as commitlog_total_space + 1 segment:
scylla_commitlog_disk_total_bytes >= ({commitlog_total_space} + ({commitlog_segment_size_in_mb} * 1048576))
{commitlog_total_space}:
set to max_disk_size (value from the API) divided by the number of shards
+ {commitlog_segment_size_in_mb} * 1048576:
adds the size of 1 segment; Scylla actually stops creating new segments not when the next segment would cause us to exceed the value, but when it sees that we've already reached or exceeded the limit.
scylla_commitlog_disk_total_bytes >=:
filters out all good values (where scylla_commitlog_disk_total_bytes <= commitlog_total_space + 1 segment) so that Prometheus returns only the bad ones.
From the logs, the limit is: 8914993152 + (32 * 1048576) = 8948547584, but we got 54324629504 (about 6x bigger).
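The check above boils down to a simple per-shard arithmetic; a minimal sketch with the numbers from this run (variable names are mine, not the actual commit_log_check_thread.py code):

```python
# Per-shard limit check described above (illustrative names only).
SEGMENT_SIZE_IN_MB = 32
commitlog_total_space = 8914993152           # max_disk_size / number of shards
segment_size = SEGMENT_SIZE_IN_MB * 1048576  # 32 MiB in bytes

# Allowed per-shard footprint: configured space plus one extra segment.
limit = commitlog_total_space + segment_size
print(limit)  # 8948547584

measured = 54324629504  # value returned by Prometheus for shard 0
print(measured >= limit)           # True -> the check fires
print(round(measured / limit, 1))  # 6.1, i.e. about 6x the limit
```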
Query to PrometheusDB: http://10.0.0.13:9090/api/v1/query_range?query=scylla_commitlog_disk_total_bytes>=(8914993152%2B(32%2A1048576))&start=1706425203.152964&end=1706425803.152964&step=1
But it returned violations only a few times, and from the graphs you can see that it exceeded the limit only for a moment.
Only shard 0 exceeded the limit:
from scylladb.
2. limit set as
commitlog_total_space + 1 segment
Where does this value come from? What does scylla.yaml say here?
Your example above says the expected value is about 8GB, and the offending one is ~56GB? 8 sounds very low for the type of machines we typically use...
from scylladb.
commitlog_total_space + 1 segment
- this is the value for one shard
In scylla.yaml: commitlog_total_space_in_mb: -1
The node has:
max_disk_size (from the API commitlog/metrics/max_disk_size) = 124809904128
and 14 shards (from seastar-cpu-map.sh)
commitlog_total_space is max_disk_size divided by the number of shards
Calculated values commit log checking thread:
< t:2024-01-27 05:00:03,125 f:commit_log_check_thread.py l:37 c:CommitlogConfigParams p:DEBUG > CommitlogConfigParams
< t:2024-01-27 05:00:03,125 f:commit_log_check_thread.py l:38 c:CommitlogConfigParams p:DEBUG > smp: 14
< t:2024-01-27 05:00:03,125 f:commit_log_check_thread.py l:39 c:CommitlogConfigParams p:DEBUG > max_disk_size: 124809904128
< t:2024-01-27 05:00:03,125 f:commit_log_check_thread.py l:40 c:CommitlogConfigParams p:DEBUG > total_space: 8914993152
< t:2024-01-27 05:00:03,125 f:commit_log_check_thread.py l:41 c:CommitlogConfigParams p:DEBUG > use_hard_size_limit: True
< t:2024-01-27 05:00:03,125 f:commit_log_check_thread.py l:42 c:CommitlogConfigParams p:DEBUG > segment_size_in_mb: 32
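The calculated values in that log can be reproduced with a small sketch (names are illustrative, not the actual test code):

```python
# Reproduce the CommitlogConfigParams values logged above.
max_disk_size = 124809904128  # from the commitlog/metrics/max_disk_size API
smp = 14                      # number of shards, from seastar-cpu-map.sh

# Per-shard commitlog space: the node-wide max divided across shards.
total_space = max_disk_size // smp
print(total_space)  # 8914993152, matching the total_space log line
```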
from scylladb.
How much memory on the node? 128GB?
128GB max size matches this, and at 14 shards, 9GB limit per shard.
The 8 expected / 56 actual above, is this a shard-local measurement?
And this is all based on metrics, right? Did you get an actual measurement of the commitlog dir size as well?
If I read the metrics screenshot above correctly, the spike is at 05.22, which is 04.22 in the logs. At that time, the node was restarting from a crash and replayed more or less the full set of logs from the previous run. This is all done on shard 0. After this, the replayed segments are all added to shard zero's commitlog for either re-use or deletion. For a few moments, they will regardless count toward the shard's disk footprint, until we can determine whether they are to be deleted or not. Hence the spike.
Unless I am mistaken here, there is no bug, just the fact that during replay-boot, we don't adhere to the limits fully, esp. not on shard 0.
from scylladb.
RAM (MiB) 131072
Is this a shard-local measurement? Yes.
As I mentioned, I can't see any observed impact; all nemesis/stress commands passed, and there were no other errors in the test.
And this check was implemented at @eliransin's request to catch possible errors.
From my understanding as well, a hard limit means that it cannot be exceeded.
from scylladb.
It can, during/after replay. For a short bit.
from scylladb.
In this case I can set up a gate for this. But how short? 30 sec, 1 min, 5 min? In this issue, I see 19 sec.
from scylladb.
Should be very short, but what you actually see here will also depend very much on the frequency of metrics polling, and any stalls there. Actual deletion of excess files + reduction of the counters will happen with relatively low priority vs. the other startup work happening here, i.e. compaction etc. will starve it. I would go for a minute or so (excessive) after replay is finished.
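A gate like the one discussed could, for example, only report a violation when the metric stays above the limit continuously for longer than a grace period; a rough sketch (all names are hypothetical, not the actual scylla-cluster-tests code):

```python
def filter_violations(samples, limit, grace_seconds=60):
    """Return (timestamp, value) samples that stay over `limit`
    continuously for longer than `grace_seconds`."""
    violations = []
    breach_start = None
    for ts, value in samples:          # samples sorted by timestamp
        if value >= limit:
            if breach_start is None:
                breach_start = ts      # a new breach window begins
            if ts - breach_start > grace_seconds:
                violations.append((ts, value))
        else:
            breach_start = None        # breach ended; reset the window
    return violations

# A 19-second spike (as seen in this issue) is tolerated:
spike = [(0, 8e9), (10, 60e9), (19, 60e9), (20, 8e9)]
print(filter_violations(spike, limit=8948547584))  # []
```

A persistent breach (longer than the grace period) would still be reported, so real leaks are not masked.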
from scylladb.
Will set 1 min then; there is a fix proposal: scylladb/scylla-cluster-tests#7176
from scylladb.
Should be very short, but what you actually see here will also depend very much on the frequency of metrics polling, and any stalls there. Actual deletion of excess files + reduction of the counters will happen with relatively low priority vs. the other startup work happening here, i.e. compaction etc. will starve it. I would go for a minute or so (excessive) after replay is finished.
Calle, asking the test to "go for a minute or so" is bad UX; we'll get a flaky test. Can we program a barrier so that even during startup there is a cap (either a time limit or a size limit) on the growth of the commit log?
Another concern is that some clients may take this limit at face value (the name says so: a hard limit) and so may run out of disk space in these circumstances.
WDYT?
from scylladb.
No one is "asking" tests to wait. But this is an artificial problem caused by over-trusting/misusing metrics to measure an (in this instance) unrelated value.
The disk footprint is not "growing" here. The files in question exist on disk regardless. They are picked up, played back and discarded. Note that the actual size/number of segments replayed might not be even remotely related to the current sharding/footprint limits. Should these files then be ignored when considering how much footprint we are currently consuming? To what end? To make counters look "better" for this particular test case (this has zero impact on anything real).
The main confusion comes from the full set of segments from previous run being added to shard zero.
The test could probably also get by just looking at the full footprint across all shards, which should be within limits (maybe + one segment extra per shard) until deletion/recycling is done.
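A node-wide variant of the check, as suggested, might look roughly like this (hypothetical sketch with this node's numbers, not the actual test code):

```python
# Node-wide check: sum per-shard footprints against the node-wide budget,
# allowing one extra segment per shard.
max_disk_size = 124809904128
smp = 14
segment_size = 32 * 1048576

node_limit = max_disk_size + smp * segment_size  # 125279666176

def node_within_limit(per_shard_bytes):
    # per_shard_bytes: scylla_commitlog_disk_total_bytes, one value per shard
    return sum(per_shard_bytes) <= node_limit

# With shard 0's replay spike but the other shards mostly empty
# (say ~1 GiB each), the node total stays within the budget:
shards = [54324629504] + [1073741824] * 13
print(node_within_limit(shards))  # True
```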
This also applies to a customer: We will not start using up any significant real additional disk space before replay + deletion of old stuff is done.
And again: the counters are correct (from a certain point of view). It is the tests assumptions that are wrong. Non-clean restart is a very special case.
from scylladb.
The test is fixed. Too bad the hard limit is not hard, @elcallio
from scylladb.