
Comments (11)

kvoli commented on July 17, 2024

just wanted to confirm where do we measure the snapshot stream throughput? is it in the crdb ui console?

That's correct, the replication dashboard page has the snapshot received bytes rate metric. The metric is range.snapshots.rcvd-bytes.
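
For reference, here is a minimal sketch (my own, not from the thread) of pulling that counter outside the DB Console; it assumes the node's HTTP endpoint is on localhost:8080 and that the Prometheus-formatted name for range.snapshots.rcvd-bytes is range_snapshots_rcvd_bytes:

    package main

    import (
        "bufio"
        "fmt"
        "net/http"
        "strings"
    )

    func main() {
        // Scrape the node's Prometheus endpoint (address is an assumption).
        resp, err := http.Get("http://localhost:8080/_status/vars")
        if err != nil {
            panic(err)
        }
        defer resp.Body.Close()

        sc := bufio.NewScanner(resp.Body)
        sc.Buffer(make([]byte, 1<<20), 1<<20) // the vars page can be large
        for sc.Scan() {
            if strings.HasPrefix(sc.Text(), "range_snapshots_rcvd_bytes") {
                // This is a cumulative counter; sample it twice and divide by
                // the interval to get a receive rate.
                fmt.Println(sc.Text())
            }
        }
    }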


lyang24 commented on July 17, 2024

I upgraded to v24.1 with my custom change to add more timers and iterator stats printouts (#124067).
The setup is the bank dataset with no traffic; I tuned the replication factor from 3 -> 5, and all snapshots are delegated.

      num_replicas = 3 -> 5,
      num_voters = 3 -> 5,
      constraints = '{+region=us-central: 1 -> 2, +region=us-east-1: 1 -> 2, +region=us-east-2: 1}',
      voter_constraints = '{+region=us-central: 1 -> 2, +region=us-east-1: 1 -> 2, +region=us-east-2: 1}'
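
For context, here is a minimal sketch (my reconstruction, not the exact statement used) of applying that change as a zone configuration from Go; it assumes the bank workload table lives at bank.bank and an insecure local SQL endpoint on port 26257:

    package main

    import (
        "database/sql"

        _ "github.com/lib/pq" // CockroachDB speaks the Postgres wire protocol
    )

    func main() {
        db, err := sql.Open("postgres", "postgresql://root@localhost:26257/bank?sslmode=disable")
        if err != nil {
            panic(err)
        }
        defer db.Close()

        // Bump the replication factor from 3 to 5 with the per-region constraints above.
        _, err = db.Exec(`ALTER TABLE bank.bank CONFIGURE ZONE USING
            num_replicas = 5,
            num_voters = 5,
            constraints = '{+region=us-central: 2, +region=us-east-1: 2, +region=us-east-2: 1}',
            voter_constraints = '{+region=us-central: 2, +region=us-east-1: 2, +region=us-east-2: 1}'`)
        if err != nil {
            panic(err)
        }
    }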

I got mixed signals.

I observed a gRPC bottleneck for the first time, where a 256 MiB range took about 3 seconds over the wire.
LEARNER with 256 MiB in 3.77s @ 68 MiB/s: kvs=26876 rangeKVs=0 sharedSSTs=0, externalSSTs=0, rate-limit: 1.0 GiB/s, queued: 0.30s"
sender tags
tags:<key:"iter" value:"3.8s" > tags:<key:"send" value:"3s" >
receiver tags
<key:"sst" value:"865ms" > tags:<key:"sstFinalize" value:"2ms" > tags:<key:"checksum" value:"31ms" > tags:<key:"recv" value:"2.9s" > > start_time:<seconds:1715757398 nanos:209378031 > duration:<seconds:3 nanos:795977518 >

I observed a sender bottleneck on a different node where the LSM tree is slow; it took 2.3s for the sender/receiver to process the data.
LEARNER with 374 MiB in 3.33s @ 112 MiB/s: kvs=39187 rangeKVs=0 sharedSSTs=0, externalSSTs=0, rate-limit: 1.0 GiB/s, queued: 0.00s
sender tags:
tags:<key:"iter" value:"3.3s" > tags:<key:"send" value:"987ms"
receiver tags:
tags:<key:"totalTime" value:"3.3s" > tags:<key:"sst" value:"2.2s" > tags:<key:"sstFinalize" value:"4ms" > tags:<key:"checksum" value:"64ms" > tags:<key:"recv" value:"1.1s" >
I printed out the iterator stats on the sender side; below is the largest in terms of bytes:
iterator stats: ‹(interface (dir, seek, step): (fwd, 1, 39085), (rev, 0, 0)), (internal (dir, seek, step): (fwd, 1, 39085), (rev, 0, 0)),›
‹(internal-stats: (block-bytes: (total 374MB, cached 32KB, read-time 1.751983268s)), (points: (count 39K, key-bytes 764KB, value-bytes 373MB, tombstoned 0)))
and from the new tags it appears that most of the time is spent iterating point keys:

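		// Per-key send loop: step the engine iterator, copy each key/value into a
		// write batch, and flush full batches to the outgoing snapshot stream.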
		for ok := true; ok && err == nil; ok, err = iter.NextEngineKey() {
			kvs++
			if b == nil {
				b = kvSS.newWriteBatch()
			}
			key, err := iter.UnsafeEngineKey()
			if err != nil {
				return err
			}
			v, err := iter.UnsafeValue()
			if err != nil {
				return err
			}
			if err = b.PutEngineKey(key, v); err != nil {
				return err
			}
			if err = maybeFlushBatch(); err != nil {
				return err
			}
		}

Just for curiosity, I ran a network speed test on the host that has the gRPC bottleneck to compare against the snapshot rate in the crdb docker container.
256 MiB / 3 s ≈ 85 MiB/s:

root@node31:~# speedtest-cli --secure
Retrieving speedtest.net configuration...
Testing from Roblox...
Retrieving speedtest.net server list...
Selecting best server based on ping...
Hosted by Windstream (Chicago, IL) [31.77 km]: 2.298 ms
Testing download speed................................................................................
Download: 4250.59 Mbit/s
Testing upload speed......................................................................................................
Upload: 3305.07 Mbit/s

I also ran a disk performance test on the host that has the LSM bottleneck to compare against the read rate in the crdb docker container.
374 MiB / 1.75 s ≈ 213 MiB/s:

root@node65:~# sudo hdparm -Tt /dev/sda

/dev/sda:
 Timing cached reads:   17562 MB in  1.98 seconds = 8861.72 MB/sec
 Timing buffered disk reads: 1446 MB in  3.00 seconds = 481.81 MB/sec


blathers-crl commented on July 17, 2024

cc @cockroachdb/replication


lyang24 commented on July 17, 2024

I am excited to look into this one
We have found that once this rate limit is increased, snapshots will still only stream at a maximum of around 120MB/s
just wanted to confirm where do we measure the snapshot stream throughput? is it in the crdb ui console?


lyang24 commented on July 17, 2024

I did some testing in the Roblox environment and I cannot reproduce this bottleneck with v23.2.2. May I get more details on the testing setup that produced the 120 MB/s limit?

My setup is a 21-node multi-region cluster with 10 million rows of the bank workload. I configured kv.snapshot_rebalance.max_rate to 1024 MiB (a sketch of applying that setting follows the workload commands).

cockroach workload init bank --rows 10000000 --ranges 3500 --payload-bytes 10000 --data-loader IMPORT ""

cockroach workload run bank --rows 10000000 --batch-size 100 --payload-bytes 10000 --concurrency 64 --tolerate-errors ""
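
As a reference point, a minimal sketch (assuming an insecure local SQL endpoint on port 26257; not how I actually applied it) of setting and reading back that rate limit from Go:

    package main

    import (
        "database/sql"
        "fmt"

        _ "github.com/lib/pq"
    )

    func main() {
        db, err := sql.Open("postgres", "postgresql://root@localhost:26257/defaultdb?sslmode=disable")
        if err != nil {
            panic(err)
        }
        defer db.Close()

        // Raise the snapshot rebalance rate limit, then read it back.
        if _, err := db.Exec(`SET CLUSTER SETTING kv.snapshot_rebalance.max_rate = '1024MiB'`); err != nil {
            panic(err)
        }
        var rate string
        if err := db.QueryRow(`SHOW CLUSTER SETTING kv.snapshot_rebalance.max_rate`).Scan(&rate); err != nil {
            panic(err)
        }
        fmt.Println("kv.snapshot_rebalance.max_rate =", rate)
    }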

Then I paused 3 nodes in the Chicago region; it seems like the snapshot recv limit is close to saturated:

[screenshots omitted: per-node snapshot receive rate, close to the configured limit]


lyang24 commented on July 17, 2024

"snapshots will still only stream at a maximum of around 120MB/s" is referring to single-node throughput; what I am measuring is cluster-wide. Please ignore my previous comment.


lyang24 commented on July 17, 2024

I have confirmation of the snapshot apply throughput limit and some questions about it.

Details:
When I looked at the snapshot sent and received metrics per node, I noticed they are about the same (although those two do not have a 1-to-1 mapping). I wondered if the bottleneck is the send process.
This is the code where snapshots are sent, and below are some matching logs from our system that demonstrate the ~120 MB/s limit:

I240429 20:46:34.052031 137927657 13@kv/kvserver/store_snapshot.go:1869 ⋮ [T1,Vsystem,n60,s60,r1279894/1:‹/Table/318/1/95{292542-339525}›] 5636  streamed snapshot fb91a7a3 at applied index 55218 to (n44,s44):7LEARNER with 450 MiB in 3.83s @ 117 MiB/s: kvs=46998 rangeKVs=0 sharedSSTs=0, rate-limit: 1.0 GiB/s, queued: 0.00s
I240429 20:46:43.427214 137931148 13@kv/kvserver/store_snapshot.go:1869 ⋮ [T1,Vsystem,n60,s60,r1278868/1:‹/Table/318/1/7{29645…-30000…}›] 5637  streamed snapshot e9274d39 at applied index 55724 to (n18,s18):11LEARNER with 340 MiB in 2.85s @ 119 MiB/s: kvs=35506 rangeKVs=0 sharedSSTs=0, rate-limit: 1.0 GiB/s, queued: 0.00s

Diving into the code a bit more, I found there are lots of timers that time the different phases of the snapshot sending process (a simplified sketch of the idea follows the link):
https://github.com/cockroachdb/cockroach/blob/4e178debf6f4fb438ab2464b33f6ee932d2fb356/pkg/kv/kvserver/store_snapshot.go#L740C1-L752C37
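
A simplified sketch (my own illustration, not the actual store_snapshot.go code) of the stopwatch-per-phase pattern those timing tags implement:

    package main

    import (
        "fmt"
        "time"
    )

    // phaseTimer accumulates elapsed wall time per named phase, similar in
    // spirit to the snapshot timing tags (iter, send, recv, ...).
    type phaseTimer struct {
        totals map[string]time.Duration
    }

    func (p *phaseTimer) time(phase string, fn func()) {
        start := time.Now()
        fn()
        p.totals[phase] += time.Since(start)
    }

    func main() {
        pt := &phaseTimer{totals: map[string]time.Duration{}}
        pt.time("iter", func() { time.Sleep(10 * time.Millisecond) }) // stand-in for iterating the key space
        pt.time("send", func() { time.Sleep(5 * time.Millisecond) })  // stand-in for the gRPC Send call
        fmt.Println(pt.totals)
    }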

Is there any way that I can visualize the timer results that might give a good indication of the bottleneck?
cc @kvoli


kvoli commented on July 17, 2024

Is there any way that I can visualize the timer results that might give a good indication of the bottleneck?

You may be able to visualize the timer results by manually pushing a range that is under-replicated via the replicate queue. You can do this via the advanced debug page or directly via the endpoint at #/debug/enqueue_range?rangeID=XXX. This will produce a trace table which should show those timing events.


lyang24 commented on July 17, 2024

Is there any way that I can visualize the timer results that might give a good indication of the bottleneck?

You may be able to visualize the timer results by manually pushing a range that is under-replicated via the replicate queue. You can do this via the advanced debug page or directly via the endpoint at #/debug/enqueue_range?rangeID=XXX. This will produce a trace table which should show those timing events.

The /debug/enqueue_range endpoint is wonderful; I was able to capture a couple of traces:

tag_groups:<name:"snapshot_timing_tag" tags:<key:"rateLimit" value:"3ms" > tags:<key:"totalTime" value:"4.1s" > tags:<key:"iter" value:"4.1s" > tags:<key:"send" value:"577ms" > > 

tag_groups:<name:"snapshot_timing_tag" tags:<key:"rateLimit" value:"973\302\265s" > tags:<key:"totalTime" value:"2.6s" > tags:<key:"iter" value:"2.6s" > tags:<key:"send" value:"211ms" > > start_time:<seconds:1714591174 nanos:262669556 > duration:<seconds:4 nanos:859442097 > 

From the trace it appears that most of the time is spent on IterateReplicaKeySpans, is it a good direction to dive deep into that iterator to see if there are performance bottlenecks?

Also, there are a couple of components to the snapshot transfer time: waiting in queue, iterating key spans, rate limiting, and sending bytes over the wire. I feel that might not be an apples-to-apples comparison with hardware limits (I/O, network).

nit: the second trace looks a bit off with 973\302\265s i am trying to see how to properly format it.

cc @nvanbenschoten @kvoli


kvoli commented on July 17, 2024

From the trace it appears that most of the time is spent on IterateReplicaKeySpans, is it a good direction to dive deep into that iterator to see if there are performance bottlenecks?

That is a good direction to look into. Feel free to add more timing tags in other places as well, if they prove useful.

nit: the second trace looks a bit off with 973\302\265s i am trying to see how to properly format it.

That does seem off. Would that be the configured snapshot rate limit setting?


lyang24 commented on July 17, 2024

For gRPC throughput I plan to experiment with the following (a sketch of the InitialWindowSize knobs follows this list):
MaxSendMsgSize is already set at the maximum in cockroachdb
MaxRecvMsgSize is already set at the maximum in cockroachdb
InitialWindowSize:

  1. set COCKROACH_RPC_INITIAL_WINDOW_SIZE to 64 MB, or even remove the maximum in the code and set it to 1 GB
  2. set OS-level defaults on two of the machines to observe the impact on throughput: sysctl -w net.core.rmem_default and sysctl -w net.core.wmem_default
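
To make the InitialWindowSize item concrete, here is a generic gRPC sketch (not CockroachDB's actual RPC setup; the target address and the 64 MiB value are illustrative) of raising the per-stream and per-connection flow-control windows on a client connection:

    package main

    import (
        "log"

        "google.golang.org/grpc"
        "google.golang.org/grpc/credentials/insecure"
    )

    func main() {
        const windowSize = 64 << 20 // 64 MiB, mirroring the COCKROACH_RPC_INITIAL_WINDOW_SIZE experiment
        conn, err := grpc.Dial("node31:26257",
            grpc.WithTransportCredentials(insecure.NewCredentials()),
            grpc.WithInitialWindowSize(windowSize),     // per-stream flow-control window
            grpc.WithInitialConnWindowSize(windowSize), // per-connection flow-control window
        )
        if err != nil {
            log.Fatal(err)
        }
        defer conn.Close()
    }

Larger windows let the sender keep more unacknowledged bytes in flight, which is what matters on high-bandwidth, non-trivial-latency links such as the cross-region hops here.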

For pebble read performance:
I recall the doc on the Roblox / CRL performance testing in 2023 had an action item to tune the pebble block cache size; is this something worth looking into? (A sketch of what that knob controls is below.)
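
For what it's worth, at the Pebble API level the block cache size is fixed when the store is opened; a minimal sketch (not how CockroachDB wires it up, and the 8 GiB figure is just illustrative) of what that knob controls:

    package main

    import "github.com/cockroachdb/pebble"

    func main() {
        cache := pebble.NewCache(8 << 30) // 8 GiB block cache (illustrative size)
        defer cache.Unref()

        db, err := pebble.Open("/mnt/data/pebble-demo", &pebble.Options{Cache: cache})
        if err != nil {
            panic(err)
        }
        defer db.Close()
    }

In CockroachDB itself this is governed by the --cache flag on cockroach start rather than by application code, so the experiment would be to restart nodes with a larger --cache and re-measure the block-bytes read-time in the iterator stats.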

