Comments (9)
Hi, thanks for posting your configuration. Can you also give an estimate of how many spans/s this cluster is receiving (tempo_distributor_spans_received_total)? And a file listing or other information about the blocks in the generator (by default under /var/tempo/generator/traces)? These will be very helpful for gauging the volume and size of this cluster and which settings are best.
To start: yes, there is local caching in the local-blocks processor, which is why it is faster on the next call. Thanks for testing async; I agree that sync is generally faster, which is why it is the default. Aggregate By is not heavy on the frontend or queriers; the load falls mainly on the metrics-generator.
Here are next steps I would recommend:
- Change parquet_row_group_size_bytes to 100MB. We have found that ~100MB is ideal in all our clusters, both for searching and for metrics. 500MB is large: it increases the size of the dictionary per row group and the overhead to scan the block.
- What is block:max_block_bytes? The default is 500MB, which is fine for the entire block size; on average that is 5 row groups.
- What is block:max_block_duration? I would set it to between 1 and 5 minutes. This is a good balance between how long data is stored in the WAL (which is less efficient to scan) and flushing overhead. At 5 minutes, each call for the last hour will scan 12 blocks.
- Finally, generators respond well to horizontal scaling. On the same input volume, 2x the number of pods means each pod holds 1/2 the data. Without digging deeper to identify the specific bottleneck, it's hard to say how fast each pod should be on your workload, but scaling will help.
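Taken together, the recommendations above might look like the following config fragment. This is a sketch only: the exact key paths under the local-blocks processor are assumed here (following the block:max_block_bytes notation used above), so verify them against the configuration reference for your Tempo version.

```yaml
# Sketch only -- key placement assumed; check your Tempo version's config docs.
metrics_generator:
  processor:
    local_blocks:
      flush_check_period: 10s                    # default
      block:
        max_block_duration: 5m                   # 1-5 minutes recommended; 5m => 12 blocks per hour
        max_block_bytes: 524288000               # 500MB default (~5 row groups at 100MB each)
        parquet_row_group_size_bytes: 104857600  # ~100MB row groups recommended
```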
from tempo.
how many spans/s (tempo_distributor_spans_received_total) this cluster is receiving?
around ~500k spans/s
a file listing or information about the blocks in the generator (maybe default /var/tempo/generator/traces)?
For example, one file in the WAL:
File: 00000191
Size: 131235840 Blocks: 256328 IO Block: 4096
Inode: 1180675 Links: 1
after updating parquet_row_group_size_bytes to 100MB
Size: 30059883 Blocks: 58712 IO Block: 4096
Inode: 1308723 Links: 1
File: 00000191
Size: 131235840
WAL "blocks" are composed of internal flushes, which are mini parquet files. This looks like flush 191, at 130MB. That number of flushes for a WAL block is somewhat high. Are flush_check_period and max_block_duration at their default values? With the defaults (flush_check_period=10s, max_block_duration=1m), there are 6 flushes per WAL block, and 60 blocks total for the last hour. The final blocks are in /var/tempo/generator/traces/blocks/<tenant>.
For 500K spans/s you may need 50+ generators to get the "cold" latency to your target.
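The arithmetic behind these numbers can be checked with a quick sketch (the rates and settings are the ones quoted in this thread; the 50-pod count is the estimate above, not a measured value):

```python
# Back-of-the-envelope numbers from the settings discussed above.

flush_check_period_s = 10   # default flush_check_period
max_block_duration_s = 60   # default max_block_duration (1m)

flushes_per_wal_block = max_block_duration_s // flush_check_period_s
wal_blocks_last_hour = 3600 // max_block_duration_s

print(flushes_per_wal_block)   # 6 internal flushes per WAL block
print(wal_blocks_last_hour)    # 60 WAL blocks scanned for the last hour

# Horizontal scaling: each generator holds roughly 1/N of the data.
spans_per_s = 500_000
generators = 50
print(spans_per_s // generators)  # 10000 spans/s handled per generator pod
```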
One of the block files:
File: 5d8f3be4-c96b-45f7-bd7e-618d5cd60172/
Size: 4096 Blocks: 8 IO Block: 4096 directory
Inode: 3932202 Links: 2
Does this make sense? I am using the default max_block_bytes (500MB) with max_block_duration set to 3m.
Also, will updating search_page_size_bytes, bloom_filter_shard_size_bytes, or parquet_dedicated_columns help?
Size: 4096
This is listing the directory; can you list the files inside the folder (e.g. data.parquet)?
parquet_dedicated_columns
Yes, definitely. This blog post has a walkthrough, including how to use tempo-cli analyse blocks.
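For reference, dedicated columns are configured per tenant in the overrides. A sketch of what that might look like (the attribute names below are hypothetical examples; the real list should come from analysing your own blocks, and the exact overrides layout depends on your Tempo version):

```yaml
# Sketch only -- attribute names are hypothetical; derive the real list
# from tempo-cli analyse blocks output for your own data.
overrides:
  defaults:
    storage:
      parquet_dedicated_columns:
        - scope: span            # span- or resource-level attribute
          name: http.url         # hypothetical high-volume attribute
          type: string
        - scope: resource
          name: k8s.pod.name     # hypothetical
          type: string
```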
Got it, an example for data.parquet:
File: data.parquet
Size: 184291173 Blocks: 359952 IO Block: 4096 regular file
Inode: 1308506 Links: 1
@icemanDD Hi, did the new settings and dedicated columns help?
Hi @mdisibio, that helps a bit, but we've found that TraceQL metrics works better. Does it follow the same optimization pattern? Or does it need querier scaling for better performance, especially when querying older data?
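For context, a TraceQL metrics query of the kind referenced here looks like the following (the service name and grouping attribute are hypothetical examples, not from this cluster):

```
{ resource.service.name = "my-service" } | rate() by (span.http.status_code)
```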