
Comments (9)

mdisibio commented on July 20, 2024

Hi, thanks for posting your configuration. Can you also give an estimate of how many span/s (tempo_distributor_spans_received_total) this cluster is receiving? Also, a file listing or information about the blocks in the generator (maybe default /var/tempo/generator/traces)? These will be very helpful to see the volume/size of this cluster and what settings are best.

To start: yes, there is local caching in the local-blocks processor, which is why the next call is faster. Thanks for testing async; agreed that sync is generally faster, which is why it is the default.

Aggregate By is not heavy on the frontend and queriers; the load falls mainly on the metrics-generators.

Here are next steps I would recommend:

  1. Change parquet_row_group_size_bytes to 100MB. We have found that ~100MB is ideal across all our clusters, for both searching and metrics. 500MB is large: it increases the size of the dictionary per row group and the overhead to scan the block.
  2. What is block:max_block_bytes? The default is 500MB, which is fine for the entire block size; at ~100MB row groups that averages out to 5 row groups per block.
  3. What is block:max_block_duration? I would set it to between 1 and 5 minutes. This is a good balance between how long data stays in the WAL (less efficient to scan) and flushing overhead. At 5 minutes, each query for the last hour will scan 12 blocks.
  4. Finally, generators respond well to horizontal scaling. For the same input volume, 2x the number of pods means each pod holds 1/2 the data. Without digging deeper to identify the specific bottleneck, it's hard to say how fast each pod should be for your workload, but scaling will help.
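
Taken together, steps 1-3 correspond roughly to the following settings. This is a sketch only: exact key paths and defaults vary by Tempo version, so check it against your own tempo.yaml before applying.

```yaml
# Sketch of the recommended settings above; verify key paths for your Tempo version.
storage:
  trace:
    block:
      parquet_row_group_size_bytes: 100000000   # ~100MB row groups (step 1)
metrics_generator:
  processor:
    local_blocks:
      max_block_bytes: 524288000     # 500MB default block size (step 2)
      max_block_duration: 5m         # 1-5 minutes recommended (step 3)
```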

from tempo.

icemanDD commented on July 20, 2024

how many span/s (tempo_distributor_spans_received_total) this cluster is receiving?

around ~500k spans/s

a file listing or information about the blocks in the generator (maybe default /var/tempo/generator/traces)?

For example one file in wal:

  File: 00000191
  Size: 131235840       Blocks: 256328     IO Block: 4096 
  Inode: 1180675     Links: 1


icemanDD commented on July 20, 2024

after updating parquet_row_group_size_bytes to 100MB

  Size: 30059883        Blocks: 58712      IO Block: 4096   
  Inode: 1308723     Links: 1


mdisibio commented on July 20, 2024

  File: 00000191
  Size: 131235840

WAL "blocks" are composed of internal flushes, which are mini parquet files. This looks like flush 191, at ~130MB. That number of flushes for one WAL block is rather high. Are flush_check_period and max_block_duration at their default values? With the defaults (flush_check_period=10s, max_block_duration=1m) there are 6 flushes per WAL block and 60 blocks total for the last hour. The final blocks are in /var/tempo/generator/traces/blocks/<tenant>.
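
The arithmetic above can be sketched as:

```python
# Flushes per WAL block and blocks scanned for a 1h query, given the
# default flush_check_period and max_block_duration mentioned above.
flush_check_period_s = 10   # default flush_check_period
max_block_duration_s = 60   # default max_block_duration (1m)
query_window_s = 3600       # "last hour"

flushes_per_block = max_block_duration_s // flush_check_period_s
blocks_per_hour = query_window_s // max_block_duration_s

print(flushes_per_block)  # 6 flushes per WAL block
print(blocks_per_hour)    # 60 blocks for the last hour
```

Raising max_block_duration to 5m trades fewer blocks per query (12 instead of 60) against keeping data in the less-efficient WAL for longer.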

For 500K spans/s you may need 50+ generators to bring the "cold" latency down to your target.


icemanDD commented on July 20, 2024

one of the block files:

File: 5d8f3be4-c96b-45f7-bd7e-618d5cd60172/
  Size: 4096            Blocks: 8          IO Block: 4096   directory
  Inode: 3932202     Links: 2

does this make sense? I am using the default max_block_bytes (500MB) and max_block_duration of 3m.
Also, would updating search_page_size_bytes, bloom_filter_shard_size_bytes, or parquet_dedicated_columns help?


mdisibio commented on July 20, 2024

Size: 4096

This is listing the directory; can you list the files inside the folder (e.g. data.parquet)?

parquet_dedicated_columns

Yes, definitely. This blog post has a walkthrough and shows how to use tempo-cli analyse blocks.
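
For context, dedicated columns are declared in the block config. A minimal sketch follows; the attribute names here are hypothetical examples, and the right ones for your cluster should come from the `tempo-cli analyse blocks` output:

```yaml
# Sketch only; verify the key path against your Tempo version's docs.
storage:
  trace:
    block:
      parquet_dedicated_columns:
        - scope: resource              # resource- or span-level attribute
          name: k8s.namespace.name     # hypothetical example attribute
          type: string                 # dedicated columns are string-typed
        - scope: span
          name: http.url               # hypothetical example attribute
          type: string
```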


icemanDD commented on July 20, 2024

Got it, example on data.parquet:

  File: data.parquet
  Size: 184291173       Blocks: 359952     IO Block: 4096   regular file
  Inode: 1308506     Links: 1


mdisibio commented on July 20, 2024

@icemanDD Hi, did the new settings and dedicated columns help?


icemanDD commented on July 20, 2024

Hi @mdisibio, that helps a bit, but we found that TraceQL metrics works better. Does it follow the same optimization pattern, or does it need querier scaling for better performance, especially when querying older data?
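
For reference, a TraceQL metrics query covering a similar use case to an Aggregate By call might look like this (the service name and grouping attribute are made-up examples):

```
{ resource.service.name = "my-service" } | rate() by (span.http.status_code)
```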

