Comments (8)
Is it the same if you specify the timestamp? How could we reproduce this?
from thanos.
FWIW: since subqueries (even with the same step) are a very different construct, it is not unimaginable that they return different results. As for the dedup, I don't know what's happening here.
from thanos.
@GiedriusS I'm using a specific timestamp.
For me it reproduces every time in my environment. It's a counter metric that comes from 2 Prometheus replicas.
@MichaHoffmann I tried to debug it a bit. I suspect it has to do with how the dedup chooses which timestamps to use from the samples, but I got a little lost there.
If you can point me to where to look, I can continue debugging (or gather some relevant info to post here).
from thanos.
Found something interesting. If I run
my_metric{_label_0="service",_label_1="requests",_label_2="timer", server="server00102"}[5m]
without deduplication at a specific timestamp, I get the following:
server1:
2583033 @1714665427.092
2588869 @1714665487.092
2594666 @1714665547.092
2600461 @1714665607.092
2606298 @1714665667.092
server2:
2583033 @1714665450.205
2588869 @1714665510.205
2594666 @1714665570.205
2600461 @1714665630.205
2606298 @1714665690.205
But when I do the same with deduplication, I get one more sample. The interval between the first and second samples is much smaller (23 seconds) than the rest (60 seconds), and the difference in values is 0, since the first and second samples have the same value.
2583033 @1714665427.092
2583033 @1714665450.205 (23 seconds interval from previous, same value as previous)
2588869 @1714665510.205
2594666 @1714665570.205
2600461 @1714665630.205
2606298 @1714665690.205
It seems it takes the first sample from server1 and then the rest from server2.
Now it's clear why adding a step solves it (it removes the duplicated sample).
Any idea?
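To make the arithmetic above concrete, here is a minimal sketch that computes the time gap and counter increase between adjacent samples of the deduplicated output. The names sample, deduped and gaps are made up for illustration; only the data comes from the listing above.

```go
package main

import "fmt"

// sample is one point of the deduplicated series quoted above.
// (Hypothetical type, for illustration only.)
type sample struct {
	tMillis int64 // timestamp in milliseconds
	value   int64 // counter value
}

// deduped is the series returned with deduplication enabled.
var deduped = []sample{
	{1714665427092, 2583033},
	{1714665450205, 2583033},
	{1714665510205, 2588869},
	{1714665570205, 2594666},
	{1714665630205, 2600461},
	{1714665690205, 2606298},
}

// gaps returns, for each pair of adjacent samples, the time gap in
// milliseconds and the counter increase.
func gaps(s []sample) (dt, dv []int64) {
	for i := 1; i < len(s); i++ {
		dt = append(dt, s[i].tMillis-s[i-1].tMillis)
		dv = append(dv, s[i].value-s[i-1].value)
	}
	return dt, dv
}

func main() {
	dt, dv := gaps(deduped)
	for i := range dt {
		fmt.Printf("gap %d: %.3fs, increase %d\n", i+1, float64(dt[i])/1000, dv[i])
	}
}
```

The first pair is the odd one out: a 23.113s gap with an increase of 0, while every later pair is 60s apart with a non-zero increase.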
from thanos.
I managed to reproduce this use case in a unit test.
I added the following test case to TestDedupSeriesSet in dedup/iter_test.go:
{
	name:      "My Regression test",
	isCounter: true,
	input: []series{
		{
			lset: labels.Labels{{Name: "a", Value: "1"}},
			samples: []sample{
				{1714665427092, 2583033},
				{1714665487092, 2588869},
				{1714665547092, 2594666},
				{1714665607092, 2600461},
				{1714665667092, 2606298},
			},
		}, {
			lset: labels.Labels{{Name: "a", Value: "1"}},
			samples: []sample{
				{1714665450205, 2583033},
				{1714665510205, 2588869},
				{1714665570205, 2594666},
				{1714665630205, 2600461},
				{1714665690205, 2606298},
			},
		},
	},
	exp: []series{
		{
			lset:    labels.Labels{{Name: "a", Value: "1"}},
			samples: []sample{{1714665427092, 2583033}, {1714665487092, 2588869}, {1714665547092, 2594666}, {1714665607092, 2600461}, {1714665667092, 2606298}},
		},
	},
},
The first sample doesn't get deduplicated: the output contains the first sample from both sets.
I noticed that Next() in dedup/iter.go applies an initial penalty for the first sample. The value of that penalty is 5000. If I increase it to 23113, which is exactly the gap in milliseconds between the first samples of the two replicas (1714665450205 - 1714665427092), my test passes. Anything lower and the test fails.
I'm not sure if this is the expected behavior.
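The threshold can be explored outside the test suite with a simplified model of penalty-based merging. This is a sketch under stated assumptions, not the Thanos implementation: the names sample, seek, penaltyFor and mergeWithPenalty are hypothetical, the rule "penalty for the replica not picked is twice the last observed gap" is an assumption, and counter adjustment on replica switch is not modeled. Only the initial penalty of 5000 and the sample data come from this thread.

```go
package main

import (
	"fmt"
	"math"
)

// sample is a hypothetical stand-in for a (timestamp, value) pair.
type sample struct {
	t int64 // timestamp in milliseconds
	v float64
}

var replicaA = []sample{
	{1714665427092, 2583033}, {1714665487092, 2588869}, {1714665547092, 2594666},
	{1714665607092, 2600461}, {1714665667092, 2606298},
}

var replicaB = []sample{
	{1714665450205, 2583033}, {1714665510205, 2588869}, {1714665570205, 2594666},
	{1714665630205, 2600461}, {1714665690205, 2606298},
}

// seek advances i to the first sample with timestamp >= t.
func seek(s []sample, i int, t int64) int {
	for i < len(s) && s[i].t < t {
		i++
	}
	return i
}

// penaltyFor returns the penalty for the replica that was NOT picked.
// Assumption: initialPenalty before any gap is known, else twice the last gap.
func penaltyFor(lastT, t, initialPenalty int64) int64 {
	if lastT == math.MinInt64 {
		return initialPenalty
	}
	return 2 * (t - lastT)
}

// mergeWithPenalty merges two replicas, always picking the earlier candidate
// and making the other replica skip samples that fall within its penalty.
func mergeWithPenalty(a, b []sample, initialPenalty int64) []sample {
	var out []sample
	lastT := int64(math.MinInt64)
	var penA, penB int64
	ia, ib := 0, 0
	for {
		ia = seek(a, ia, lastT+1+penA)
		ib = seek(b, ib, lastT+1+penB)
		switch {
		case ia == len(a) && ib == len(b):
			return out
		case ia == len(a): // only b has data left
			out = append(out, b[ib])
			lastT, penB = b[ib].t, 0
		case ib == len(b): // only a has data left
			out = append(out, a[ia])
			lastT, penA = a[ia].t, 0
		case a[ia].t <= b[ib].t: // pick a, penalize b
			penA, penB = 0, penaltyFor(lastT, a[ia].t, initialPenalty)
			out = append(out, a[ia])
			lastT = a[ia].t
		default: // pick b, penalize a
			penA, penB = penaltyFor(lastT, b[ib].t, initialPenalty), 0
			out = append(out, b[ib])
			lastT = b[ib].t
		}
	}
}

func main() {
	for _, p := range []int64{5000, 23112, 23113} {
		fmt.Printf("initial penalty %d -> %d samples\n", p, len(mergeWithPenalty(replicaA, replicaB, p)))
	}
}
```

In this model, the first sample of the second replica at 1714665450205 is skipped only when the seek target 1714665427092 + 1 + penalty exceeds it, which is why 23113 is the smallest value that works for this data.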
from thanos.
Thank you for the test and the debug work! I'll look into this over the weekend.
from thanos.
So, yeah: the scrape interval is large enough that the dedup algorithm thinks the second sample of the first iterator is actually missing and that we need to fill in from the second iterator from now on. This might be a bit hard to solve since we (right now) don't know the proper scrape interval a priori. I think we could maybe add a configurable flag, something like --deduplication.penalty-scrape-interval-hint=30s, to prime the deduplication algorithm so that an initial gap of 30s does not constitute a missing sample and we can keep using the first iterator.
@GiedriusS @fpetkovski @yeya24 how does that sound?
from thanos.
Perhaps I'm mixing things up, but shouldn't --query.default-step affect that penalty?
Also, I might be able to help with a PR once there is a design for the fix.
from thanos.