Comments (9)
This has been bugging us for a while. I can never reproduce it if I just run the single crashing test:
go test ./... -test.run=TestGetFingerprintsForLabelSet
But if I run all tests, it crashes in this one. So I suspect that it has something to do with storage not being correctly closed or some kind of side-effects between tests. Need to dig deeper.
from prometheus.
I think I've figured out what's causing this!
tieredStorage.Close() calls Drain and then closes the underlying data store. However, if you check out tieredStorage.Serve() you'll see that calling Drain doesn't break the event loop. This leads to a case where prometheus tries to write data after the storage is closed: we call Drain, close the database, and then one of the flush/write memory tickers fires.
Note: this also means that calling Close() on tieredStorage will never shut down its Serve() goroutine.
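The ordering described above can be reproduced in a small, deterministic sketch. Everything here is hypothetical and is not the code in tiered.go: `store`, `lateWrite`, and the channel names are invented for illustration, and a plain channel stands in for the flush ticker so the demo does not depend on timing.

```go
package main

import (
	"errors"
	"fmt"
)

// store stands in for the underlying data store (hypothetical type).
type store struct{ closed bool }

func (s *store) write() error {
	if s.closed {
		return errors.New("write after close")
	}
	return nil
}

// lateWrite drains, closes the store, then lets one flush tick fire.
// It returns the error from that late write, or nil if the event loop
// had already stopped and no write could happen.
func lateWrite(exitOnDrain bool) error {
	s := &store{}
	flush := make(chan struct{})      // stands in for the flush ticker
	drain := make(chan chan struct{}) // drain requests carry an ack channel
	results := make(chan error)
	stopped := make(chan struct{})

	go func() {
		defer close(stopped)
		for {
			select {
			case <-flush:
				results <- s.write()
			case ack := <-drain:
				close(ack)
				if exitOnDrain {
					return
				}
			}
		}
	}()

	ack := make(chan struct{})
	drain <- ack
	<-ack
	s.closed = true // what Close() does after Drain returns

	select {
	case flush <- struct{}{}: // the loop is still alive and takes the tick
		return <-results
	case <-stopped: // the loop exited on drain
		return nil
	}
}

func main() {
	fmt.Println(lateWrite(false)) // write after close
	fmt.Println(lateWrite(true))  // <nil>
}
```

With `exitOnDrain` set to false the loop goroutine also never exits, which mirrors the note about Close() never shutting down the Serve() goroutine.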
Let me just say this: All of the extra usage coverage you are providing is
thoroughly excellent. I'm ecstatic about the problems we've been
uncovering and fixing as well as making the interface improvements.
A side note: The tiered storage code is immature and was the result of an
exhaustive sprint. Don't be surprised about finding warts in it. The old
storage code was simple and easy to reason about, but could never be performant.
I'm hoping with some additional refactorings that the tiered components
will one day take on some of these more intuitive qualities. A major
improvement would be to break out metric indexing into a separate service
that happens out-of-band of time series sample writing.
2013/4/15 Bernerd Schaefer wrote:
I think I've figured out what's causing this!
tieredStorage.Close() (https://github.com/prometheus/prometheus/blob/master/storage/metric/tiered.go#L220) calls Drain and then closes the underlying data store. However, if you check out tieredStorage.Serve() (https://github.com/prometheus/prometheus/blob/master/storage/metric/tiered.go#L182) you'll see that calling Drain doesn't break the event loop. This leads to a case where prometheus tries to write data after the storage is closed -- where we call drain, close the database, and then one of the flush/write memory tickers runs.
Note: this also means that calling Close() on tieredStorage will never shut down its Serve() goroutine.
Hm, from what I can see, Drain() blocks until the storage is fully drained (
https://github.com/prometheus/prometheus/blob/master/storage/metric/tiered.go#L111),
breaks the Serve() event loop (
https://github.com/prometheus/prometheus/blob/master/storage/metric/tiered.go#L182),
and only then closes the underlying data store. Am I missing something?
@juliusv break in that context breaks the select (which, in this case, does nothing), not the for loop.
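A minimal sketch of that Go behavior, with hypothetical function names and an iteration cap added only so the demo terminates: an unlabeled break inside a select case leaves the select, and the enclosing for loop carries on.

```go
package main

import "fmt"

// loopWithBreak mimics an event loop whose stop case uses a bare break.
// The break leaves only the select, so the for loop keeps spinning until
// the iteration cap is reached.
func loopWithBreak(stop <-chan struct{}, maxIter int) int {
	iterations := 0
	for iterations < maxIter {
		iterations++
		select {
		case <-stop:
			break // exits the select only, not the for loop
		}
	}
	return iterations
}

// loopWithReturn leaves the whole function on the first stop signal.
func loopWithReturn(stop <-chan struct{}, maxIter int) int {
	iterations := 0
	for iterations < maxIter {
		iterations++
		select {
		case <-stop:
			return iterations
		}
	}
	return iterations
}

func main() {
	stop := make(chan struct{})
	close(stop) // a closed channel is always ready to receive

	fmt.Println(loopWithBreak(stop, 5))  // 5
	fmt.Println(loopWithReturn(stop, 5)) // 1
}
```

Besides return, a labeled break (a label on the for statement and `break Label` in the case) is the other standard way to leave the loop from inside a select.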
Ohh... you're right, my eyes didn't even see the for loop, but of course it
has to be there :)
So I'll just change it to return.
Damn, changing "break" to "return" just makes the tests hang forever. Trying to find
out why. I suspect that something is still trying to write to the
storage after telling it to drain and then gets blocked on the channels.
Found the problem: a test called Close() (indirectly) and then Drain() afterwards. Fix in
#143
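One plausible reading of the hang, given the earlier suspicion about blocking on channels, is that a second Drain() sends on a channel that no goroutine is receiving from once the event loop has returned. The sketch below is an assumption for illustration, not what #143 changed: the `storage` type, its fields, and the boolean result are all hypothetical. It shows a Drain that returns instead of blocking when the loop has already exited.

```go
package main

import "fmt"

// storage is a hypothetical stand-in with only the drain path modeled.
type storage struct {
	drain   chan chan struct{} // drain requests carry an ack channel
	stopped chan struct{}      // closed when the serve loop has exited
}

func newStorage() *storage {
	s := &storage{
		drain:   make(chan chan struct{}),
		stopped: make(chan struct{}),
	}
	go s.serve()
	return s
}

// serve handles exactly one drain request and then exits. A real loop
// would also select on writes and tickers.
func (s *storage) serve() {
	defer close(s.stopped)
	ack := <-s.drain
	close(ack)
}

// Drain reports whether this call performed the drain. If the serve loop
// has already exited, it returns false instead of blocking forever.
func (s *storage) Drain() bool {
	ack := make(chan struct{})
	select {
	case s.drain <- ack:
		<-ack
		return true
	case <-s.stopped:
		return false
	}
}

func main() {
	s := newStorage()
	fmt.Println(s.Drain()) // true
	fmt.Println(s.Drain()) // false
}
```

Without the `stopped` case, the second call would block on the send, which matches the symptom of tests hanging once the loop really exits on drain.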
This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.