Comments (9)

juliusv commented on April 30, 2024

This has been bugging us for a while. I can never reproduce it if I just run the single crashing test:

go test ./... -test.run=TestGetFingerprintsForLabelSet

But if I run all tests, it crashes in this one. So I suspect that it has something to do with storage not being correctly closed or some kind of side-effects between tests. Need to dig deeper.

from prometheus.

bernerdschaefer commented on April 30, 2024

I think I've figured out what's causing this!

tieredStorage.Close() calls Drain and then closes the underlying data store. However, if you check out tieredStorage.Serve() you'll see that calling Drain doesn't break the event loop. This leads to a case where Prometheus tries to write data after the storage is closed: we call Drain, close the database, and then one of the flush/write memory tickers fires.

Note: this also means that calling Close() on tieredStorage will never shut down its Serve() goroutine.

matttproud commented on April 30, 2024

Let me just say this: All of the extra usage coverage you are providing is
thoroughly excellent. I'm ecstatic about the problems we've been
uncovering and fixing as well as making the interface improvements.

A side note: The tiered storage code is immature and was the result of an
exhaustive sprint. Don't be surprised about finding warts in it. The old
storage code was simple and easy to reason about, but could never be performant.
I'm hoping with some additional refactorings that the tiered components
will one day take on some of these more intuitive qualities. A major
improvement would be to break out metric indexing into a separate service
that happens out-of-band of time series sample writing.

juliusv commented on April 30, 2024

Hm, from what I can see, Drain() blocks until the storage is fully drained
(https://github.com/prometheus/prometheus/blob/master/storage/metric/tiered.go#L111),
breaks the Serve() event loop
(https://github.com/prometheus/prometheus/blob/master/storage/metric/tiered.go#L182),
and only then closes the underlying data store. Am I missing something?

bernerdschaefer commented on April 30, 2024

@juliusv break in that context breaks the select (which, in this case, does nothing), not the for loop.

juliusv commented on April 30, 2024

Ohh... you're right, my eyes didn't even see the for loop, but of course it
has to be there :)

So I'll just change it to return.

juliusv commented on April 30, 2024

Damn, "break" -> "return" just makes the tests hang forever. Trying to find
out why. I suspect that something is still trying to write to the storage
after telling it to drain, and then gets blocked on the channels.

juliusv commented on April 30, 2024

Found the problem: a test called Close() (indirectly) and then Drain()
afterwards. Fix in #143.
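
For anyone hitting something similar, here is a rough sketch of why a second Drain() could hang once the loop really exits. It is not the real tieredStorage: the `storage` type, its unbuffered `drain` channel, and the `drainReturns` and `demo` helpers are all assumptions made for illustration. The idea is that after the serve goroutine has returned, nothing receives on the channel any more, so a later send blocks forever.

```go
package main

import (
	"fmt"
	"time"
)

// storage is a hypothetical, stripped-down model: Drain signals the serve
// goroutine over an unbuffered channel.
type storage struct {
	drain chan struct{}
}

// serve is a simplified event loop that returns on the first drain signal.
func (s *storage) serve() {
	<-s.drain
}

// Drain blocks until some goroutine receives the signal.
func (s *storage) Drain() {
	s.drain <- struct{}{}
}

// drainReturns reports whether Drain completed within the timeout. If it
// times out, the helper goroutine stays blocked, which is fine for a demo.
func drainReturns(s *storage, timeout time.Duration) bool {
	done := make(chan struct{})
	go func() {
		s.Drain()
		close(done)
	}()
	select {
	case <-done:
		return true
	case <-time.After(timeout):
		return false
	}
}

// demo calls Drain twice against a single serve goroutine.
func demo() (first, second bool) {
	s := &storage{drain: make(chan struct{})}
	go s.serve()
	first = drainReturns(s, time.Second)           // serve is still receiving
	second = drainReturns(s, 200*time.Millisecond) // serve has exited: no receiver
	return first, second
}

func main() {
	first, second := demo()
	fmt.Println("first Drain returned: ", first)
	fmt.Println("second Drain returned:", second)
}
```

Under these assumptions the first call returns and the second one times out, which matches the "hangs forever" symptom when a test drains a storage that has already been closed.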

lock commented on April 30, 2024

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.