S3 Not Reusing Connections (tempo issue, closed)

bmteller commented on July 20, 2024
S3 Not Reusing Connections

from tempo.

Comments (3)

joe-elliott commented on July 20, 2024

It is the caller's responsibility to close Body. The default HTTP client's Transport may not reuse HTTP/1.x "keep-alive" TCP connections if the Body is not read to completion and closed.

The minio client returns the HTTP body directly as the io.Reader from GetObject. Any idea why the defer reader.Close() doesn't cover this, and why we actually have to read an io.EOF?

One concern about the proposed fix: it looks like io.Copy will allocate a 32KB buffer for no reason:
https://cs.opensource.google/go/go/+/refs/tags/go1.22.4:src/io/io.go;l=426

Can we make this fix without the extra allocation?

As a bonus issue, the code checks for EOF and silently discards the EOF error, returning a nil error. The return value from this readRange function is just error, so it's impossible for a caller to know whether all of the requested bytes were actually returned.

This shouldn't happen, but I agree it's a bug and love the fix here as well.

bmteller commented on July 20, 2024

You have to actually perform a read that returns io.EOF in order for the connection to be eligible to be put back into the pool. It is a gotcha with the Go HTTP client. The quote about the Body not being read to completion comes from: https://pkg.go.dev/net/http#Client

Using io.Copy to read the EOF is probably not a good idea for other reasons as well, since it allows a malicious S3 server to mount a denial-of-service attack on the Tempo server by streaming a response indefinitely. You should be able to just do:

var dummy [1]byte
_, _ = reader.Read(dummy[:])

in order to read the EOF, since it should be available on the next read. If you want to be very defensive, I guess you could also check the result and return an error if it is not an EOF, since that might indicate a buggy response from the server.

I think the GCP backend also looks like it has a similar pooling problem, but I'm not familiar with the internals of the GCP storage client, so it might not be an issue. Also, there is the customTransport.MaxIdleConnsPerHost = 100 option, which Tempo does not set for the S3 backend but does set for the Azure backend. I think the default in Minio is 16. There is a comment in the Azure backend code that says the default for Azure is 2, so maybe it's not necessary to make the change for S3 as well.

joe-elliott commented on July 20, 2024

you should just be able to just do ...

+1 to this approach. The locally declared array should have very minimal impact in exchange for likely some nice gains from reusing TCP connections.

I think also the GCP backend looks like it has a similar pooling problem but I'm not familiar with internals of the GCP storage client so it might not be an issue.

We should go ahead and make similar changes to all backends. If nothing else, we will patch up the bug where we hit io.EOF before the full range is read.

Also, there is the customTransport.MaxIdleConnsPerHost = 100 option which tempo does not set for the s3 backend but does set for the azure backend. I think the default in Minio is 16.

I'm more indifferent on this. I think setting MaxIdleConns... the same in all backends would likely be wise. We can make it part of this fix or a separate one. wdyt?

This is a fantastic find. Would you like to submit a PR to fix?
