Comments (8)

hawkw commented on July 22, 2024

I'm happy to work on implementing this, but I suspect @carllerche will have some opinions on how we ought to do it.

from tower.

carllerche commented on July 22, 2024

I think that this is a great idea.

My original thought was to not do backoff in the reconnect middleware, and instead keep backoff as part of Retry (which doesn't exist yet), so that Retry<Reconnect<...>> would provide the backoff reconnect behavior.

However, after thinking some more about it, I don't think that this is necessarily ideal, because Retry will incur some additional overhead to keep a handle to the original request and perform the retry of the request. This overhead probably isn't necessary in the reconnect w/ backoff case.

Given this, implementing the logic directly in tower-reconnect is probably the right first (and maybe final) step. Once we tackle Retry we can then think about what, if anything, can be shared between the two.

@hawkw if you want to try to take a stab at this, go for it. The one nit with your original list is:

take a stream of durations to use as backoffs

This probably won't be a stream in the futures sense. Instead, the type (Backoff?) will probably implement IntoIterator such that the Item = Duration.
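
As a rough illustration of the idea, here is a minimal sketch of a backoff type that implements IntoIterator with Item = Duration. The names `ExponentialBackoff` and `BackoffIter`, and the doubling-with-cap policy, are assumptions made for this example; nothing in the thread fixes the actual type or policy.

```rust
use std::time::Duration;

/// Hypothetical backoff policy: delays double on each attempt, capped at `max`.
/// The name and fields are illustrative and not part of tower.
#[derive(Clone, Debug)]
pub struct ExponentialBackoff {
    pub base: Duration,
    pub max: Duration,
    pub attempts: u32,
}

/// Iterator state produced by `ExponentialBackoff::into_iter`.
pub struct BackoffIter {
    next: Duration,
    max: Duration,
    remaining: u32,
}

impl IntoIterator for ExponentialBackoff {
    type Item = Duration;
    type IntoIter = BackoffIter;

    fn into_iter(self) -> BackoffIter {
        BackoffIter {
            next: self.base.min(self.max),
            max: self.max,
            remaining: self.attempts,
        }
    }
}

impl Iterator for BackoffIter {
    type Item = Duration;

    fn next(&mut self) -> Option<Duration> {
        if self.remaining == 0 {
            // Backoffs are exhausted; the caller should give up.
            return None;
        }
        self.remaining -= 1;
        let current = self.next;
        // Double the delay for the next attempt, saturating at `max`.
        self.next = current.checked_mul(2).unwrap_or(self.max).min(self.max);
        Some(current)
    }
}
```

Because the bound is only IntoIterator<Item = Duration>, a plain Vec<Duration> would satisfy it as well, which keeps fixed schedules trivial to express.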

Maybe @olix0r or @danburkert have additional thoughts on the matter.

This relates a little to #14.

hawkw commented on July 22, 2024

My original thought was to not do backoff in the reconnect middleware, and instead keep backoff as part of Retry (which doesn't exist yet), so that Retry<Reconnect<...>> would provide the backoff reconnect behavior.

However, after thinking some more about it, I don't think that this is necessarily ideal, because Retry will incur some additional overhead to keep a handle to the original request and perform the retry of the request. This overhead probably isn't necessary in the reconnect w/ backoff case.

My thought is that we provide a "unit" or "empty" backoff type, and a Retry with the empty/unit backoffs would just do the current Reconnect behaviour of immediately failing the request on connect errors, because the backoffs are always exhausted.

Instead, the type (Backoff?) will probably implement IntoIterator such that the Item = Duration.

Yeah, that seems right.
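
To make the "empty backoff" idea concrete, here is a minimal sketch under some loud assumptions: `NoBackoff` and `connect_with_backoff` are hypothetical names, and the loop is a blocking stand-in with an injected `sleep` callback, whereas the real middleware would be futures-based. It only illustrates that an exhausted backoff makes the first connect error surface immediately.

```rust
use std::time::Duration;

/// Hypothetical "unit" backoff: yields no delays, so it is always exhausted.
#[derive(Clone, Copy, Debug, Default)]
pub struct NoBackoff;

impl IntoIterator for NoBackoff {
    type Item = Duration;
    type IntoIter = std::iter::Empty<Duration>;

    fn into_iter(self) -> Self::IntoIter {
        std::iter::empty()
    }
}

/// Blocking stand-in for a reconnect loop (an assumption of this sketch).
/// Tries `connect`; on failure, waits for the next backoff and tries again.
/// When the backoffs run out, the most recent error is returned.
pub fn connect_with_backoff<T, E, B>(
    mut connect: impl FnMut() -> Result<T, E>,
    backoff: B,
    mut sleep: impl FnMut(Duration),
) -> Result<T, E>
where
    B: IntoIterator<Item = Duration>,
{
    let mut delays = backoff.into_iter();
    loop {
        match connect() {
            Ok(conn) => return Ok(conn),
            Err(err) => match delays.next() {
                Some(delay) => sleep(delay),
                // No backoffs left: fail with the last connect error.
                None => return Err(err),
            },
        }
    }
}
```

With `NoBackoff`, `connect` is called exactly once and its error is returned unchanged, which matches the current behaviour described above.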

hawkw commented on July 22, 2024

A question that came up while working on this is: what timer should be used for the backoffs?

In Conduit, I've been working on an abstraction over timer implementations (linkerd/linkerd2#480) that allows a mock timer to be injected for testing, and we'd expect that if I configure Conduit to use a mock timer, backoffs will also wait based on the mock timer rather than the default timer. Furthermore, we'd ideally want users who are using tokio-timer to be able to use that timer for backoffs, without requiring the tokio-timer dependency. This implies to me that we might want to move the timer facade work I've been doing from Conduit to tower. Does that seem reasonable?
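
For readers unfamiliar with the pattern, a timer facade with an injectable mock might look roughly like the sketch below. This is not the Conduit design: the `Timer` trait, `SystemTimer`, `MockTimer`, and `wait_all` are hypothetical names, and the blocking `sleep` is a simplification of what would really be a future-returning API.

```rust
use std::cell::RefCell;
use std::time::Duration;

/// Hypothetical timer facade (blocking for simplicity; an assumption).
pub trait Timer {
    fn sleep(&self, delay: Duration);
}

/// Default timer: blocks the current thread for the requested delay.
pub struct SystemTimer;

impl Timer for SystemTimer {
    fn sleep(&self, delay: Duration) {
        std::thread::sleep(delay);
    }
}

/// Mock timer for tests: records requested delays and returns immediately.
#[derive(Default)]
pub struct MockTimer {
    pub slept: RefCell<Vec<Duration>>,
}

impl Timer for MockTimer {
    fn sleep(&self, delay: Duration) {
        self.slept.borrow_mut().push(delay);
    }
}

/// Waits out every backoff using whichever timer was injected.
pub fn wait_all<T: Timer>(timer: &T, backoffs: impl IntoIterator<Item = Duration>) {
    for delay in backoffs {
        timer.sleep(delay);
    }
}
```

A test can pass `&MockTimer::default()` and then inspect `slept` to check the backoff schedule without actually waiting.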

carllerche commented on July 22, 2024

I would prefer to not introduce a Timer trait yet. Traits come with ergonomic overhead. The next iteration of tokio-timer can handle the requirement of being able to "mock" out timers.

danburkert commented on July 22, 2024

I do have some thoughts on retry strategies. I think the strategy @hawkw outlined in the first comment is valid, but it does have the downside that every error becomes a timeout error. This can be mitigated with some careful error management, but it's pretty tricky to do in an ergonomic way and without throwing away the intermediate errors.
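
One possible shape for that error management, sketched under assumptions: the final error keeps every intermediate connect error instead of collapsing them into a bare timeout. `BackoffExhausted` and `connect_collecting` are hypothetical names, and the loop only sums the delays where a real implementation would wait on a timer.

```rust
use std::time::Duration;

/// Hypothetical terminal error: reports how long we backed off in total
/// and retains every intermediate connect error, oldest first.
#[derive(Debug, PartialEq)]
pub struct BackoffExhausted<E> {
    pub causes: Vec<E>,
    pub waited: Duration,
}

impl<E> BackoffExhausted<E> {
    /// The error from the final attempt, usually the most relevant one.
    pub fn last_cause(&self) -> Option<&E> {
        self.causes.last()
    }
}

/// Retries `connect` until it succeeds or the backoffs run out,
/// collecting each failure rather than discarding it.
pub fn connect_collecting<T, E>(
    mut connect: impl FnMut() -> Result<T, E>,
    backoff: impl IntoIterator<Item = Duration>,
) -> Result<T, BackoffExhausted<E>> {
    let mut delays = backoff.into_iter();
    let mut causes = Vec::new();
    let mut waited = Duration::from_secs(0);
    loop {
        match connect() {
            Ok(conn) => return Ok(conn),
            Err(err) => {
                causes.push(err);
                match delays.next() {
                    // Sketch only: a real implementation would wait here.
                    Some(delay) => waited += delay,
                    None => return Err(BackoffExhausted { causes, waited }),
                }
            }
        }
    }
}
```

The trade-off is that the error type grows a generic parameter and an allocation, which is part of why doing this ergonomically is tricky.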

The retry strategy also needs to be designed with speculative / hedged connections in mind. E.g. if the service has three replicas which you can choose from, you may not want to spend your full timeout attempting to connect to one of them. Perhaps this is better solved at a higher-level, though.

olix0r commented on July 22, 2024

@danburkert I think that request-level retry strategies should be handled at a higher level. Reconnect should really only be addressing the layer 4 concern of establishing a transport. I'd expect this to be wrapped with a connection pool, the connection pool to be wrapped with a balancer, and then retries to wrap the balancer. I'm in the process of writing up some general plans for linkerd/linkerd2#475 -- would love your input once that's up.

danburkert commented on July 22, 2024

Ah ok, seems I misunderstood the requirements here. If you are reconnecting just the layer 4 transport (say, TCP), how do you handle application-level negotiations that need to take place, like TLS, SASL, and custom handshakes?

In general doing anything more complex than layer 4 means that the reconnect logic needs to be able to distinguish between retriable and non-retriable errors.
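
A minimal sketch of such a distinction, with the caveat that everything here is illustrative: the `ConnectError` enum is hypothetical, and which `io::ErrorKind` values count as retriable is a policy choice assumed for the example, not something the thread settles.

```rust
use std::io;

/// Hypothetical error type for a connect step that includes a handshake.
#[derive(Debug)]
pub enum ConnectError {
    /// The layer 4 transport failed (e.g. TCP connect).
    Transport(io::Error),
    /// The peer rejected an application-level handshake (e.g. bad credentials).
    HandshakeRejected(String),
}

impl ConnectError {
    /// Whether reconnecting with backoff has a chance of succeeding.
    /// The set of retriable kinds below is an assumption of this sketch.
    pub fn is_retriable(&self) -> bool {
        match self {
            ConnectError::Transport(err) => matches!(
                err.kind(),
                io::ErrorKind::ConnectionRefused
                    | io::ErrorKind::ConnectionReset
                    | io::ErrorKind::TimedOut
            ),
            // Retrying a rejected handshake with the same inputs will not help.
            ConnectError::HandshakeRejected(_) => false,
        }
    }
}
```

Reconnect logic could then consult `is_retriable` before consuming the next backoff, and fail fast otherwise.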
