truelayer / ginepro
A client-side gRPC channel implementation for tonic
License: Apache License 2.0
Conversion from prost_types::Timestamp to SystemTime can cause an overflow and panic
Details | |
---|---|
Package | prost-types |
Version | 0.7.0 |
URL | tokio-rs/prost#438 |
Date | 2021-07-08 |
Patched versions | >=0.8.0 |
Affected versions of this crate contained a bug in which untrusted input could cause an overflow and panic when converting a Timestamp to SystemTime.
It is recommended to upgrade to prost-types v0.8 and switch the usage of From<Timestamp> for SystemTime to TryFrom<Timestamp> for SystemTime.
See #438 for more information.
See advisory page for additional details.
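To illustrate what a fallible conversion buys, here is a minimal standard-library sketch. The Timestamp struct below is a hypothetical stand-in with the same two fields as prost_types::Timestamp, and to_system_time is an illustrative helper, not the code that ships in prost-types. It returns None where an unchecked conversion could overflow or hit invalid input.

```rust
use std::time::{Duration, SystemTime, UNIX_EPOCH};

/// Hypothetical stand-in for prost_types::Timestamp (same two fields).
#[derive(Debug, Clone, Copy)]
pub struct Timestamp {
    pub seconds: i64,
    pub nanos: i32,
}

/// Checked conversion: returns None instead of panicking.
/// Assumption: out-of-range nanos are rejected rather than normalised.
pub fn to_system_time(ts: Timestamp) -> Option<SystemTime> {
    if ts.nanos < 0 || ts.nanos >= 1_000_000_000 {
        return None;
    }
    let nanos = ts.nanos as u32;
    if ts.seconds >= 0 {
        // checked_add reports overflow as None on every platform.
        UNIX_EPOCH.checked_add(Duration::new(ts.seconds as u64, nanos))
    } else {
        // unsigned_abs avoids the overflow that negating i64::MIN would cause.
        UNIX_EPOCH
            .checked_sub(Duration::from_secs(ts.seconds.unsigned_abs()))?
            .checked_add(Duration::new(0, nanos))
    }
}
```

Whether an extreme value such as i64::MAX seconds is representable depends on the platform's SystemTime, which is why the sketch returns an Option rather than assuming either outcome.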
ginepro indirectly uses tower::Balance, which uses a best-of-two random load balancing strategy.
Unfortunately, tonic has no built-in mechanism for determining load, so the load is hard-coded to always be 0.
https://docs.rs/tonic/0.8.3/src/tonic/transport/service/connection.rs.html#114-120
This means the distribution is effectively random. This is still a balancing technique, but it is known to not be very good.
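The effect of a constant load can be seen in a small sketch of the best-of-two comparison. This is not tower's implementation: pick_of_two is a hypothetical helper, and keeping the first candidate on a tie is an assumption made for illustration.

```rust
/// Choose between two candidate backends by comparing their reported load.
/// `a` and `b` are indices the caller has already picked at random.
/// Assumption: ties keep the first candidate.
pub fn pick_of_two(loads: &[u32], a: usize, b: usize) -> usize {
    if loads[b] < loads[a] {
        b
    } else {
        a
    }
}
```

When every entry in loads is the same value, the comparison never changes the outcome, so the result is decided entirely by which indices were drawn at random.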
I ran into various problems with this upgrade. I'm still trying to understand them and have had to revert for now; I'll provide more detail when I can.
But I think upgrading the tonic dependency from 0.8 to 0.9 should probably have been done in a 0.6.0 version update rather than a patch release.
The service probe loop appears to log and ignore errors while running. This seems fine while the service is running, but at startup, it could be an indication that the ServiceDefinition has an invalid hostname (e.g., has a typo). Some callers might prefer to panic in that situation so deployment systems are immediately aware of a problem instead of just reporting nothing resolved via metrics.
When constructing a LoadBalancedChannel, the code should wait for an initial resolution of the provided ServiceDefinition to succeed before returning the LoadBalancedChannel. This has the benefit of (1) finding invalid DNS names immediately (and allowing the program to exit immediately); and (2) ensuring that LoadBalancedChannel has an initial non-empty set of endpoints to use before the program enters application code.
An alternative would be for callers to access the current set of endpoints and wait with a timeout until there are one or more endpoints. If no endpoints are resolved by the timeout, then a caller could exit with an error or raise an alert. I don't know offhand whether Channel supports something like this.
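As a rough sketch of the fail-fast idea, the following uses only the standard library and does not touch ginepro's or tonic's API. resolve_with_timeout is a hypothetical helper: it runs the blocking lookup on a separate thread and gives up after a deadline, so a caller could exit or alert at startup.

```rust
use std::io;
use std::net::{SocketAddr, ToSocketAddrs};
use std::sync::mpsc;
use std::thread;
use std::time::Duration;

/// Resolve `host_port` (for example "my-service:50051"), giving up after `timeout`.
pub fn resolve_with_timeout(host_port: &str, timeout: Duration) -> io::Result<Vec<SocketAddr>> {
    let (tx, rx) = mpsc::channel();
    let target = host_port.to_owned();
    // The blocking lookup runs on its own thread so the caller can stop waiting.
    thread::spawn(move || {
        let result = target
            .to_socket_addrs()
            .map(|addrs| addrs.collect::<Vec<SocketAddr>>());
        // The receiver may already have timed out; ignore the send error.
        let _ = tx.send(result);
    });
    match rx.recv_timeout(timeout) {
        Ok(Ok(addrs)) if addrs.is_empty() => Err(io::Error::new(
            io::ErrorKind::NotFound,
            "no addresses resolved",
        )),
        Ok(result) => result,
        Err(_) => Err(io::Error::new(
            io::ErrorKind::TimedOut,
            "DNS resolution timed out",
        )),
    }
}
```

A caller that prefers to fail at startup could call this before building the channel and exit if it returns an error. Note that the lookup thread is not cancelled on timeout; it is simply left to finish on its own.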
The connection_timeout_is_not_fatal test takes ~75 seconds to finish.
Well-configured timeouts are important for system stability. Requests which take too long can hog resources and block other work from happening.
I can see two separate timeout problems:
- When ResolutionStrategy::Lazy is used, there is currently no way to apply a timeout just for DNS resolution. If DNS never resolves, requests never complete.
- There is a TCP connect_timeout setting, but tonic doesn't use it!
Even though we're setting our own fairly short timeouts around the overall request, I've seen some strange behaviour where requests hang for a long time. I think there's still something else going on that I don't understand, but I expect addressing the two points above will be generally helpful anyway.
For the TCP connection timeout, just run the tests. I'll supply a test for lazy DNS resolution timeouts in a separate PR.
Ability to control timeouts for TCP connections and DNS resolution.
The TCP connection timeout is simpler to solve (though I will admit it took me a long time to find): we just need to set connect_timeout in the right places. First, tonic doesn't respect connect_timeout, which will be fixed by hyperium/tonic#1215. When that is merged, we can create our own connect_timeout option on top of it in #38.
DNS resolution is harder. There are currently two options:
LoadBalancedChannel is created. This might be a good thing, preventing services from successfully starting when DNS would never resolve.
Of the two, I wonder if we should favour Eager resolution, and consider changing the default to this.
However, we might want a third option: Active lazy resolution (for want of a better name). Lazy resolution is currently passive, as in it happens in the background on a schedule. It is never actively called in the request flow, which is why it's hard to put a timeout around. Instead, could we implement something which actively calls probe_once() (with a timeout!) as part of the first request (or alternatively when GrpcServiceProbe.endpoints is empty)? This could give us lazy DNS resolution, but with timeouts.
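One way to picture "active lazy" resolution is the sketch below. It is a standalone illustration, not ginepro's code: LazyEndpoints, its probe field, and get are hypothetical names, and the probe function pointer stands in for a real DNS lookup such as probe_once().

```rust
use std::net::SocketAddr;
use std::sync::mpsc;
use std::thread;
use std::time::Duration;

/// Hypothetical cache of resolved endpoints, filled on demand.
pub struct LazyEndpoints {
    endpoints: Vec<SocketAddr>,
    probe: fn() -> Vec<SocketAddr>, // stand-in for a DNS lookup
}

impl LazyEndpoints {
    pub fn new(probe: fn() -> Vec<SocketAddr>) -> Self {
        Self { endpoints: Vec::new(), probe }
    }

    /// Return the known endpoints, probing once (bounded by `timeout`)
    /// when none are known yet.
    pub fn get(&mut self, timeout: Duration) -> Result<&[SocketAddr], &'static str> {
        if self.endpoints.is_empty() {
            let (tx, rx) = mpsc::channel();
            let probe = self.probe;
            // Run the probe off-thread so the wait can be bounded.
            thread::spawn(move || {
                let _ = tx.send(probe());
            });
            self.endpoints = rx.recv_timeout(timeout).map_err(|_| "probe timed out")?;
        }
        if self.endpoints.is_empty() {
            return Err("no endpoints resolved");
        }
        Ok(&self.endpoints)
    }
}
```

The first request pays for the lookup, but only up to the timeout; later calls return the cached set without probing. A real implementation would also need the background refresh that lazy resolution already performs.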
Scratch that, I took a different approach: tower-rs/tower#715. EDIT: Nope, that hasn't worked out. Back to the drawing board.
tonic's Endpoint type has many configurable parameters, for example keep_alive_interval.
ginepro internally constructs Endpoint instances from socket addresses, and then applies some limited configuration values (like tls and timeout), but otherwise most settings remain the default.
It's probably not practical or ergonomic to specify every configuration value in ginepro's API; however, it would be useful if the library could accept something like a Fn(SocketAddr) -> Result<Endpoint, SomeError> so that the user could configure an endpoint while the library handles things like periodic DNS lookups.
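The shape of such a callback could look like the sketch below. Everything here is hypothetical: Endpoint is a tiny stand-in for tonic's type with only two fields modelled, and build_endpoints is an illustrative function, not part of ginepro.

```rust
use std::net::SocketAddr;
use std::time::Duration;

/// Hypothetical stand-in for tonic's Endpoint; only two settings are modelled.
#[derive(Debug, Clone, PartialEq)]
pub struct Endpoint {
    pub uri: String,
    pub keep_alive_interval: Option<Duration>,
}

/// The library supplies the resolved addresses; the caller's closure decides
/// how each address becomes an Endpoint. The first error aborts the build.
pub fn build_endpoints<F, E>(addrs: &[SocketAddr], configure: F) -> Result<Vec<Endpoint>, E>
where
    F: Fn(SocketAddr) -> Result<Endpoint, E>,
{
    addrs.iter().copied().map(configure).collect()
}
```

With this split, the library keeps ownership of periodic lookups and change tracking, while every per-endpoint setting stays in user code instead of being mirrored one by one in the library's API.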
If we register a custom lookup service that returns IPv6 addresses, gRPC client requests block forever. After tracing through the source code, I believe the bug is in the build_endpoint method:
fn build_endpoint(&self, ip_address: &SocketAddr) -> Option<Endpoint> {
    let uri = format!(
        "{}://{}:{}",
        self.scheme,
        ip_address.ip(),
        ip_address.port()
    );
    // ...
}
If ip_address is a SocketAddr::V6, the correct pattern should be {}://[{}]:{} instead of {}://{}:{}, which fails to build the endpoint. The create_changeset method then always reports nothing, because build_endpoint returns None:
changeset.extend(
    add_set
        .into_iter()
        .filter_map(|addr| self.build_endpoint(&addr).map(|endpoint| (addr, endpoint)))
        .map(|(addr, endpoint)| Change::Insert(addr, endpoint)),
);
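One possible fix, sketched here as a free function rather than as a patch to ginepro, is to format the SocketAddr itself instead of its IP and port separately. The standard library's Display implementation already wraps IPv6 addresses in brackets. endpoint_uri is a hypothetical name used only for this example.

```rust
use std::net::SocketAddr;

/// Build the URI string for either address family.
pub fn endpoint_uri(scheme: &str, addr: &SocketAddr) -> String {
    // SocketAddr's Display writes "ip:port" for V4 and "[ip]:port" for V6,
    // so no per-family branching is needed here.
    format!("{}://{}", scheme, addr)
}
```

For example, an address parsed from "[::1]:8080" with the "http" scheme yields "http://[::1]:8080", while IPv4 addresses are formatted exactly as before.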
Implement a custom lookup service that returns ipv6 addresses and register it.
A custom lookup service that returns either IPv4 or IPv6 addresses should work correctly.
Environment independent