zerofox-oss / go-aws-msg

AWS Pub/Sub Primitives for Go

License: BSD 3-Clause Clear License

Makefile 0.83% Go 99.17%
aws go golang pubsub sns sqs

go-aws-msg's People

Contributors

elmarcoh, elyunge, rynmrtn, shezadkhan137, tmessi, xopherus

go-aws-msg's Issues

Telemetry!

Verifiability

Are there any easy wins to increase operational visibility and allow clients to easily determine whether go-aws-msg is providing its intended value? Could defining go-aws-msg's mission help guide telemetry choices? (IMO it's something like "efficiently consume, process and finish a bounded number of SQS messages", but I really don't know :p) Assuming something like this is the case, which metrics would show whether go-aws-msg is fulfilling its purpose?

Thinking maybe:

  • latency - how long a message takes to be handled (histogram: median, 95th, 99th and 100th percentiles)
  • traffic / throughput - how many messages per interval are being handled
  • errors - message handling results (good / bad)
  • saturation - when is work waiting on a resource that isn't available? (It looks like handler concurrency is the only resource that could become saturated?)
    (these also just so happen to be Google SRE's "Four Golden Signals" :p)

How would these metrics be gathered, and where do they come from? Who is responsible for gathering and reporting them?

Why Telemetry?

Exposing telemetry should give easy visibility into the operation of go-aws-msg. Emitting telemetry from the application, instead of relying on centralized SQS metrics, provides flexibility and allows easier verification of server processes. It also gives insight into server performance locally, and allows for metrics that may not be provided out of the box by AWS CloudWatch. Having actionable, application-emitted metrics will pave the way for monitoring and alerting, e.g. alerting if n errors occur in an interval, if n percent of Receiver calls result in errors, or if the server spends n seconds waiting.
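The alerting conditions above (n errors per interval, or n percent of Receiver calls failing) could be evaluated with a small per-interval counter. This is a sketch only: `errorWindow` and its methods are hypothetical, and the assumption is that something like a ticker in the Server calls `reset` once per interval:

```go
package main

import "sync"

// errorWindow counts handled messages and failures for the current
// reporting interval. It is safe for concurrent use by many handlers.
type errorWindow struct {
	mu       sync.Mutex
	handled  int
	failures int
}

// record notes the outcome of one Receiver call; failed is true when
// the Receiver returned an error.
func (w *errorWindow) record(failed bool) {
	w.mu.Lock()
	defer w.mu.Unlock()
	w.handled++
	if failed {
		w.failures++
	}
}

// shouldAlert reports whether the interval crossed either threshold:
// at least maxErrors failures, or a failure ratio of at least maxRatio.
// An interval with no handled messages never alerts.
func (w *errorWindow) shouldAlert(maxErrors int, maxRatio float64) bool {
	w.mu.Lock()
	defer w.mu.Unlock()
	if w.handled == 0 {
		return false
	}
	ratio := float64(w.failures) / float64(w.handled)
	return w.failures >= maxErrors || ratio >= maxRatio
}

// reset clears the counters to start a new interval.
func (w *errorWindow) reset() {
	w.mu.Lock()
	defer w.mu.Unlock()
	w.handled, w.failures = 0, 0
}
```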

Also, having a flexible interface for emitting these metrics is essential for performance testing and capacity planning. How many server instances are necessary to handle 1,000 messages/minute? 10,000/minute? 100,000/minute? Telemetry would help with local tuning to determine what concurrency levels are necessary.

Emitting telemetry should also help provide context around issues when they occur.

What?

Latency

  • receive - histogram of time spent receiving in the client code
  • message processing - histogram of total time spent processing a message in the server implementation, from start to ack

Traffic / throughput

Errors

Saturation

Who?

Who should be responsible for making these calls? Does go-aws-msg have any obligation to provide server-level metrics so that clients can introspect? Should the server provide hooks for telemetry and leave the rest up to clients? Should clients be 100% responsible in the first iteration? If clients are 100% responsible, they can't measure saturation (time spent blocking) or see SQS receive errors from inside their Receive method.

Where?

Where should these calls take place? Should the server have some sort of metadata structure and expose a couple of public methods to access it? Could there be an interface and hooks that let clients configure which metric implementation they would like (e.g. logging, statsd, Prometheus, New Relic)?

type Telemetry interface {
   Timing(name string, value float64, rate float64, tags []string)
   Increment(name string, value float64, rate float64, tags []string)
}

Maybe offer a logging-based telemetry implementation by default, while allowing clients to configure their own?

https://github.com/zerofox-oss/go-aws-msg/blob/master/sqs/server.go#L33

type Server struct {
  ...
  telemetry Telemetry
}

Resulting in calls like:

start := time.Now()
// Take a slot from the buffered channel
s.maxConcurrentReceives <- struct{}{}
elapsed := time.Since(start)
s.telemetry.Timing("receive_saturation_wait", elapsed.Seconds(), ..., ...)

Rate-limiting

For some microservices it might be beneficial to rate-limit the number of active goroutines based on time. For example, if I have a Server which interacts with a third-party API and that is limited to 10 calls/sec, I would only want to serve 10 messages/second (assuming each message uses 1 API call).

With the current primitives, this rate limiting would have to be done at the Receiver level, which would introduce blocking by means of mutexes or channels; that's probably not the most efficient approach. We should consider adding this capability to the Server so we can limit the number of active Receivers based on time.

Allow STS Federation Tokens

We'd like to be able to use STS tokens to grant us access to SNS/SQS resources. These are short-lived credentials that last for at most 36h (if using Federation tokens). In order to do that we need to allow AWS_SESSION_TOKEN to be provided.

Better tests

After spending some time away from the initial implementation of the mocks, I think that it's time for them to be revisited. I don't like that I'm basically defining my own behavior for SQS.

Ideally we could run these tests against an SQS docker image...
