
raystack / raccoon

Raccoon is a high-throughput, low-latency service to collect events in real-time from your web, mobile apps, and services using multiple network protocols.

Home Page: https://raystack.github.io/raccoon/

License: Apache License 2.0

Dockerfile 0.13% Makefile 0.72% Go 48.03% Java 12.16% JavaScript 38.95%
Topics: clickstream, dataops, eventsourcing, kafka

raccoon's People

Contributors

321pranay, akbaralishaikh, chakravarthyvp, jensoncs, nncrawler, prakharmathur82, punit-kulal, rajathbk, ramey, ravisuhag, riteeksrivastav


raccoon's Issues

Support for other HTTP based protocols like GRPC

Problem

Currently, Raccoon supports data ingestion only over WebSockets, with protobuf as the only supported serialization format. The idea is to add support for other protocols like gRPC and HTTP/1.1 (REST), along with other serialization formats like JSON.

Is there any workaround?
NA

What is the impact?

Upstream services can ingest data into Raccoon using various transport protocols which makes it easier to adopt.

Which version was this found?

NA

Solution

Use the existing server that exposes the WebSocket endpoint and add support for the POST method to enable HTTP/1.1. The same API can support multiple serialization formats (JSON/protobuf) based on the Content-Type header.
gRPC can be served by a separate server on a different port.

Event type should be added to metrics to enable dashboards to group by event type

Problem
Currently the metric dashboards provide useful information on event throughput, but do not let you see throughput per the type set in the Event proto.
Having this tag on the metric will help slice throughput by type. Other metrics, where relevant, could also add type as a tag to statsd.

Is there any workaround?
NA

What is the impact?
NA

Which version was this found?
All versions

Solution
Add an additional type tag to the existing metrics.
Metrics such as kafka total messages delivered and events lost should support aggregation by type.
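As an illustration, a statsd line carrying an InfluxDB-style type tag (a format Telegraf's statsd input accepts) could be built like this; the helper and metric name are hypothetical, not Raccoon's actual instrumentation:

```go
package main

import "fmt"

// metricWithTags formats a StatsD line with InfluxDB-style tags. The metric
// name below is illustrative; Raccoon's real metric names may differ.
func metricWithTags(name string, tags map[string]string, value int, kind string) string {
	line := name
	for k, v := range tags {
		line += fmt.Sprintf(",%s=%s", k, v)
	}
	return fmt.Sprintf("%s:%d|%s", line, value, kind)
}

func main() {
	// Counting delivered messages, sliced by event type.
	fmt.Println(metricWithTags("kafka_messages_delivered_total",
		map[string]string{"type": "click"}, 1, "c"))
}
```

Dashboards can then group or filter on the type tag exactly as the issue requests.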

Incorrect max connection error code and empty reason is reported to the client

Description
Incorrect max connection error code and empty reason is reported to the client

To Reproduce
Step 1: Set the env variable SERVER_WEBSOCKET_MAX_CONN = 1
Step 2: Create multiple connections.

Expected vs actual behavior
expected: Code_CODE_MAX_CONNECTION_LIMIT_REACHED
actual: Code_CODE_MAX_USER_LIMIT_REACHED

Expected behavior
When the max connection threshold is reached, the client should receive the Code_CODE_MAX_CONNECTION_LIMIT_REACHED error code along with a reason message instead of an empty one.

Documentation for protocol agnostic Raccoon

Problem
New features have been added to Raccoon that allow clients to send data using HTTP/gRPC, along with support for multiple data formats like JSON. This also resulted in a new code design, which is missing from the existing documentation.

Is there any workaround?
NA

What is the impact?
NA

Which version was this found?
NA

Solution
Add the missing documentation or change the existing one where required.

Raccoon Client - Go

Summary
This proposal describes how the raccoon-go-client can be used to communicate with the Raccoon service over different communication protocols.

Proposed solution

The idea is to keep the initialisations of the WebSocket, HTTP, and GRPC clients distinct so that users can select the one they are interested in using.

Message Encoding:
The client will take care of the serialization/deserialization of JSON and proto messages; the user does not have to worry about it.

Observability:

Client Stats using StatsD:
The client also intends to provide stats for API calls. The following stats are planned for export:

sent_tt_ms
ack_tt_ms
total_bytes_sent
total_bytes_received
total_conn_err

Logging:

The client provides a logging interface and a default console logger; users can use it, disable it, or provide their own implementation.

Client Info (version.go):

The client will emit the following information about itself, provided during the go build command via "-ldflags":
Name
Version
BuildDate

Request Guid:

The client will auto-generate the request_guid, and will also allow the caller to pass one in.

Serialization:

The user can pass an array of high-level proto/JSON objects to the Send API; the client will serialize the user data internally and wrap it in a Raccoon request.
The client ships with JSON/proto serializers built in, and users can configure any other serialization.

Websocket

Event Ack:

For WebSocket, the client will also provide an async delivery channel to which the user can subscribe to receive event acknowledgements.

Ping/Pong:

The client will handle the ping/pong communication with Raccoon internally and will provide a setting for the interval.

Retry

The user can configure retry options, and the client will retry based on the configuration provided.

This proposal gives the user the flexibility to use any serialization for converting the event bytes before sending them to the server; the default will be proto.

The client will ship the JSON/proto serializers as part of the package.

   // Request message
   type Event struct {
       Type string
       Data interface{}
   }

   // Response message
   type Response struct {
       Status   string
       SentTime int64
       Reason   string
       Data     map[string]string
   }

   // Serializer converts user data into event bytes.
   type Serializer func(interface{}) ([]byte, error)

   type Client interface {
       Send([]*Event) (string, *Response, error)
       SetAppInfo(*App)
   }

   // Protocol-specific interfaces embed the common Client interface.
   type WebSocket interface {
       Client
       // WebSocket specific methods
   }

   type Grpc interface{ Client }
   type Rest interface{ Client }

   type Options struct {
       Serializer Serializer
   }

   // ClientOption mutates Options during client construction.
   type ClientOption interface {
       Apply(*Options)
   }

   // serializerOption implements ClientOption.
   type serializerOption struct{ ser Serializer }

   func (s serializerOption) Apply(op *Options) {
       op.Serializer = s.ser
   }

   func WithSerializer(ser Serializer) ClientOption {
       return serializerOption{ser: ser}
   }

   type RestClient struct {
       url       string
       serialize Serializer
   }

   // NewRest creates the rest client.
   func NewRest(url string, opts ...ClientOption) *RestClient {
       var o Options
       for _, opt := range opts {
           opt.Apply(&o)
       }
       return &RestClient{
           url:       url,
           serialize: o.Serializer,
       }
   }

   // SetAppInfo sets the app info sent along with every request.
   func (c *RestClient) SetAppInfo(app *App) {}

   // Send serializes the events and sends them to the raccoon service.
   func (c *RestClient) Send(events []*Event) (string, *Response, error) {
       e := []*pb.Event{}
       for _, ev := range events {
           // serialize the event data using the configured serializer
           b, err := c.serialize(ev.Data)
           if err != nil {
               return "", nil, err
           }
           e = append(e, &pb.Event{
               EventBytes: b,
               Type:       ev.Type,
           })
       }
       reqId := uuid.NewString()
       raccoonReq := &pb.SendEventRequest{
           ReqGuid: reqId,
           Events:  e,
       }
       // log & send raccoonReq to the raccoon service
       log.Print(raccoonReq)

       return reqId, &Response{}, nil
   }
 
   // Client usage example
   func ClientExample() {

       JSON := func(i interface{}) ([]byte, error) {
           return json.Marshal(i)
       }

       rest := NewRest("http://localhost:8080",
           WithSerializer(JSON),
       )

       rest.SetAppInfo(&App{
           Id:      "goplay-123",
           Name:    "goplay-backend",
           Version: "1.0",
       })

       reqId, resp, err := rest.Send([]*Event{
           {
               Type: "page",
               // the field must be exported for json.Marshal to include it
               Data: &struct {
                   Name string
               }{"page-name"},
           },
       })

       if err != nil {
           log.Printf("%v, %s", err, reqId)
           return
       }

       log.Print(resp.Status)
   }

Websocket Checkorigin is wrongly implemented

Problem

SERVER_WEBSOCKET_CHECK_ORIGIN should skip the origin check when set to false, and should check the origin and reject CORS-violating requests when set to true.
Currently, it rejects every connection when set to false and accepts every connection when set to true.

Root Cause

In this line, the CheckOrigin function is overridden to return true or false depending on the SERVER_WEBSOCKET_CHECK_ORIGIN value.

Solution

The toggle should be used to select the CheckOrigin function. A false value should map to a custom CheckOrigin function that always returns true. A true value should map to a nil CheckOrigin, which makes the upgrader fall back to its default origin check.
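A sketch of the proposed mapping, assuming a gorilla/websocket-style Upgrader whose nil CheckOrigin field falls back to the library's default same-origin check:

```go
package main

import (
	"fmt"
	"net/http"
)

// checkOrigin returns the function to install on the websocket upgrader's
// CheckOrigin field. Returning nil defers to the library's default
// same-origin validation.
func checkOrigin(enforce bool) func(*http.Request) bool {
	if !enforce {
		// SERVER_WEBSOCKET_CHECK_ORIGIN=false: accept every origin.
		return func(r *http.Request) bool { return true }
	}
	// SERVER_WEBSOCKET_CHECK_ORIGIN=true: leave CheckOrigin nil so the
	// library's default origin validation applies.
	return nil
}

func main() {
	fmt.Println(checkOrigin(false) != nil) // custom allow-all function installed
	fmt.Println(checkOrigin(true) == nil)  // defer to the library default
}
```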

Local setup not straightforward

Problem

Local setup of Raccoon is not straightforward. Someone trying to set up Raccoon locally either has to manage dependencies like Kafka, Telegraf, etc. themselves, or needs to do multiple things to make it work.

Is there any workaround?
NA

What is the impact?
NA

Which version was this found?
All versions

Solution
Update the docker-compose.yml file so that setup requires a single command.

Allow server acknowledgements after events are published

Problem Summary

Currently, Raccoon sends acknowledgments to the clients after pushing events to BufferChannel and not when published to Kafka. As a result, clients can't retry or resend events in case of downtimes or producer failures.

Proposed solution

Add a configuration parameter EVENT_ACK that allows Raccoon to run in different states. The following states are proposed -

  • 0 - events are acknowledged after pushing to BufferChannel
  • 1 - events are acknowledged after publishing to Kafka

Impact

Clients are aware of publishing failures and can retry publishing of events.

Which version was this found?
NA

Additional context

Increased end-to-end latency in the case of EVENT_ACK = 1, as acknowledgments are sent only after dequeuing from the buffer channel and publishing to Kafka.
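The two proposed modes can be sketched with stand-in channels; the names and types here are illustrative, with a buffered bool channel playing the role of the Kafka delivery report:

```go
package main

import "fmt"

// ackMode mirrors the proposed EVENT_ACK setting.
type ackMode int

const (
	ackOnBuffer  ackMode = 0 // ack after pushing to BufferChannel
	ackOnPublish ackMode = 1 // ack only after Kafka confirms delivery
)

// process pushes an event to the buffer and acks according to the mode. The
// channel types are simplified stand-ins for Raccoon's real ones.
func process(mode ackMode, buffer chan<- string, delivered <-chan bool, event string) string {
	buffer <- event
	if mode == ackOnBuffer {
		return "ack: buffered"
	}
	// EVENT_ACK=1: block until the producer reports the publish result,
	// trading latency for delivery feedback the client can retry on.
	if ok := <-delivered; !ok {
		return "nack: publish failed"
	}
	return "ack: published"
}

func main() {
	buffer := make(chan string, 2)
	delivered := make(chan bool, 1)

	fmt.Println(process(ackOnBuffer, buffer, delivered, "e1"))

	delivered <- true // simulate a successful Kafka delivery report
	fmt.Println(process(ackOnPublish, buffer, delivered, "e2"))
}
```

The blocking receive in the EVENT_ACK=1 branch is exactly where the extra end-to-end latency noted above comes from.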

Perf(Websockets): Changes to improve WebSockets Performance

Problem
It has been observed that WebSocket performance has degraded from previous versions.

Is there any workaround?
N/A

What is the impact?
Performance issues can create bottlenecks if throughput is high.

Solution
Code changes to improve performance

support connection type as unique identifier along with id

Problem
Raccoon supports unique WebSocket connections per user via the SERVER_WEBSOCKET_CONN_UNIQ_ID_HEADER configuration. There is a requirement to also support a connection type carried in another header. Connection type combined with id will then be used to maintain unique connections.

Which version was this found?
v0.1.0

Is there any workaround?
An alternative approach is to make user id globally unique.

What is the impact?
The current behavior is still supported. However, if SERVER_WEBSOCKET_CONN_TYPE_HEADER config is provided, Raccoon will use the header value together with id for uniqueness.

Solution

  • Add another configuration to specify the connection type header key, and use that header value as part of the connection identifier.
  • Add a conn_type tag to each relevant metric.

performance degradation because of additional data in collector

Problem

In Raccoon following buffered channel is used to collect events coming from clients.

bufferChannel := make(chan collection.CollectRequest, config.Worker.ChannelSize)

CollectRequest struct is defined as follows-

type CollectRequest struct {
	ConnectionIdentifier identification.Identifier
	TimeConsumed         time.Time
	TimePushed           time.Time
	*pb.SendEventRequest
}

SendEventRequest is part of the auto-generated Go code from the proto. The autogenerated code carries a lot of extra data through the channel, including objects of type sync.Mutex and unsafe.Pointer.

After changing the function definition of ProduceBulk method for publisher.KafkaProducer from

ProduceBulk(events []*pb.Event, deliveryChannel chan kafka.Event) error

to

ProduceBulk(request collection.CollectRequest, deliveryChannel chan kafka.Event)

There was an increase in event_processing_duration_milliseconds.

What is the impact?

Intermittent latency spikes.

Which version was this found?

Issue was observed in v0.1.3

Solution

Change the definition of CollectRequest to pass only the data required by the worker to process the event, like EventBytes, Type, and SentTime.

Make raccoon imports standardized

Problem

Raccoon's module name is currently raccoon instead of github.com/odpf/raccoon, which does not follow standard Go module naming. As a result, all packages are imported as raccoon/config, raccoon/metrics, raccoon/server, etc.

What is the impact?
Raccoon packages cannot be imported into any other package.

Which version was this found?
NA

Solution
Change the module name to github.com/odpf/raccoon and update all internal imports accordingly.

Raccoon needs to add ingestion time to every event

Problem

We (GoJek) currently use Raccoon to source clickstream events from the Gojek app. The concrete product proto contains an event_timestamp field which downstream systems such as the DWH can use to partition the data. However, we see some data arrive in future-dated partitions, while other data with the same event timestamp date arrives on different days. Two scenarios cause this issue:

  1. The time/clock in the mobile app is reset by the user to a future date
  2. The app was inactive and those events were sent at a later point of time by the mobile sdk

Is there any workaround?
The DWH can partition based on a field that acts as an ingestion time into the warehouse. However, this needs backfills and repartitioning of existing data, and upstream applications may need to change the way they query.

What is the impact?
Upstream applications' and services' queries return erroneous results.

Which version was this found?
NA

Solution
Raccoon needs to provide an ingestion time for each event, defined as the time the event was ingested into Raccoon. This enables the DWH to partition data based on the ingestion time as an alternative to event_timestamp.

Add benchmark documentation

Problem

As Raccoon is a high throughput application, users don't have visibility on the throughput levels.

Is there any workaround?
NA

What is the impact?
NA

Which version was this found?
All versions

Solution

Add benchmarking documentation.

Inconsistent JSON <> Protobuf API standard

Bug

  • JSON messages aren't serialised with the correct Protobuf-based JSON encoding standard.
  • This results in failures when sending JSON requests built from a valid protobuf-encoded JSON string.

Context:

  1. The Protobuf style guide states that JSON keys should be in camelCase, while protobuf field names should be in snake_case.
  2. The standard encoding for Timestamp is an RFC 3339 string.
  3. Thus, when a request payload is serialised, it uses camelCase keys and converts the timestamp/sent_time into a string.
  4. However, since Raccoon uses the standard encoding/json package to deserialise, it does not correctly handle the camelCase JSON keys.
  5. It also fails to deserialise the date string.

Fix

  1. Start using protobuf's official encoding/protojson package for deserialisation.
  2. It adheres to the protobuf style guide and supports deserialisation of both snake_case and camelCase keys in JSON.
  3. However, the existing JSON contract will break, since the new contract expects sent_time to be a string instead of an object {seconds: number, nanos: number}. This can be fixed by updating existing clients to use protobuf's JSON encoders instead of the language's default JSON encoders.

Improve the ack changes for synchronous

Summary
Due to the current ack changes, connections are disconnected as soon as the server read deadline is reached, because the server cannot read the next message until the current message has been acknowledged.

Proposed solution
The solution is to create an ack goroutine to handle the response messages for the connection.
Once a message is received, it is put into an acknowledgement channel, and the response is returned once the message has been successfully published to Kafka.

The next message on the connection is not blocked; the server keeps reading.
A single goroutine handles all ack messages.
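A sketch of the single-goroutine ack loop, with stand-in types; publish results flow through one channel, and responses are produced without blocking the per-connection read path:

```go
package main

import "fmt"

// ack pairs a connection id with the publish result for its message. Names
// are illustrative; Raccoon's actual types will differ.
type ack struct {
	connID string
	ok     bool
}

// runAcker is the single ack goroutine's loop: it alone turns publish
// results into response messages, so per-connection read loops never block
// waiting on Kafka delivery reports.
func runAcker(acks <-chan ack) []string {
	var responses []string
	for a := range acks {
		status := "FAILURE"
		if a.ok {
			status = "SUCCESS"
		}
		responses = append(responses, a.connID+" "+status)
	}
	return responses
}

func main() {
	acks := make(chan ack)
	done := make(chan []string)

	go func() { done <- runAcker(acks) }()

	// Delivery reports arriving from the Kafka producer.
	acks <- ack{connID: "conn-1", ok: true}
	acks <- ack{connID: "conn-2", ok: false}
	close(acks)

	for _, r := range <-done {
		fmt.Println(r)
	}
}
```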

Refactor to simplify http package

Problem

The http package is where we put HTTP-based servers and handlers. Currently, the entry point of this package (server.go) contains the initialization logic for all three servers. This complicates server initialization, and dependencies specific to each server leak into it, potentially making future changes harder.
For example, table and ping are specific to WebSocket.

Is there any workaround?
N/A

What is the impact?
Complexity grows as we add more protocols.

Solution
Make the initialization logic independent of each other.

Add support for HTTP

Raccoon only supports data ingestion through WebSockets. It can be extended to support an HTTP interface as well. Raccoon should provide an interface to extend its APIs to any protocol.

One possible suggestion is a protocol-based package layout:

api/
  http
  websocket
  grpc
  ...

Disk-backed Persistent queue for channels in Raccoon

Summary
Currently, Raccoon uses channels for the intermediate processing of EventRequests, which are then forwarded to the message broker. But this does not prevent the loss of events that are in the channel and could not be forwarded to Kafka when the server dies.

Proposed solution
Implement disk-backed queueing for intermediate persistence, similar to this project: https://github.com/jhunters/bigqueue

Buf lint changes

Problem

buf lint was breaking on the raccoon protos, so changes (PR #126) were made in the proton repo to fix those linting issues. Those changes will break the raccoon code base.

Is there any workaround?
NA

What is the impact?
NA

Which version was this found?
All versions

Solution
Update the raccoon code base w.r.t. the changes in the raccoon proto.
