raystack / siren

Siren provides an easy-to-use universal alert, notification, channels management framework for the entire observability infrastructure.

Home Page: https://odpf.github.io/siren/

License: Apache License 2.0

Languages: Go 99.67%, Makefile 0.32%, Dockerfile 0.01%
Topics: alerting, dataops, influx, monitoring, prometheus

siren's Introduction

Siren


Siren provides alerting on your applications' metrics using Cortex metrics in a simple DIY configuration. With Siren, you can define templates (using the Go templates standard) and create, edit, enable, or disable Prometheus rules on demand. It also gives you the flexibility to manage rules in bulk via YAML files. Siren can be integrated with any client, such as CI/CD pipelines, a self-serve UI, microservices, etc.

Key Features

  • Rule Templates: Siren provides a way to define templates over alerting rules, which can be reused to create multiple instances of the same rule with configurable thresholds.
  • Subscriptions: Siren can be used to subscribe to notifications (with desired matching conditions) via the channel of your choice.
  • Multi-tenancy: Rules created with Siren are multi-tenancy aware by default.
  • DIY Interface: Siren can be used to easily create and edit alerting rules. It also provides soft delete (disable) so that you can preserve thresholds in case you need to reuse the same alert.
  • Managing bulk rules: Siren enables users to manage alerting rules in bulk using YAML files in a specified format with a simple CLI.
  • Receivers: Siren can be used to send out notifications to several channels (Slack, PagerDuty, email, etc.).
  • Alert History: Siren can store alerts triggered by the monitoring & alerting provider, e.g. Cortex Alertmanager, which can be used for audit purposes. To know more, follow the detailed documentation.

Usage

Explore the following resources to get started with Siren:

  • Guides provides guidance on usage.
  • Concepts describes all important Siren concepts including system architecture.
  • Reference contains the details about configurations and other aspects of Siren.
  • Contribute contains resources for anyone who wants to contribute to Siren.

Run with Kubernetes

  • Create a Siren deployment using the Helm chart available here

Running locally

Siren requires the following dependencies:

  • Docker
  • Golang (version 1.18 or above)
  • Git

Run the application dependencies using Docker:

$ docker-compose up

Update the configs (DB credentials, etc.) as per your dev machine and Docker setup.

Run the following commands to compile from source:

$ git clone git@github.com:odpf/siren.git
$ cd siren
$ go build main.go

Running tests

# To run tests locally
$ make test

# To run tests locally with coverage
$ make test-coverage

Generate Server Configuration

# To generate server configuration
$ go run main.go server init

This will generate a file ./config.yaml.

Running Server

# To run server locally
$ go run main.go server start

To view the Swagger docs of the HTTP APIs, visit the /documentation route on the server, e.g. http://localhost:3000/documentation

Contribute

Development of Siren happens in the open on GitHub, and we are grateful to the community for contributing bugfixes and improvements. Read below to learn how you can take part in improving Siren.

Read our contributing guide to learn about our development process, how to propose bugfixes and improvements, and how to build and test your changes to Siren.

To help you get your feet wet and get you familiar with our contribution process, we have a list of good first issues that contain bugs which have a relatively limited scope. This is a great place to get started.

This project exists thanks to all the contributors.

License

Siren is Apache 2.0 licensed.

siren's People

Contributors

akarshsatija, kevinbheda, kevinbhedag, mabdh, pyadav, rahmatrhd, ravisuhag, rohilsurana, scortier, whoabhisheksah


siren's Issues

Move away from deprecated package "github.com/pkg/errors"

Is your feature request related to a problem? Please describe.

  • Package github.com/pkg/errors has been archived; we need to move to pure Go errors.
  • Errors are not structured and properly typed; there is a lot of string-matched error handling in the handlers, and we need to handle errors better.

Describe the solution you'd like

  • Move to pure Go errors.
  • Implement errors the way entropy does.

Add subscription client CLI

Is your feature request related to a problem? Please describe.
Subscription does not have a client CLI yet; we need to add one.

Describe the solution you'd like

Add these 4 commands (an example input.yaml is sketched below the commands):

$ siren subscription create --file input.yaml
$ siren subscription edit --id 1 --file input.yaml
$ siren subscription delete 1
$ siren subscription view 1
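A hypothetical input.yaml for these commands (field names are assumed from the subscription object shown further down this page; the actual CLI contract may differ):

urn: test-subscription
namespace: 10
receivers:
  - id: 1
    configuration:
      channel_name: general
match:
  team: infra
  severity: CRITICAL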

Update Siren Documentation

  • Add a glossary section to introduce all key terms.
  • Update the concept section to explain all key terms in detail.
  • Update the guide flow with an example journey. Check the Buf guide for reference.
  • Complete references to other things should move to the reference section. Check Buf for reference.
  • Update guides to accommodate CLI and API using docusaurus tabs.
  • Create a full CLI reference and move it to reference section.
  • Create a full API reference and move it to reference section.
  • Remove empty pages. Merge monitoring, deployment and troubleshooting in deployment guide only.

Support slack notification

We need to provide service endpoints to send notifications on Slack:

  • Add siren API to exchange a code for access_token for a given workspace and store the token in DB.
  • Add siren API to send messages. (Scope: Private channel, public channel, DM)
  • Add siren API to list all channels accessible to the slack app installed.
  • Update the Siren Slack Configuration API as per the new Alertmanager configuration (Slack http_config).

Refactor gRPC API

Is your feature request related to a problem? Please describe.
There are a couple of possible improvements in proton. See this issue.

Describe the solution you'd like
Refactor gRPC API to follow issue raystack/proton#143

Increase group_interval config for alertmanager to 30 min

Is your feature request related to a problem? Please describe.
A 5-minute group_interval causes frequent alerts in the notification channels.

Describe the solution you'd like
Decrease the frequency at which alert notifications are sent.

Describe alternatives you've considered
No suitable alternatives found

Right now, the group_interval config set by Siren in Alertmanager is hardcoded to 5 minutes, which is too low and causes lots of notifications. We should make it 30 minutes, as sketched below.
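For reference, the target value in the Alertmanager route config would look roughly like this (a minimal sketch of the relevant field, not Siren's actual generated config):

route:
  group_interval: 30m   # previously 5m; lowers how often grouped notifications are re-sent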

Implement Postgres Queue between notification dispatcher and handler

Is your feature request related to a problem? Please describe.
Based on this PRD and this RFC, we plan to move the responsibility of sending notifications from the provider to Siren. One component we need to develop is the queue between the dispatcher and the handler.

Describe the solution you'd like

  • Integrate Queue with notification
  • Use Postgres Skip Locked
  • Adjust some changes in the interface or struct if needed

New Postgres Table

CREATE TABLE IF NOT EXISTS notification_messages
(
   id               bigserial NOT NULL PRIMARY KEY,
   status           text NOT NULL, -- ENQUEUED/RUNNING/FAILED/DONE
   try_count        integer,
   max_tries        integer,
   last_error       text,

   receiver_type    text NOT NULL,
   receiver_configs jsonb,
   details          jsonb,

   expired_at       timestamptz,
   created_at       timestamptz NOT NULL DEFAULT CURRENT_TIMESTAMP,
   updated_at       timestamptz NOT NULL DEFAULT CURRENT_TIMESTAMP
);
  • status should be ENQUEUED, RUNNING, FAILED, DONE
  • try_count is the number of times publishing the message has been attempted
  • max_tries is the maximum number of attempts allowed
  • last_error captures the last error if message publishing fails
  • receiver_type is the type of receiver vendor e.g. slack, pagerduty, http
  • receiver_configs is the receiver-type-specific config for publishing the message
  • details is the payload message that will be published
  • expired_at is the time after which the message should not be published anymore (a NULL value means no expiry)

Get N enqueued messages to be published

UPDATE notification_messages
SET status = 'RUNNING', updated_at = now()
WHERE id IN (
    SELECT id
    FROM notification_messages
    WHERE status = 'ENQUEUED' AND (expired_at IS NULL OR expired_at > now()) AND try_count <= max_tries
    ORDER BY expired_at
    FOR UPDATE SKIP LOCKED
    LIMIT N
)
RETURNING *

Retry failed messages to be published

UPDATE notification_messages
SET status = 'RUNNING', updated_at = now()
WHERE id IN (
    SELECT id
    FROM notification_messages
    WHERE status = 'FAILED' AND (expired_at IS NULL OR expired_at > now()) AND try_count <= max_tries
    ORDER BY expired_at
    FOR UPDATE SKIP LOCKED
    LIMIT N
)
RETURNING *

Updating status when succeed

UPDATE notification_messages
SET status = 'DONE', updated_at = now(), try_count = try_count + 1
WHERE id = message_id

Updating status when failed

UPDATE notification_messages
SET status = 'FAILED', updated_at = now(), try_count = try_count + 1, last_error = err
WHERE id = message_id

TODO

  • Worker to clean up published messages/stale messages
  • Postgres autovacuum

Define Alertmanager receivers in a sorted manner

Siren creates the Alertmanager config and syncs it with Alertmanager. The Alertmanager config can change for the same subscriptions if their order changes. We should follow some sorting convention and stick to it when creating the Alertmanager config.

Notification Service Dispatcher

Is your feature request related to a problem? Please describe.
Based on this PRD and this RFC, we plan to move the responsibility of sending notifications from the provider to Siren. One component we need to develop is the notification dispatcher.
The notification dispatcher is responsible for:

  • Consume Notification
  • Label matching
    • Match notification to the subscription by labels and fetch the receivers
  • For each receiver, build notification message and send the message to Notification Handler

Describe the solution you'd like
We can implement the 3 responsibilities mentioned above:

Consume Notification

A notification can come from multiple sources: manually triggered via API, or via a provider (Cortex webhook). We need to define a Notification model, and all sources need to transform their models into the Notification model. For the manually triggered API, the model could just be the Notification model itself. However, for the provider webhook, we need a way to transform it into our Notification model. The Notification model would look like this.

type Notification struct {
	ID             string
	Data           map[string]string // any metadata or additional information that can be used by the handler to render the template
	Labels         map[string]string // labels to match
	ExpiryDuration string            // golang time duration unit: "ns", "us" (or "µs"), "ms", "s", "m", "h"
	CreatedAt      time.Time
}

Meanwhile, cortex alertmanager will send this data to the webhook:

{
  "version": "4",
  "groupKey": <string>,              // key identifying the group of alerts (e.g. to deduplicate)
  "truncatedAlerts": <int>,          // how many alerts have been truncated due to "max_alerts"
  "status": "<resolved|firing>",
  "receiver": <string>,
  "groupLabels": <object>,
  "commonLabels": <object>,
  "commonAnnotations": <object>,
  "externalURL": <string>,           // backlink to the Alertmanager.
  "alerts": [
    {
      "status": "<resolved|firing>",
      "labels": <object>,
      "annotations": <object>,
      "startsAt": "<rfc3339>",
      "endsAt": "<rfc3339>",
      "generatorURL": [<string>](https://prometheus.io/docs/alerting/latest/configuration/#string),      // identifies the entity that caused the alert
      "fingerprint": [<string>](https://prometheus.io/docs/alerting/latest/configuration/#string)        // fingerprint to identify the alert
    },
    ...
  ]
}

For now, we could just transform each alert individually and ignore the alert grouping. So an alert from the webhook will be transformed into the Notification model like this.

type Notification struct {
	ID             uint64
	Data           map[string]string // ["status"] = <resolved|firing>, ["generatorURL"] = <string>, ["fingerprint"] = <string>, and all annotations
	Labels         map[string]string // labels
	ExpiryDuration string            // default = no expiry
	CreatedAt      time.Time         // time.Now
}

Label matching

  • Given labels from Notification,
  • Get all subscriptions whose match-label key-value pairs are a subset of the Notification's labels
    • This could use the jsonb <@ operator in postgres (see the SQL sketch after the examples below)
  • For all subscriptions, get all receivers,
  • For all receivers, build notification messages and push them to the notification handler

This means the fewer labels a subscription has, the more likely it is to be matched to an incoming notification.

E.g.
Given a subscription with labels

service: a
team: b
env: integration

this will hold true if the notification has labels

service: a
team: b
env: integration
some_label: label
some_label2: label

Given a subscription with labels

service: a

this will hold true if the notification has labels

service: a
team: b
env: integration
some_label: label
some_label2: label

Given a subscription with labels

service: a
team: b
env: integration
some_label: label
some_label2: label

this will not hold if the notification has labels

service: a

For each receiver, build notification message and send the message to Notification Handler

For each receiver, a notification message will be created based on the Notification model. Note: this is a temporary model, and we can enhance it later once we develop a message queue to handle notifications.

type Message struct {
	ID     string
	Status MessageStatus  // ENQUEUED/FAILED/RUNNING/DONE

	ReceiverType string
	Configs      map[string]interface{} // the datasource to build vendor-specific configs, for each receiver type, there is a contract of its config, this field will be populated based on receiver config stored in the DB
	Detail       map[string]interface{} // the datasource to build vendor-specific message, this will be populated by `Notification.Data` and `Notification.Labels`, should there be any key conflict from both, `Notification.Data` will take precedence
	LastError    string

	MaxTries  int
	TryCount  int
	Retryable bool

	ExpiredAt time.Time // Notification.CreatedAt + ExpiryDuration
	CreatedAt time.Time
	UpdatedAt time.Time

	expiryDuration time.Duration
}

We could define a prototype of the Notification Handler without a queue that just prints to the console, as sketched below.
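A minimal sketch of such a console-print handler, reusing the Message struct above (names and interfaces are illustrative, not Siren's actual code):

import (
	"context"
	"fmt"
)

// ConsoleHandler is a placeholder Notification Handler: instead of calling a
// real receiver vendor, it just prints each message it receives.
type ConsoleHandler struct{}

func (h ConsoleHandler) Process(ctx context.Context, messages []Message) error {
	for _, m := range messages {
		fmt.Printf("sending %s notification %s: configs=%v detail=%v\n",
			m.ReceiverType, m.ID, m.Configs, m.Detail)
	}
	return nil
}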

Removing ORM and moving from GORM to vanilla SQL

Summary
For basic CRUD, an ORM is helpful and speeds up development. However, in the current state of Siren, there are cases where using GORM as an ORM overcomplicates the query since we need more advanced queries.

Proposed solution
We can remove the ORM to get more flexibility in building queries and use a query builder (goqu, squirrel, etc.) instead.

For db package, we can use the one in https://github.com/odpf/salt

Use `odpf/salt/log` for logger

Is your feature request related to a problem? Please describe.
We need to use odpf/salt/log for the logger as a standard across odpf projects.

Describe the solution you'd like
Use odpf/salt/log for logger

End-to-end test of notification service & updating the subscription flow

Is your feature request related to a problem? Please describe.
The notification service has been implemented, but the subscription and alert flow is still not well adjusted. This means all subscriptions will be synced to the provider (Cortex) even if a subscription is not an alert subscription.

Describe the solution you'd like

  • Deprecate subscription synchronization to Cortex and only keep the alert history webhook route in the provider (Cortex Alertmanager).
  • Add end-to-end test for notification service, from subscription to notification handling.

Add new release `dev` tag for development

Is your feature request related to a problem? Please describe.
Existing Siren releases publish images with only a semver tag. This makes it harder to test development versions during development. We need a dev tag that can be used for more flexible testing.

Describe the solution you'd like

  • Add a new workflow to build a siren docker image with dev tag
  • Make it trigger-able via branch
  • Push the image to odpf/siren docker hub

Notification Source Idempotency

Problem

Siren as a Notification Service requires idempotency for its Notification Sources to avoid duplicated notifications that would lead to alert/notification fatigue for its users.

Assuming the network is always unreliable, retries are always needed to make sure data is passed or a function is invoked. When sending notifications, retries could cause duplicated notifications if not handled properly. Duplicated notifications are unnecessary, and too many alerts/notifications could degrade the user experience and lead to alert fatigue.

Solution

[Diagram: notification flow in Siren]

Above is the notification flow in Siren. The Notification Source should be idempotent to avoid duplicated notifications. A notification has two possible sources: a manually triggered API and the provider alert webhook. Both sources need to transform their model into a Notification model. Idempotency could be handled at this point, before the notification is dispatched.

Idempotency Key

From the diagram above, the point of failure when sending notifications that we consider for idempotency is only the Notification Dispatcher step. Once the notification message is already in a queue, it is safe to say the transaction is finished. We could define a new variable Idempotency-Key that could be used to define a single unique notification.

Idempotency-Key is a string with a maximum length of 100 characters and no whitespace. There is no restriction on the format, but UUID is preferred. The idempotency key is unique per user; this means two Notifications might have the same idempotency key, but we could still distinguish them by user. However, since Siren currently does not have any knowledge of users (no auth), we can use API as the user for notifications triggered via the API and Cortex for notifications triggered by the Cortex webhook.

Table notification_idempotency_keys

To make sure the idempotency check works when Siren is scaled horizontally, we need to store idempotency key information in a single place. Therefore, a new notification_idempotency_keys table needs to be created in Siren's Postgres.

Field name         Field type    Properties
id                 BIGSERIAL     PRIMARY KEY
user               TEXT          NOT NULL
idempotency_key    TEXT          NOT NULL CHECK (char_length(idempotency_key) <= 100)
created_at         TIMESTAMPTZ   NOT NULL DEFAULT now()

To make sure the idempotency key is unique per user, we can create a UNIQUE INDEX:

CREATE UNIQUE INDEX notification_idempotency_keys_user_idempotency_key ON notification_idempotency_keys ("user", idempotency_key);

Idempotency Keys Time-To-Live (TTL)

Since duplicated notifications tend to be sent multiple times within a relatively short duration (it is less likely for a notification to be retried/resent after a long time), the stored idempotency keys will not be kept forever. There is a predefined global time-to-live configuration for all idempotency keys, e.g. 24 hours. This makes sure the idempotency keys can be reused again after the pre-configured time.

Since PostgreSQL does not have a row-wise TTL feature, a job needs to do this regularly (a sketch is shown below). Although this seems like a problem that could be solved by Redis or another storage that supports TTL, we can start by utilizing what we have currently and optimize and improve later.
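The periodic cleanup could be as simple as this sketch (assuming a 24-hour TTL):

DELETE FROM notification_idempotency_keys
WHERE created_at < now() - interval '24 hours';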

Notify API Idempotency

The Notify API will be used to manually trigger notifications. We could support idempotency via an Idempotency-Key header with a string value; users could generate the idempotency key (preferably a UUID) themselves, as in the example below.
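For example (the endpoint path and payload below are hypothetical):

curl -X 'POST' \
  'http://localhost:3000/notifications' \
  -H 'Content-Type: application/json' \
  -H 'Idempotency-Key: 2f0c62a7-9e3e-4c3b-9a0e-8d1f6f6f1b2a' \
  -d '{"data": {"text": "something happened"}, "labels": {"team": "b"}}'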

Group CLI commands into core and dev groups.

Problem?
Siren shows all commands as available commands right now, which makes it hard to segregate end-user commands from dev commands.

Solution?
We can use the salt cmdx package, which allows us to group commands. There can be two groups: core and dev.

Handle slack rate limiting

Slack has a rate limit on sending messages to a channel (1 message per channel per second). Siren should be enhanced to provide asynchronous Slack message sending, where it can accept any number of messages and call Slack within the limit via a queue or similar; one possible approach is sketched below.
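One possible approach (a sketch, not Siren's implementation) is to throttle sends per channel behind a limiter, e.g. with golang.org/x/time/rate:

import (
	"context"
	"sync"
	"time"

	"golang.org/x/time/rate"
)

// channelLimiters throttles sending to 1 message per channel per second.
type channelLimiters struct {
	mu       sync.Mutex
	limiters map[string]*rate.Limiter
}

func (c *channelLimiters) get(channel string) *rate.Limiter {
	c.mu.Lock()
	defer c.mu.Unlock()
	if c.limiters == nil {
		c.limiters = map[string]*rate.Limiter{}
	}
	l, ok := c.limiters[channel]
	if !ok {
		l = rate.NewLimiter(rate.Every(time.Second), 1)
		c.limiters[channel] = l
	}
	return l
}

// Wait blocks until a message may be sent to the given channel.
func (c *channelLimiters) Wait(ctx context.Context, channel string) error {
	return c.get(channel).Wait(ctx)
}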

Updating receiver type can cause invalid subscription state

Each subscription has a set of receivers and configurations according to the receiver type.
E.g. a valid subscription:

    "id": "119",
    "urn": "test-subscription",
    "namespace": "10",
    "receivers": [
        {
            "id": "1",
            "configuration": {
                "channel_name": "general"
            }
        },
        {
            "id": "2"
        }
    ],
    "match": {
        "c": "d"
    }
}

This means the receiver with id: 1 is supposed to be of Slack type. But someone can just update this receiver to some other type (e.g. PagerDuty) with the id intact.
As a result, syncing this config upstream (e.g. to Alertmanager) will start failing.
We should avoid this accidental change causing an invalid state.

Improve client and server cli

Is your feature request related to a problem? Please describe.
Update the CLI flow to follow the odpf standard.

Describe the solution you'd like

  • Load client config only for commands that need it
  • Try to keep it as DRY as possible
  • Make overriding of config through flags generic
  • Extract as much as possible into salt

Templatize receiver details in the Subscription of the Notification Service

Is your feature request related to a problem? Please describe.
Based on this PRD and this RFC, we plan to move the responsibility of sending notifications from the provider to Siren. There is a need to flexibly send custom notification messages to each receiver type.

Describe the solution you'd like

The Flow
Custom notification messages could be implemented with a pre-defined template assigned to each receiver in the subscription flow. When subscribing to notifications, one should pass this struct.

type Subscription struct {
	ID        uint64            `json:"id"`
	URN       string            `json:"urn"`
	Namespace uint64            `json:"namespace"`
	Receivers []Receiver        `json:"receivers"`
	Match     map[string]string `json:"match"`
	CreatedAt time.Time         `json:"created_at"`
	UpdatedAt time.Time         `json:"updated_at"`
}

Each receiver needs to look like this:

type Receiver struct {
	ID            uint64            `json:"id"`
	Type          string            `json:"type"`
	Configuration map[string]string `json:"configuration"`
}

Depending on the receiver type, the receiver configuration could have various fields. A new template field could be added to indicate that this receiver of the subscription should use the template for the notification message.

The Template
The template could be created in the same way users create templates for rules. Instead of having type rule, the template would have type notification. The content of the template should be compatible with the contract of the receiver type's payload. For example, this is what a Slack notification template looks like:

apiVersion: v2
type: template
name: alert-slack-details
body:
  receiver_type: slack
  attachments:
    - text: '[[.text]]'
      icon_emoji: ':eagle:'
      link_names: false
      color: '[[.color]]'
      title: '[[.title]]'
      pretext: '[[.pretext]]'
      text: '[[.text]]'
      actions:
        - type: button
          text: 'Runbook :books:'
          url: '[[.runbook]]'
        - type: button
          text: 'Dashboard :bar_chart:'
          url: '[[.dashboard]]'
variables:
  - name: color
    type: string
    description: slack color
    default: '#2eb886'
  - name: text
    type: string
    default: This is an alert
  - name: title
    type: string
    default: Alert
  - name: pretext
    type: string
    description: Pre-text of slack alert
    default: Siren
  - name: runbook
    type: string
    description: url to runbook
    default: http://url
  - name: dashboard
    type: string
    description: url to dashboard
    default: http://url
tags:
  - slack

The template will be rendered when the notification is dispatched and before a notification message is generated. Therefore, the notification message will contain the rendered version of the template. The variables are populated based on the labels that the notification has.

Improve logs for /oauth/slack/token api

Problem:
The logs for the /oauth/slack/token API are not clear for the following scenarios:

  • slack-credentials like client_id or client_secret is wrong
  • the redirect oauth code is wrong

Siren logs look like below [VERY HARD TO DEBUG]:
{"level":"error","ts":1635444416.7504508,"caller":"handlers/utils.go:30","msg":"handler","error":"failed to exchange code with slack OAuth server: slack oauth call failed","errorVerbose":"slack oauth call failed\ngithub.com/odpf/siren/pkg/codeexchange.(*SlackClient).Exchange\n\t/Users/dwarakeshvenkatasamy/workspace/siren/pkg/codeexchange/http.go:62\ngithub.com/odpf/siren/pkg/codeexchange.Service.Exchange\n\t/Users/dwarakeshvenkatasamy/workspace/siren/pkg/codeexchange/service.go:46\ngithub.com/odpf/siren/api/handlers.ExchangeCode.func1\n\t/Users/dwarakeshvenkatasamy/workspace/siren/api/handlers/codeexchange.go:20\nnet/http.HandlerFunc.ServeHTTP\n\t/usr/local/go/src/net/http/server.go:2046\ngithub.com/purini-to/zapmw.Recoverer.func1.1\n\t/Users/dwarakeshvenkatasamy/go/pkg/mod/github.com/purini-to/[email protected]/recoverer.go:33\nnet/http.HandlerFunc.ServeHTTP\n\t/usr/local/go/src/net/http/server.go:2046\ngithub.com/purini-to/zapmw.Request.func1.1\n\t/Users/dwarakeshvenkatasamy/go/pkg/mod/github.com/purini-to/[email protected]/request.go:37\nnet/http.HandlerFunc.ServeHTTP\n\t/usr/local/go/src/net/http/server.go:2046\ngithub.com/purini-to/zapmw.WithZap.func1.1\n\t/Users/dwarakeshvenkatasamy/go/pkg/mod/github.com/purini-to/[email protected]/context.go:27\nnet/http.HandlerFunc.ServeHTTP\n\t/usr/local/go/src/net/http/server.go:2046\ngithub.com/newrelic/go-agent/v3/integrations/nrgorilla.Middleware.func1.1\n\t/Users/dwarakeshvenkatasamy/go/pkg/mod/github.com/newrelic/go-agent/v3/integrations/[email protected]/nrgorilla.go:107\nnet/http.HandlerFunc.ServeHTTP\n\t/usr/local/go/src/net/http/server.go:2046\ngithub.com/gorilla/mux.(*Router).ServeHTTP\n\t/Users/dwarakeshvenkatasamy/go/pkg/mod/github.com/gorilla/[email protected]/mux.go:210\nnet/http.serverHandler.ServeHTTP\n\t/usr/local/go/src/net/http/server.go:2878\nnet/http.(*conn).serve\n\t/usr/local/go/src/net/http/server.go:1929\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1581\nfailed to exchange code with slack OAuth server\ngithub.com/odpf/siren/pkg/codeexchange.Service.Exchange\n\t/Users/dwarakeshvenkatasamy/workspace/siren/pkg/codeexchange/service.go:49\ngithub.com/odpf/siren/api/handlers.ExchangeCode.func1\n\t/Users/dwarakeshvenkatasamy/workspace/siren/api/handlers/codeexchange.go:20\nnet/http.HandlerFunc.ServeHTTP\n\t/usr/local/go/src/net/http/server.go:2046\ngithub.com/purini-to/zapmw.Recoverer.func1.1\n\t/Users/dwarakeshvenkatasamy/go/pkg/mod/github.com/purini-to/[email protected]/recoverer.go:33\nnet/http.HandlerFunc.ServeHTTP\n\t/usr/local/go/src/net/http/server.go:2046\ngithub.com/purini-to/zapmw.Request.func1.1\n\t/Users/dwarakeshvenkatasamy/go/pkg/mod/github.com/purini-to/[email protected]/request.go:37\nnet/http.HandlerFunc.ServeHTTP\n\t/usr/local/go/src/net/http/server.go:2046\ngithub.com/purini-to/zapmw.WithZap.func1.1\n\t/Users/dwarakeshvenkatasamy/go/pkg/mod/github.com/purini-to/[email protected]/context.go:27\nnet/http.HandlerFunc.ServeHTTP\n\t/usr/local/go/src/net/http/server.go:2046\ngithub.com/newrelic/go-agent/v3/integrations/nrgorilla.Middleware.func1.1\n\t/Users/dwarakeshvenkatasamy/go/pkg/mod/github.com/newrelic/go-agent/v3/integrations/[email protected]/nrgorilla.go:107\nnet/http.HandlerFunc.ServeHTTP\n\t/usr/local/go/src/net/http/server.go:2046\ngithub.com/gorilla/mux.(*Router).ServeHTTP\n\t/Users/dwarakeshvenkatasamy/go/pkg/mod/github.com/gorilla/[email 
protected]/mux.go:210\nnet/http.serverHandler.ServeHTTP\n\t/usr/local/go/src/net/http/server.go:2878\nnet/http.(*conn).serve\n\t/usr/local/go/src/net/http/server.go:1929\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1581","stacktrace":"github.com/odpf/siren/api/handlers.internalServerError\n\t/Users/dwarakeshvenkatasamy/workspace/siren/api/handlers/utils.go:30\ngithub.com/odpf/siren/api/handlers.ExchangeCode.func1\n\t/Users/dwarakeshvenkatasamy/workspace/siren/api/handlers/codeexchange.go:23\nnet/http.HandlerFunc.ServeHTTP\n\t/usr/local/go/src/net/http/server.go:2046\ngithub.com/purini-to/zapmw.Recoverer.func1.1\n\t/Users/dwarakeshvenkatasamy/go/pkg/mod/github.com/purini-to/[email protected]/recoverer.go:33\nnet/http.HandlerFunc.ServeHTTP\n\t/usr/local/go/src/net/http/server.go:2046\ngithub.com/purini-to/zapmw.Request.func1.1\n\t/Users/dwarakeshvenkatasamy/go/pkg/mod/github.com/purini-to/[email protected]/request.go:37\nnet/http.HandlerFunc.ServeHTTP\n\t/usr/local/go/src/net/http/server.go:2046\ngithub.com/purini-to/zapmw.WithZap.func1.1\n\t/Users/dwarakeshvenkatasamy/go/pkg/mod/github.com/purini-to/[email protected]/context.go:27\nnet/http.HandlerFunc.ServeHTTP\n\t/usr/local/go/src/net/http/server.go:2046\ngithub.com/newrelic/go-agent/v3/integrations/nrgorilla.Middleware.func1.1\n\t/Users/dwarakeshvenkatasamy/go/pkg/mod/github.com/newrelic/go-agent/v3/integrations/[email protected]/nrgorilla.go:107\nnet/http.HandlerFunc.ServeHTTP\n\t/usr/local/go/src/net/http/server.go:2046\ngithub.com/gorilla/mux.(*Router).ServeHTTP\n\t/Users/dwarakeshvenkatasamy/go/pkg/mod/github.com/gorilla/[email protected]/mux.go:210\nnet/http.serverHandler.ServeHTTP\n\t/usr/local/go/src/net/http/server.go:2878\nnet/http.(*conn).serve\n\t/usr/local/go/src/net/http/server.go:1929"}

Potential Solution:

In here https://github.com/odpf/siren/blob/main/pkg/codeexchange/http.go#L51
The unmarshal is causing the error to be lost, as the schema for an error response looks like
{"ok": false, "error": "invalid_code"}

So print out the stringified version of bodyBytes if the response is not ok, as sketched below.
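A sketch of the suggested fix (types and names here are illustrative, not the actual code in pkg/codeexchange):

import (
	"encoding/json"
	"fmt"
)

func checkSlackOAuthResponse(bodyBytes []byte) error {
	var out struct {
		Ok    bool   `json:"ok"`
		Error string `json:"error"`
	}
	if err := json.Unmarshal(bodyBytes, &out); err != nil {
		return fmt.Errorf("failed to parse slack oauth response: %w", err)
	}
	if !out.Ok {
		// Surface the raw body so errors like {"ok": false, "error": "invalid_code"} show up in logs.
		return fmt.Errorf("slack oauth call failed: %s", string(bodyBytes))
	}
	return nil
}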

Laundry list of improvements

This is a running list of possible code improvements.

  1. File naming convention in api/handlers directory. Some files are singular names and some are plural names.
  2. Router is bloated with all endpoints, can we namespace the routes in separate router files and include all of them in a top-level/top-tier router?
  3. Moving request validation from handlers to domain Example Commit: 2ae3bce
  4. Add test coverage info in readme

Refactor mock files and codes

Is your feature request related to a problem? Please describe.
Siren has mock files generated with mockery. However, there are some missing things that need to be added and updated:

  • Add make generate in Makefile to autogenerate mocks
  • Add a go:generate mockery annotation in each interface file to add the ability to generate mocks per interface (see the sketch after this list)
    • This is also to have more granular control to organize the generated mock files into a specific folder
  • Reorganize the mock files to meet the odpf standard
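For illustration, the annotation could look like this (the interface name and flags are examples only; the exact mockery flags depend on the version used):

//go:generate mockery --name=SubscriptionRepository --output=./mocks --outpkg=mocks

and the make generate target would simply run go generate ./....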

Some improvements to the code structure are also needed:

  • Highly coupled code
  • Several global variables
  • Some variable names are the same as package names, which will cause conflicts in the future

Describe the solution you'd like

  • Add make generate in Makefile to autogenerate mocks.
  • Add a go:generate mockery annotation in each interface file to add the ability to generate mocks per interface.
    • This is also to have more granular control to organize the generated mock files into a specific folder.
  • Reorganize the mock files to meet the odpf standard.
Generated mocks for dependency interfaces can live inside a mocks/ package in the respective package; see the [handbook](https://github.com/odpf/handbook).
  • There is also a possibility to refactor domain into its own package, similar to what entropy does.
  • Refactor the highly coupled code 1, 2
  • Refactor several global variables 1, 2. Instantiation should only be done when initializing the server, unless it is a singleton or lazy initialization.

There will be 3 PRs for this (update: raising this with 1 PR instead):

  • Refactor global variables so every initialization is done during server creation
  • Decouple the highly coupled packages & services
  • Refactor files in the domain package into their respective domains and organize the mock files better

Remove logic from redirection server

As of now, the redirection server needs to fetch the code from the query params (once the OAuth consent is given) and enrich the payload by adding the workspace to make a POST call to Siren for the code exchange.
Can we remove this logic and make the flow transparent, so the redirection server can just redirect the request further to Siren and get a response back, without the extra work of payload creation and enrichment?

Use postgres dockertest for testing postgres store layer

Is your feature request related to a problem? Please describe.
We can do integration testing directly for the postgres store layer instead of using sqlmock. The tests could use dockertest.

Describe the solution you'd like

  • Replace sqlmock in postgres store layer test with postgres dockertest

Notification Handler

Problem

Based on this PRD and this RFC, we plan to move the responsibility of sending notifications from the provider to Siren. One component we need to develop is the notification handler.

Requirements

  • No queue between dispatcher and handler, just a simple method invocation.
  • The notification handler's responsibility is to send notification messages outbound.
  • Notification vendors' contracts: it should have knowledge of all external notification vendors' contracts. The notification handler consumes a notification message and transforms it into a vendor-specific message. It needs a default message when no template is used.
  • Retry logic: there is a need for retry logic in the notification handler (probably with exponential backoff), or a need to store dead messages in a DLQ and retry them (will be addressed in a separate issue).
  • Message expiry: for each notification message, validity depends on the expired_at field. An empty or NULL expired_at field indicates the message won't expire. When failed-to-send notification messages are retried, messages past their validity won't be retried. (Will be addressed in a separate issue.)

Out of scope

  • DLQ
  • Retry job with message expiry check

Solution

Notification Message

type NotificationMessage struct {
	ID     string
	Status MessageStatus

	ReceiverType string
	Configs      map[string]interface{} // the datasource to build vendor-specific configs
	Detail       map[string]interface{} // the datasource to build vendor-specific message
	LastError    string

	MaxTries  int
	TryCount  int
	Retryable bool

	ExpiredAt time.Time
	CreatedAt time.Time
	UpdatedAt time.Time

	expiryDuration time.Duration
}

Notification vendors' contracts

Start with these receivers

  • slack
  • http
  • pagerduty
type SlackConfig struct {
	APIURL     string `yaml:"api_url,omitempty" json:"api_url,omitempty"`

	// Slack channel override, (like #other-channel or @username).
	Channel  string `yaml:"channel,omitempty" json:"channel,omitempty"`
	Username string `yaml:"username,omitempty" json:"username,omitempty"`
	Color    string `yaml:"color,omitempty" json:"color,omitempty"`

         // Slack specific contract
	Title       string         `yaml:"title,omitempty" json:"title,omitempty"`
	TitleLink   string         `yaml:"title_link,omitempty" json:"title_link,omitempty"`
	Pretext     string         `yaml:"pretext,omitempty" json:"pretext,omitempty"`
	Text        string         `yaml:"text,omitempty" json:"text,omitempty"`
	Fields      []*SlackField  `yaml:"fields,omitempty" json:"fields,omitempty"`
	ShortFields bool           `yaml:"short_fields" json:"short_fields,omitempty"`
	Footer      string         `yaml:"footer,omitempty" json:"footer,omitempty"`
	Fallback    string         `yaml:"fallback,omitempty" json:"fallback,omitempty"`
	CallbackID  string         `yaml:"callback_id,omitempty" json:"callback_id,omitempty"`
	IconEmoji   string         `yaml:"icon_emoji,omitempty" json:"icon_emoji,omitempty"`
	IconURL     string         `yaml:"icon_url,omitempty" json:"icon_url,omitempty"`
	ImageURL    string         `yaml:"image_url,omitempty" json:"image_url,omitempty"`
	ThumbURL    string         `yaml:"thumb_url,omitempty" json:"thumb_url,omitempty"`
	LinkNames   bool           `yaml:"link_names" json:"link_names,omitempty"`
	MrkdwnIn    []string       `yaml:"mrkdwn_in,omitempty" json:"mrkdwn_in,omitempty"`
	Actions     []*SlackAction `yaml:"actions,omitempty" json:"actions,omitempty"`
}

Handler interface

type Notifier interface {
    Notify(ctx context.Context, configs map[string]interface{}, details map[string]interface{}) error
}

Increase coverage to 95%

Is your feature request related to a problem? Please describe.
We have integrated Siren with Coveralls and managed to reach the > 80% coverage threshold. We could aim for higher coverage of 95% for Siren.

Describe the solution you'd like
Increase coverage to 95%

fix hostname validation error

Sample call

curl -X 'POST' \
  'http://localhost:3000/providers' \
  -H 'accept: application/json' \
  -H 'Content-Type: application/json' \
  -d '{
  "host": "http://localhost:9009",
  "name": "localhost_cortex",
  "type": "cortex",
  "credentials": {},
  "labels": {}
}'

This is the error being received:

{
  "code": 3,
  "message": "invalid CreateProviderRequest.Host: value does not match regex pattern \"^[A-Za-z0-9_.-]+$\""
}

The regex ^[A-Za-z0-9_.-]+$ does not allow the scheme or port in http://localhost:9009, so either the validation should accept full URLs or the host should be accepted without the scheme.

Implement grpc health server and wire it with `/ping` handler

Is your feature request related to a problem? Please describe.
We have defined a /ping handler in the Siren proto and a custom gRPC handler to return the response. However, if a user uses the gRPC health check protocol, it won't work, and there is no 1:1 mapping between the HTTP health check and the gRPC health check.

Since v2.8.0, grpc-gateway supports integrating the /ping handler with the gRPC health check RPC, so the HTTP health check handler (/ping) will be served by the gRPC health check.

Describe the solution you'd like

  • Implement health server
  • Wire health check from health server to /ping handler

Note: Compass does this in this PR.
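A minimal sketch of registering the standard gRPC health server (not Siren's actual bootstrap code):

import (
	"google.golang.org/grpc"
	"google.golang.org/grpc/health"
	healthpb "google.golang.org/grpc/health/grpc_health_v1"
)

func registerHealth(s *grpc.Server) *health.Server {
	// health.Server implements grpc.health.v1.Health; grpc-gateway (>= v2.8.0)
	// can then route the HTTP /ping handler to Health.Check.
	hs := health.NewServer()
	hs.SetServingStatus("", healthpb.HealthCheckResponse_SERVING)
	healthpb.RegisterHealthServer(s, hs)
	return hs
}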

Improve telemetry

Is your feature request related to a problem? Please describe.
With the new notification service created in Siren, we need more visibility into what is going on within Siren. We need to improve telemetry in Siren, from the integration to the metrics.

Describe the solution you'd like

  • Integrate Open Telemetry with Siren
    • This could be started by adding the OpenCensus library like other odpf projects do (e.g. dex, entropy)
    • Add a New Relic exporter so the integration with New Relic can be enriched with OpenCensus
  • Metrics to measure
    • Number of errors transforming notifications into notification messages
    • Number of outbound messages with response status code, per receiver type
    • Retry metrics of each retrier in the receiver clients

Use `salt/mux` for server multiplexing

Is your feature request related to a problem? Please describe.
There is a new package to multiplex gRPC and HTTP in salt, salt/mux, and salt/server is going to be deprecated. Siren needs to use salt/mux to bootstrap the gRPC and HTTP servers.

Describe the solution you'd like
Use salt/mux to run grpc and http server.

Decouple business logic from DB logic

In the codebase, business logic is tightly coupled with DB logic, making it hard to switch between multiple DBs or to make changes that do not affect each other. The code should be refactored to follow a decoupled architecture where a change in business logic doesn't require touching DB logic and vice versa. Reference architecture

Decouple store migration and core domain logic

Is your feature request related to a problem? Please describe.
All domains have a Migrate() function that is only called by the migrate CLI command. In this case, mixing migration logic into the domain is not needed, therefore it is not necessary to add Migrate to the service signature.
On the other hand, there is no business logic when running migration with the migrate command. The migration should be store-specific and directly access the repository layer.

Describe the solution you'd like

  • Remove MigrateAll() function.
  • Remove Migrate() function in all services.
  • Update migrate command to run repository migration.

Refactor receiver to generalize the implementation

Is your feature request related to a problem? Please describe.
The current implementation of receiver is somewhat coupled to the Slack receiver. For example, here is the function to notify a receiver; the implementation is not generalized enough, and we need to maintain the contract of the receiver in proto.

Describe the solution you'd like

  • We can make the receiver more generic by defining a generic contract in our proto and then mapping that contract to the specific receiver contract.
  • We can avoid having a lot of switch-cases by defining a specific implementation for each receiver type, using the strategy or factory pattern (see the sketch below).
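A sketch of the factory approach (interface and names are illustrative, not Siren's actual types):

import (
	"context"
	"fmt"
)

// Notifier is the per-receiver-type implementation; one per vendor (slack, pagerduty, http, ...).
type Notifier interface {
	Notify(ctx context.Context, configs map[string]interface{}, details map[string]interface{}) error
}

// Factory maps a receiver type to its implementation, registered once at startup,
// so callers resolve a Notifier instead of switching on the type everywhere.
type Factory struct {
	notifiers map[string]Notifier
}

func NewFactory(notifiers map[string]Notifier) *Factory {
	return &Factory{notifiers: notifiers}
}

func (f *Factory) Get(receiverType string) (Notifier, error) {
	n, ok := f.notifiers[receiverType]
	if !ok {
		return nil, fmt.Errorf("unsupported receiver type: %q", receiverType)
	}
	return n, nil
}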
