
periskop-dev / periskop


Exception Monitoring Service

License: Apache License 2.0

Makefile 1.02% Go 51.38% JavaScript 0.26% TypeScript 43.02% HTML 1.63% Dockerfile 0.98% SCSS 1.70%
error-reporting monitoring errors error-monitoring exceptions


Pull based, language agnostic exception aggregator for microservice environments.

Periskop scales well with the number of exceptions and application instances:

  • Exceptions are pre-aggregated in client libraries and stored efficiently in memory, while keeping a sample of concrete occurrences for inspection.
  • Exceptions are scraped and aggregated across instances by the server component.
  • More application instances result in longer refresh cycles, but memory usage remains constant.

A UI component is provided for convenience.

Scraping

Errors are scraped from a configured endpoint on each instance discovered via service discovery, and then aggregated.
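The cross-instance aggregation step can be illustrated with a short Go sketch, assuming a simplified per-instance report that maps aggregation keys to counts. The result is keyed only by aggregation key, which is why more instances lengthen the scrape cycle but do not grow memory. `InstanceReport` and `aggregateAcrossInstances` are hypothetical names, not Periskop's actual server types.

```go
package main

import (
	"fmt"
	"sort"
)

// InstanceReport is what one scraped instance returned: total counts per
// aggregation key (a hypothetical, simplified shape).
type InstanceReport map[string]int

// aggregateAcrossInstances sums per-key counts from every scraped instance.
// The result size depends on the number of distinct errors, not instances.
func aggregateAcrossInstances(reports []InstanceReport) map[string]int {
	totals := make(map[string]int)
	for _, r := range reports {
		for key, count := range r {
			totals[key] += count
		}
	}
	return totals
}

func main() {
	reports := []InstanceReport{
		{"db-timeout": 3, "nil-pointer": 1},
		{"db-timeout": 2},
	}
	totals := aggregateAcrossInstances(reports)
	keys := make([]string, 0, len(totals))
	for k := range totals {
		keys = append(keys, k)
	}
	sort.Strings(keys) // map iteration order is random; sort for stable output
	for _, k := range keys {
		fmt.Println(k, totals[k])
	}
}
```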

Periskop supports all service discovery mechanisms supported by Prometheus. The configuration format for service discovery mirrors the one from Prometheus. See Prometheus's official documentation for reference.

A full example of service configuration for Periskop can be found in the sample configuration.

Format

The format for scraped errors is defined in a proto3 IDL. Currently the only supported protocol is snake_cased JSON over HTTP (example).

UI

The UI allows navigating and inspecting exceptions as they occur.


Run project locally

Please see CONTRIBUTING.md

Building & Running

We are looking into distributing Periskop via Docker Hub. In the meantime, you can build and run Periskop from source:

docker build --tag periskop .
docker run -v "$(pwd)/path/to/config.yaml:/etc/periskop/periskop.yaml" -p 8080:8080 periskop

Enable persistent storage

By default, Periskop stores all scraped errors in an in-memory repository. You can configure your Periskop deployment to use persistent storage instead. The currently supported storage backends are SQLite, MySQL and PostgreSQL.

For SQLite, add these lines to your config.yaml file:

repository:
  type: sqlite
  path: periskop.db

For MySQL:

repository:
  type: mysql
  dsn: user:pass@tcp(127.0.0.1:3306)/dbname?charset=utf8mb4&parseTime=True&loc=Local

For PostgreSQL:

repository:
  type: postgres
  dsn: host=localhost user=gorm password=gorm dbname=gorm port=9920 sslmode=disable

Alert reported exceptions

All reported errors are exposed as Prometheus metrics, so you can use Prometheus alerting rules together with Alertmanager to fire an alert when errors reach a given threshold. Here's an example:

groups:
- name: periskop
  rules:
  - alert: TooManyErrors
    expr: periskop_error_occurrences{severity="error"} > 1000
    for: 5m
    labels:
      severity: critical
    annotations:
      summary: "Too many errors on {{ $labels.service_name }}"
      description: "Error count for {{ $labels.service_name }} ({{ $labels.aggregation_key }}) is {{ $value }}"
      dashboard: "https://periskop.example.com/#/{{ $labels.service_name }}/errors/{{ $labels.aggregation_key }}"

Pushgateway

See periskop-pushgateway if you want to use Periskop in a push-based setup.

Client Libraries

Integrations

backstage-plugin


periskop's Issues

Expose scraped targets

It is useful to have a list of targets being scraped per service in order to identify potential issues and verify that integrations are working as expected.

This could be first implemented via API endpoint and added to the UI as a second step.

Add first error seen field

It would be very useful to add a field showing when an error happened for the first time, especially during an outage.

Sort exceptions in the Sidebar

At the moment, exceptions in the sidebar are sorted by time (latest exceptions first). Sometimes it is useful to be able to sort by other criteria.

Implement a dropdown on the top of the sidebar that allows selecting the sorting type (by latest occurrence, by count). Implement the sorting logic that obeys this option.

Show request body

#8 is about showing the request body of a failed request, if available. It needs to be optional, since the information might be sensitive in some cases, but it can be useful, for example, for requests that send their parameters in the body instead of the query string (like the tracks service). To implement this, we need to extend the API to accept a new body field containing the request content as a string, and then pass it to the front-end for rendering.

Docker-ise development environment

I usually don't like polluting my machine with different developer environments for different languages and projects I'm working on.

It would be good to have the development environment containerised using either a single Dockerfile for development or with the usage of docker-compose to be able to boot up the front-end, back-end and mock-target as independent machines.

Maybe separating the back-end and front-end is not necessary, but I would like to give it a try.

Make UI responsive

Periskop should be usable on mobile phones and other small form factor displays. The current experience is not too bad but could be improved by collapsing the sidebar and improving navigation.

Allow filtering by severity

It is currently not possible to show only the errors with a specific severity (Error, Warning, Info) in the sidebar; severities are only distinguished by colored tags. It would be useful to add a severity filter selector next to the sorting selector.

Add Prometheus integration

It would be nice to have metrics for:

  • Number of instances scraped
  • Number of errors scraped
  • Number of errors

They should be exposed on a separate port, under the /metrics endpoint, using the standard Prometheus format.

Allow logging events without exceptions

It would be useful to allow info-level events to be scraped and shown by Periskop, even without an exception. An aggregation key would still need to be provided.

This is very handy in cases where a log line is too big and gets split into smaller log lines, making it hard to use logging for things like comparisons between branched logic (comparing representations, response bodies, etc). In most of these cases, we are mostly looking for patterns rather than complete information, and Periskop can be useful for debugging and troubleshooting.

Allow defining regular expressions for advanced error grouping

In some cases (for example finagle timeouts in the Scala client, when no stack trace is available) errors that have different messages are grouped together into a single exception type. This happens because the message is not used in the hashing function.

In other cases, like in the Go library, the message is used for hashing in the absence of exception types, which leads to too fine-grained separation of exceptions.

It would be useful to allow defining regular expressions in the client for advanced grouping rules. These regular expressions could be propagated to the server in the exceptions endpoint to be reused for cross-instance aggregation, and potentially also for federation.

Add ability to mark errors as resolved

In the current setup, the only way to clear errors that are already fixed is to redeploy or restart the service instrumented with Periskop. Introducing this feature will lead to reported errors that can only be deleted by restarting Periskop.

Ideally each error should have a button to mark the exception as resolved, which should trigger a request to the API to remove the error from the server.

Fix panic caused by negative counts

In certain scenarios (reuse of target identifiers or IPs), counts can become negative and trigger a panic.

An example trace:

2021-06-23T08:03:48.809047158	stderr	panic: counter cannot decrease in value
2021-06-23T08:03:48.809073616	stderr
2021-06-23T08:03:48.809078532	stderr	goroutine 119 [running]:
2021-06-23T08:03:48.809082535	stderr	github.com/prometheus/client_golang/prometheus.(*counter).Add(0xc013670a20, 0xc060600000000000)
2021-06-23T08:03:48.809089026	stderr		/go/pkg/mod/github.com/prometheus/client_golang@v1.5.1/prometheus/counter.go:109 +0x125
2021-06-23T08:03:48.809108513	stderr	github.com/soundcloud/periskop/scraper.errorAggregateMap.combine(0xc000c016a0, 0xc000044390, 0x14, 0xc000531ca0, 0xc014956420, 0x1, 0x4, 0xc01a96b560, 0x52, 0xc000c016d0, ...)
2021-06-23T08:03:48.809115075	stderr		{redacted}/periskop/scraper/scraper.go:68 +0x598
2021-06-23T08:03:48.809149318	stderr	github.com/soundcloud/periskop/scraper.Scraper.Scrape(0xc00052fe90, 0x0, 0x0, 0x0, 0xc000531ca0, 0xc000044390, 0x14, 0x0, 0x0, 0x0, ...)

See https://github.com/soundcloud/periskop/pull/145/files#r498918049
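One possible guard, sketched in Go under the assumption that the panic comes from passing a negative delta to a Prometheus counter's `Add`, is to treat a decrease as a counter reset. This is the convention Prometheus uses for counters, swapped in here for illustration. It is not necessarily the fix adopted in Periskop. `counterDelta` is a hypothetical helper.

```go
package main

import "fmt"

// counterDelta returns how much to add to a monotonic counter given the
// previous and current totals reported for the same target.
// If current < previous, the target was restarted or its identifier was
// reused, so the counter is treated as reset and current is the increment.
func counterDelta(previous, current float64) float64 {
	if current < previous {
		return current
	}
	return current - previous
}

func main() {
	fmt.Println(counterDelta(10, 15)) // 5
	fmt.Println(counterDelta(10, 3))  // 3: reset detected, no negative Add
}
```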

Use urlencode for permalink of an error

The permalink used to build the unique URL of an error has a format like *errors.errorString@116193d1, which contains several special characters. We should URL-encode it to avoid any possible errors.

Periskop Server: Merge exceptions instead of swapping

Currently we swap one snapshot of services for another, resulting in an exact snapshot of errors from the last scrape.

This approach is simple but has a disadvantage: exception information is lost as soon as a deployment is reverted (usually the first thing that happens after a bad deploy), so it is no longer available for later debugging.

A more useful alternative would be a shallow merge of exceptions, so that older ones are kept in memory. This does not solve the problem of losing exception information across Periskop deploys (which could be solved by introducing persistent storage), but it would be an improvement over the current situation.
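A shallow merge could look like the Go sketch below. Errors present in the latest scrape overwrite stored entries, while errors missing from it are kept instead of being dropped. `Snapshot`, `ErrorAggregate` and `mergeSnapshots` are hypothetical, simplified names.

```go
package main

import "fmt"

// ErrorAggregate is a simplified, hypothetical view of one aggregated error.
type ErrorAggregate struct {
	TotalCount int
}

// Snapshot maps service name -> aggregation key -> aggregate.
type Snapshot map[string]map[string]ErrorAggregate

// mergeSnapshots returns a new snapshot: entries from latest overwrite
// stored ones, and stored entries absent from latest (e.g. after a revert)
// are kept. Neither input is modified.
func mergeSnapshots(stored, latest Snapshot) Snapshot {
	merged := make(Snapshot, len(stored))
	for service, errs := range stored {
		merged[service] = make(map[string]ErrorAggregate, len(errs))
		for key, agg := range errs {
			merged[service][key] = agg
		}
	}
	for service, errs := range latest {
		if merged[service] == nil {
			merged[service] = make(map[string]ErrorAggregate, len(errs))
		}
		for key, agg := range errs {
			merged[service][key] = agg
		}
	}
	return merged
}

func main() {
	stored := Snapshot{"api": {"bad-deploy-error": {TotalCount: 120}}}
	latest := Snapshot{"api": {"db-timeout": {TotalCount: 3}}}
	merged := mergeSnapshots(stored, latest)
	fmt.Println(len(merged["api"]), merged["api"]["bad-deploy-error"].TotalCount) // 2 120
}
```

The trade-off is that entries for errors that stopped occurring stay around until something removes them, which is where a way to mark errors as resolved would come in.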

Make Periskop service a PWA

Progressive Web Apps provide multiple advantages including the ability to be installed and offer a good offline experience.

Add support for Kubernetes relabel

Although the current service discovery module for Kubernetes works, it is not very useful without the ability to filter using labels. In addition, we could annotate exceptions with labels and display them in the UI, enabling more advanced filtering mechanisms.

We should be able to reuse Prometheus's relabel module for that. Labels could be stored together with target instances and processed at scrape time.

See https://github.com/grafana/loki/blob/f9ff2a4b02832c7495f890abd2d73f4c8d44f1af/pkg/promtail/targets/journaltarget.go#L264 for an example of how this could work.

The relabel configuration format should be compatible with Prometheus.

Distribution

  • Get access to Docker Hub
  • Define packaging and versioning strategy
  • Change CI pipeline to package and upload to Docker Hub
  • Write documentation on how to run the Docker image

Search by aggregated_key in the Sidebar

At the moment, Periskop shows all exceptions aggregated by key in the sidebar. However, it is hard to find a specific exception (especially when the exception name is too long and gets trimmed).

To make it easier to find exceptions, implement a search box at the top of the sidebar that allows filtering exceptions by key.
