tarantool / grafana-dashboard

Dashboard for Tarantool application and database server monitoring with Grafana

License: MIT License

Lua 11.10% Dockerfile 0.49% Shell 1.31% Jsonnet 85.81% Makefile 1.29%
grafana grafonnet jsonnet tarantool grafana-dashboard prometheus

grafana-dashboard's Introduction

Tarantool

Tarantool is an in-memory computing platform consisting of a database and an application server.

It is distributed under BSD 2-Clause terms.

Key features of the application server:

Key features of the database:

  • MessagePack data format and MessagePack based client-server protocol.
  • Two data engines: 100% in-memory with complete WAL-based persistence and an own implementation of LSM-tree, to use with large data sets.
  • Multiple index types: HASH, TREE, RTREE, BITSET.
  • Document oriented JSON path indexes.
  • Asynchronous master-master replication.
  • Synchronous quorum-based replication.
  • RAFT-based automatic leader election for the single-leader configuration.
  • Authentication and access control.
  • ANSI SQL, including views, joins, referential and check constraints.
  • Connectors for many programming languages.
  • The database is a C extension of the application server and can be turned off.

Supported platforms are Linux (x86_64, aarch64), Mac OS X (x86_64, M1), FreeBSD (x86_64).

Tarantool is ideal for data-enriched components of scalable Web architecture: queue servers, caches, stateful Web applications.

To download and install Tarantool as a binary package for your OS or using Docker, please see the download instructions.

To build Tarantool from source, see detailed instructions in the Tarantool documentation.

To find modules, connectors and tools for Tarantool, check out our Awesome Tarantool list.

Please report bugs to our issue tracker. We also warmly welcome your feedback on the discussions page and questions on Stack Overflow.

We accept contributions via pull requests. Check out our contributing guide.

Thank you for your interest in Tarantool!

grafana-dashboard's People

Contributors

artembo, differentialorange, nickvolynkin, oleg-jukovec, opomuc, vasiliy-t, yngvar-antonsson

grafana-dashboard's Issues

Build dashboard with pre-defined data source

It's impossible to build JSON with a fixed data source, because every dashboard chooses the target type based on the value of a template variable:

if datasource == '${DS_PROMETHEUS}' then
  prometheus.target(
    expr=std.format('rate(%s{job=~"%s"}[%s])',
                    [metric_name, job, rate_time_range]),
    legendFormat='{{alias}}',
  )
else if datasource == '${DS_INFLUXDB}' then
  influxdb.target(
    policy=policy,
    measurement=measurement,
    group_tags=['label_pairs_alias'],
    alias='$tag_label_pairs_alias',
  ).where('metric_name', '=', metric_name)
  .selectField('value').addConverter('mean').addConverter('non_negative_derivative', ['1s']),

If someone builds a compiled dashboard with a fixed data source named, say, myinflux, neither branch matches and the dashboard won't work.
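The failure mode can be reproduced in miniature. The Python sketch below mirrors the branching above with plain strings; `pick_target`, `pick_target_by_type` and their return values are hypothetical stand-ins, not the repository's Jsonnet code. It shows that a literal data source name matches neither placeholder, and outlines one possible way around it (an assumption, not the project's actual fix).

```python
from typing import Optional

PROMETHEUS_PLACEHOLDER = "${DS_PROMETHEUS}"
INFLUXDB_PLACEHOLDER = "${DS_INFLUXDB}"


def pick_target(datasource: str) -> Optional[str]:
    """Mimic the dashboard's branching: the target kind is chosen by
    comparing the data source value with a template placeholder."""
    if datasource == PROMETHEUS_PLACEHOLDER:
        return "prometheus"
    if datasource == INFLUXDB_PLACEHOLDER:
        return "influxdb"
    # A fixed data source name falls through both branches.
    return None


def pick_target_by_type(datasource_type: str) -> str:
    """Hypothetical alternative: pass the data source type explicitly,
    so the choice no longer depends on the data source name."""
    if datasource_type not in ("prometheus", "influxdb"):
        raise ValueError("unsupported data source type: " + datasource_type)
    return datasource_type


print(pick_target("${DS_INFLUXDB}"))  # placeholder value: a branch matches
print(pick_target("myinflux"))        # fixed name: no branch matches
```

With an explicit type argument, the data source name could be anything, including a pre-defined one.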

"example" project is not helpful for beginners

There is an example (https://github.com/tarantool/grafana-dashboard/tree/master/example) in our repo. It's not trivial to understand how it works for a person who is running Grafana and Prometheus for the first time.

First of all, the "project" application doesn't expose any metrics.

The user needs to add the following to init.lua manually:

local metrics = require('cartridge.roles.metrics')
metrics.set_export({
    {
        path = '/metrics/prometheus',
        format = 'prometheus',
    }
})

Secondly, it's not obvious how to install and run Prometheus and Grafana.
I think a simple readme.md file is missing, one that says:

brew install prometheus
brew install grafana
brew services start grafana # grafana will be available on :3000
prometheus --config.file="prometheus/prometheus.yml" # prometheus will be available on :9090
# Then verify the "Status" -> "Targets" page in the Prometheus UI.

It would also be better to change the target URLs from "example_project:" to "localhost:", because a user will typically run the application with cartridge start, which starts instances on localhost.

Please add a replicasets.yml file to the test project. It will spare the user some extra steps: all they will need is to call cartridge replicasets setup.

Finally, it would be great to add a short description of how to import the dashboard from grafana.com. It's easier if the user understands what the "job" option means. That wasn't so in my case, and I spent some time figuring out what was wrong.
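To make the link between the "job" option and the Prometheus configuration concrete, here is a minimal Python sketch that builds a scrape configuration for locally started instances. The job name `tarantool_app` and the ports 8081 and 8082 are hypothetical; only the `/metrics/prometheus` path comes from the snippet above. Prometheus attaches the `job_name` value to every scraped series as the `job` label, which is the value the dashboard's "job" option has to match.

```python
import json


def build_scrape_config(job_name, targets, metrics_path="/metrics/prometheus"):
    """Return a Prometheus-style configuration dict with one scrape job."""
    return {
        "scrape_configs": [
            {
                "job_name": job_name,
                "metrics_path": metrics_path,
                "static_configs": [{"targets": list(targets)}],
            }
        ]
    }


# Hypothetical job name and HTTP ports; adjust them to the running instances.
config = build_scrape_config("tarantool_app", ["localhost:8081", "localhost:8082"])

# YAML parsers generally accept JSON syntax, so the output can be saved
# as a prometheus.yml file for a quick local test.
print(json.dumps(config, indent=2))
```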

Publish alerts on Prometheus.io

There is a rumor that Prometheus.io has a section with user configurations, which we could use to publish our alert rules. We should check it out and publish them somewhere if it's possible.

Health check

There is no health check in either the dashboard or the default Cartridge/metrics tools. A health check is required for overview panels.

Proposition: "if an instance is sending any metrics, it is alive; otherwise, the instance is not running (or something else is unhealthy)".
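The proposition can be sketched in a few lines of Python. The function name `is_alive` and the 60-second silence threshold are assumptions made for illustration; a real panel would express the same idea in the data source's query language.

```python
import time
from typing import Optional


def is_alive(last_sample_ts: float, max_silence_s: float = 60.0,
             now: Optional[float] = None) -> bool:
    """Treat an instance as alive if its most recent metric sample
    arrived no more than max_silence_s seconds ago."""
    if now is None:
        now = time.time()
    return (now - last_sample_ts) <= max_silence_s


# Explicit timestamps keep the example deterministic.
print(is_alive(last_sample_ts=100.0, now=130.0))  # 30 s of silence
print(is_alive(last_sample_ts=100.0, now=200.0))  # 100 s of silence
```

The threshold should exceed the metrics collection interval, otherwise a healthy instance would flap between the two states.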

Alert rules documentation

Since #59 we have a set of Prometheus alert rules. But it is just an example yml file with some comments. It would be more convenient for customers who are interested in "how to set up alerts for a Tarantool cluster" to have a documentation page describing what to monitor and how. Of course, we may base it on the results of #59.

Add network memory panels

HTTP is not the only way of interacting with Tarantool instances. Binary protocol connections (iproto) are also popular. You can monitor them with the group of tnt_net_ metrics. We should decide which tnt_net_ metrics are most helpful and then add them as panels to the dashboard.

Add some illustrations and descriptions

The dashboard should be described somewhere. Right now, to learn what it consists of, one has to compile the dashboard or read through the code. The description should be illustrated with screenshots.

[2pt] Create GitHub Actions workflows to translate documentation

The documentation translation process should be automated. We need to implement a set of tasks at each stage of the tech writer's workflow to maintain 100% translation for each repository containing documentation.

  • The tech writer creates a PR with a documentation update (rst sources).
    • (push-translation on pull_request) A GitHub workflow builds po files from rst and sends them to the branch of the same name in Crowdin.
  • Pull translations back.
    • (pull-translation on workflow_dispatch) When the translation of the corresponding changes is ready, the tech writer manually triggers a GitHub workflow for the translated branch (the PR branch).
  • The tech writer or someone responsible merges the PR.
    • (upload-translation on push to master) A GitHub workflow uploads the merged translations from po files into the Crowdin master (or main) branch.

Set alias on example cluster

Since the move to the metrics role (#10), the alias should be set through env, but now it's missing.

Default dashboard doesn't work as it should

Add customization guide

It is hard for beginners to understand how to build their own dashboard. The source code is a bit complicated, and there is no convenient way to add custom panels to an existing dashboard: one has to copy-paste the entire dashboard. This is also inconvenient for anyone who wants to append their own panels to the end of a dashboard and still get all upstream dashboard updates automatically.

We need to add a guide on building custom panels (maybe in the form of an article) and support adding custom panels to an existing dashboard in code.

Add configurable InfluxDB policy

The default InfluxDB policy is now used on all panels. This should be reworked into a configurable policy (preferably set on import, like the measurement).
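As a rough illustration of what "configurable" means here, the Python sketch below builds the FROM clause of an InfluxQL query from a retention policy and a measurement. The helper name `influx_from_clause` and the example values `autogen` and `tarantool_http` are hypothetical; the dashboard itself generates its queries from Jsonnet, so this only shows the shape of the result.

```python
def quote_identifier(identifier: str) -> str:
    """Double-quote an InfluxQL identifier, escaping embedded quotes."""
    return '"' + identifier.replace('"', '\\"') + '"'


def influx_from_clause(policy: str, measurement: str) -> str:
    """Build a FROM clause that names the retention policy explicitly."""
    return "FROM " + quote_identifier(policy) + "." + quote_identifier(measurement)


# Hypothetical policy and measurement names.
print(influx_from_clause("autogen", "tarantool_http"))
```

Taking the policy as a parameter next to the measurement would let both be supplied at import time.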

Example cluster doesn't start

example_project_1  | .///cluster/integration/bootstrap_test.lua:110: attempt to index field 'cluster' (a nil value)
example_project_1  | stack traceback:
example_project_1  | 	/app/.rocks/share/tarantool/luatest/capture.lua:139: in function '__index'
example_project_1  | 	.///cluster/integration/bootstrap_test.lua:110: in main chunk
example_project_1  | 	[C]: in function 'require'
example_project_1  | 	/app/.rocks/share/tarantool/luatest/loader.lua:43: in function 'load_tests'
example_project_1  | 	/app/.rocks/share/tarantool/luatest/runner.lua:25: in function </app/.rocks/share/tarantool/luatest/runner.lua:14>
example_project_1  | 	[C]: in function 'xpcall'
example_project_1  | 	/app/.rocks/share/tarantool/luatest/utils.lua:32: in function 'load_tests'
example_project_1  | 	/app/.rocks/share/tarantool/luatest/runner.lua:56: in function </app/.rocks/share/tarantool/luatest/runner.lua:40>
example_project_1  | 	[C]: in function 'xpcall'
example_project_1  | 	/app/.rocks/share/tarantool/luatest/runner.lua:40: in function 'fn'
example_project_1  | 	/app/.rocks/share/tarantool/luatest/sandboxed_runner.lua:14: in function 'run'
example_project_1  | 	/app/.rocks/share/tarantool/luatest/cli_entrypoint.lua:4: in function </app/.rocks/share/tarantool/luatest/cli_entrypoint.lua:3>
example_project_1  | 	....rocks/share/tarantool/rocks/luatest/0.5.0-1/bin/luatest:3: in main chunk
grafana-dashboard_example_project_1 exited with code 255

CPU time panels format

The s (seconds) format of the CPU time panels is a bit confusing. It may be more convenient for users to inspect a panel in percentage format.
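The conversion itself is simple arithmetic: CPU seconds consumed per second of wall-clock time is a fraction, and multiplying it by 100 gives a percentage of one core. A minimal Python sketch, assuming the underlying metric is a cumulative counter of CPU seconds (the function name `cpu_percent` is hypothetical):

```python
def cpu_percent(cpu_seconds_start: float, cpu_seconds_end: float,
                wall_seconds: float) -> float:
    """Convert the growth of a CPU-seconds counter over a wall-clock
    interval into a percentage of a single core."""
    if wall_seconds <= 0:
        raise ValueError("wall_seconds must be positive")
    return 100.0 * (cpu_seconds_end - cpu_seconds_start) / wall_seconds


# 3 CPU seconds spent during a 60-second interval.
print(cpu_percent(10.0, 13.0, 60.0))  # 5.0
```

In a panel, the same effect would come from multiplying the per-second rate by 100 and switching the unit to percent.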

Publish to Grafana with CI/CD

It would be convenient to publish new versions of the dashboard to the Grafana "Official & community built dashboards" page with some sort of CI/CD, if possible.

Autoscreenshooting tool

Making screenshots of the new dashboard each time is an annoying and monotonous job. Maybe there are tools to do this with a script.

Add CHANGELOG.md

Add CHANGELOG.md and describe all previously added features in it

Add cluster issues panel

The cluster issues metric is important for a cluster overview. For example, the original "issues" button is placed at the top of the Cartridge UI. I think it should be added to the cluster overview panels.

Separate load from example app

The example app is luatest code that both creates the cluster and generates load so that the graphs are non-empty. It would be better to have the example cluster and the load generator separate. For example, the cluster could be bootstrapped with cartridge-cli (#34).

Fix Prometheus rps computation

If the Prometheus scrape interval is 1m or more, rate() computed over a 1-minute range vector will fail, because the range then contains fewer than the two samples rate() needs:

rate(tnt_stats_op_total{job=~\"[[job]]\",operation=\"upsert\"}[1m])

The scrape interval is 1m by default, so we should increase the default range to at least 2m ("at the very minimum it should be two times the scrape interval" -- https://www.metricfire.com/blog/understanding-the-prometheus-rate-function/?GAID=231802381.1611061565&GAID=231802381.1611061565) and make it configurable.
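The rule of thumb quoted above can be sketched in Python as follows. The helper names are hypothetical, and the duration parser only understands single-unit values such as `15s` or `1m`, which is a simplification of the Prometheus duration syntax. The job name `tarantool_app` is also made up.

```python
import re

_UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600}


def parse_duration(text: str) -> int:
    """Parse a simple single-unit duration such as '15s' or '1m'."""
    match = re.fullmatch(r"(\d+)([smh])", text)
    if match is None:
        raise ValueError("unsupported duration: " + text)
    return int(match.group(1)) * _UNIT_SECONDS[match.group(2)]


def format_duration(seconds: int) -> str:
    """Format seconds using the largest unit that divides them evenly."""
    for unit in ("h", "m"):
        if seconds % _UNIT_SECONDS[unit] == 0:
            return "%d%s" % (seconds // _UNIT_SECONDS[unit], unit)
    return "%ds" % seconds


def min_rate_window(scrape_interval: str, factor: int = 2) -> str:
    """Smallest range for rate(): factor times the scrape interval."""
    return format_duration(parse_duration(scrape_interval) * factor)


def rps_expr(job: str, operation: str, window: str) -> str:
    """Build the rate() expression with a configurable range."""
    return 'rate(tnt_stats_op_total{job=~"%s",operation="%s"}[%s])' % (
        job, operation, window)


window = min_rate_window("1m")
print(window)  # 2m
print(rps_expr("tarantool_app", "upsert", window))
```

Deriving the range from the scrape interval, rather than hard-coding 1m, keeps the panels working when the interval changes.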

Overall load panels are red on low load

If the load is low, overall load panels show values in red. I think it's not a good idea to scare the user like that, since this may be expected (for example, when an HTTP handle is only used to control the state of an app).

Provide Graphite example

The metrics module supports Graphite. We should provide an example with a Graphite monitoring stack, similar to the Prometheus and InfluxDB ones.

Add Lua memory panel

Lua memory is one of the most important metrics among the Tarantool default metrics. At the very least, it should be covered by an alert because of the strict 2 GB limit per instance.
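An alert on this metric boils down to comparing the used Lua memory with a fraction of the limit. A minimal Python sketch, assuming the limit is 2 GiB (2 * 1024^3 bytes) and picking an arbitrary 80% warning ratio; both the constant and the function name are illustrative:

```python
# Assumption: the "2 GB" limit is interpreted as 2 GiB.
LUA_MEMORY_LIMIT_BYTES = 2 * 1024 ** 3


def lua_memory_alert(used_bytes: int, warn_ratio: float = 0.8) -> bool:
    """Return True when Lua memory usage reaches warn_ratio of the limit."""
    return used_bytes >= warn_ratio * LUA_MEMORY_LIMIT_BYTES


print(lua_memory_alert(1 * 1024 ** 3))     # 1 GiB used: below the threshold
print(lua_memory_alert(1900 * 1024 ** 2))  # about 1.86 GiB used: alert
```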

Bootstrap example cluster with cartridge-cli

Luatest is not a tool created for bootstrapping a configured cluster (with both roles and some clusterwide config), but it can do it, while cartridge-cli can't. We should consider moving the example cluster to a cartridge-cli bootstrap once the CLI is able to do this.

Monitor replication

Replication status and replication lag are important information about the cluster state. There are some replication metrics among the Tarantool default metrics, but they may not be sufficient. We should study replication to find out which metrics are needed to monitor it and then (maybe after reworking the current replication metrics) add them as panels to the dashboard.

Release on Grafana Dashboards with collaborative account

There is currently no way to share edit and publish access to a dashboard on Grafana Official and community built dashboards. We need to create some kind of collaborative account so that our team can edit and upload new versions of the dashboard.
