Mesh Performance Tests

Performance tests of Kong Mesh.

Run

  1. Install dependencies
make dev/tools
  2. Create local cluster
ENV=local make start-cluster
  3. Run tests from the mesh-perf directory
make run
  4. Destroy local cluster
ENV=local make destroy-cluster

Setup EKS cluster from your machine

It is recommended to use saml2aws for AWS authorization. After authorizing, you just need to run:

AWS_PROFILE=saml ENV=eks make start-cluster
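
For example, a minimal end-to-end sketch (the saml2aws account configuration and IdP setup are assumptions):

saml2aws login                               # authenticate against your IdP and refresh AWS credentials
AWS_PROFILE=saml ENV=eks make start-cluster  # then create the cluster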

Observability

The observability tooling is a way to inspect the end result of perf tests. A perf test ends with a snapshot of the Prometheus TSDB saved on the host that ran the perf test (defaults to /tmp/prom-snapshots). This directory will look like this:

❯❯❯ ll -la /tmp/prom-snapshots/
total 0
drwxr-xr-x   6 jakub  wheel   192B Jun 29 15:40 ./
drwxrwxrwt  15 root   wheel   480B Jun 29 14:30 ../
drwxr-xr-x   6 jakub  wheel   192B Jun 29 15:28 20230629T125736Z-5c8c90f181c0b57f/
drwxr-xr-x   3 jakub  wheel    96B Jun 29 15:30 20230629T133034Z-77fee4f8e5a90c89/
drwxr-xr-x   3 jakub  wheel    96B Jun 29 15:33 20230629T133316Z-5e37819462543e4f/
drwxr-xr-x   3 jakub  wheel    96B Jun 29 15:40 20230629T134058Z-035f3439076d9f04/

You can run a Docker Compose stack of Prometheus + Grafana with the data from a test:

PROM_SNAPSHOT_PATH=/tmp/prom-snapshots/20230629T134058Z-035f3439076d9f04 make start-grafana

Grafana will be forwarded to localhost:3000, and the Kuma CP dashboard should be ready.

To update the kuma-cp.json dashboard:

  • place the mesh-perf project next to kuma
  • run make upgrade/dashboards from the top-level directory of mesh-perf.

Issues

Add resources to observability components

Today, when deploying Grafana and Prometheus using kumactl install observability, the resources: {} section is empty for all containers.

When resources are not specified, a container only receives whatever is left over. Even if that's fine for Grafana, throttling of Prometheus can affect test execution (the tests rely on Prometheus metrics) and the resulting snapshot. One possible fix is sketched below.
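
A minimal sketch of such a fix, assuming a deployment named prometheus-server with Prometheus as its first container (namespace, names, and resource values here are assumptions, not the project's actual configuration):

# give Prometheus explicit requests/limits so it can't be starved or throttled away
kubectl -n mesh-observability patch deployment prometheus-server --type=json --patch '[
  {"op": "add", "path": "/spec/template/spec/containers/0/resources",
   "value": {"requests": {"cpu": "1", "memory": "2Gi"}, "limits": {"memory": "2Gi"}}}]'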

Calculate most cost optimal EKS cluster for running bigger tests

Each EC2 instance type has a limited number of pods that can be deployed (list here). We also need enough CPU to accommodate the test services.

  • Find EC2 instance types with the highest possible pod count and the lowest cost.
  • Find a formula that helps us calculate cluster size from the expected number of pods in a perf test (see the sketch after this list).
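
A minimal sketch of the per-instance pod limit that AWS's published max-pods list is derived from (the m5.large figures below are its ENI/IP limits):

# max_pods = ENIs * (IPv4 addresses per ENI - 1) + 2
# m5.large: 3 ENIs with 10 IPv4 addresses each
echo $(( 3 * (10 - 1) + 2 ))   # prints 29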

Change report format we send to Datadog

Today we send a log that looks like:

hostname="github-actions", service="mesh-perf-test", specReports=[{report1},{report2},{report3}...]

Apparently it's not possible to generate metrics in DD based on items inside the specReports array. We can access items only by index.

We have to change the format to:

hostname="github-actions", service="mesh-perf-test", specReport={report1}
hostname="github-actions", service="mesh-perf-test", specReport={report2}
hostname="github-actions", service="mesh-perf-test", specReport={report3}

In that case, extracting attributes in DD is pretty straightforward. A sketch of the transformation follows.
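
A minimal sketch of the split, assuming the current payload sits in a file such as report.json (a hypothetical name) and jq is available:

# emit one log line per spec report instead of one line holding the whole array
jq -c '.specReports[] | {hostname: "github-actions", service: "mesh-perf-test", specReport: .}' report.json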

Report should be comparable between test runs

So the reports should be an artifact in a good format that contains the parameters (number of services, number of pods, version...) and a set of aggregated metrics (it's OK to start with just duration).

We should then be able to retrieve all the runs for a period of time and then plot them to compare.
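
For illustration, one report entry could look like this (a purely hypothetical shape, not an agreed format):

{"numServices": 100, "podsPerService": 2, "version": "2.3.0", "durationSeconds": 241}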

Reduce the size of Prometheus snapshot

When running 2k pods the snapshot can be around 400 MB. It contains a lot of kube metrics we're not using for our dashboard, so it makes sense to somehow exclude them from the snapshot; one option is sketched below.
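
A minimal sketch of one option, assuming the scrape jobs live in the prometheus-server ConfigMap installed by kumactl install observability (namespace and key layout are assumptions):

# drop kube-state metrics at scrape time by adding, under each scrape job:
#
#   metric_relabel_configs:
#     - source_labels: [__name__]
#       regex: "kube_.*"
#       action: drop
#
kubectl -n mesh-observability edit configmap prometheus-server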

Test suite fails in "AfterAll" because namespace termination takes too much time

• [FAILED] [241.758 seconds]
Simple [AfterAll] should distribute certs when mTLS is enabled
  [AfterAll] /home/runner/go/pkg/mod/github.com/kumahq/kuma@…/test/framework/ginkgo.go:33
  [It] /home/runner/work/mesh-perf/mesh-perf/test/k8s/simple_test.go:240

  [FAILED] 'Wait for kuma-test Namespace to terminate.' unsuccessful after 60 retries
  
  In [AfterAll] at: /home/runner/go/pkg/mod/github.com/kumahq/kuma@…/test/framework/k8s_cluster.go:1004

Should we even wait for namespace termination if we destroy the cluster right after?

First test suite

(clean up the application and control plane between tests); we can reuse the service generator from kuma-tools (generate-mesh.go).

Perf Test stages

  1. Run perf test locally
  2. Run periodically on cloud env
  3. Tests run with Prometheus, and we are able to extract metrics after they complete
  4. Test results with metrics are persisted

LeaderElection `renewDeadline` can be too small

By default, renewDeadline is 10s, but when the Kube API is busy it can take much longer to reply (up to 60s). We should probably configure renewDeadline to be 80s.

Keep in mind that leaseDuration apparently can't be shorter than renewDeadline, so we should set it to 100s or something like that.

These parameters should be set here: https://github.com/kumahq/kuma/blob/master/pkg/plugins/bootstrap/k8s/plugin.go#L58. So this feature first requires making them configurable in Kuma.

Add inputs to action

We need:

  1. number of services
  2. number of pods per service

These should have good defaults; a sketch of the workflow inputs follows.
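
A minimal sketch of what the action inputs could look like in the workflow file (names, descriptions, and defaults are assumptions, not the action's actual interface):

workflow_dispatch:
  inputs:
    num_services:
      description: "Number of services to deploy"
      default: "5"
    pods_per_service:
      description: "Number of pods per service"
      default: "2"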

Reduce `scrape_interval` for Perf Tests

The current default value is 10s, and it's hardcoded in kumactl install observability.

We should either add a --scrape-interval flag to kumactl or override this value in the prometheus-server ConfigMap during test setup, as sketched below.
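
A minimal sketch of the ConfigMap route (namespace, key layout, and the 5s value are assumptions):

# override the interval before starting the test, e.g. under the global section:
#
#   global:
#     scrape_interval: 5s
#
kubectl -n mesh-observability edit configmap prometheus-server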
