ngine-io / chaotic

Chaos for Clouds

License: MIT License

Python 97.96% Dockerfile 0.66% Makefile 1.38%
hashicorp-nomad cloudscale-ch chaos-monkey chaos-engineering fault-injection digitalocean vultr exoscale cloudstack hetzner-cloud

chaotic's Introduction

Chaotic - Chaos for Clouds

Chaotic evaluates a plan for how it will bring chaos to your Cloud environment.

Depending on the Cloud API used, it may kill allocations (Hashicorp Nomad), reboot or stop/start virtual machines in your Cloud environment.

With no arguments given, Chaotic runs as a "one shot", meant to be executed as a cron job. Passing --periodic runs it as a daemon with a configurable interval in minutes, e.g. --interval 5 (1 is the default). NOTE: The config is re-read on every interval, so there is no need to restart the service after changing the config.
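
The difference between the two modes can be sketched as a small loop. This is only an illustration of the behaviour described above, not Chaotic's actual code: the function `run` and its parameters `load_config`, `execute`, `max_runs` and `sleep` are hypothetical names introduced here.

```python
import time
from typing import Callable, Optional


def run(load_config: Callable[[], dict],
        execute: Callable[[dict], None],
        periodic: bool = False,
        interval_minutes: int = 1,
        max_runs: Optional[int] = None,
        sleep: Callable[[float], None] = time.sleep) -> int:
    """Run once ("one shot") or repeatedly (periodic). Returns the number of runs."""
    runs = 0
    while True:
        # Reload the config on every pass so edits take effect without a restart.
        config = load_config()
        execute(config)
        runs += 1
        # One-shot mode stops after the first pass; max_runs is a hypothetical
        # escape hatch that makes the loop testable.
        if not periodic or (max_runs is not None and runs >= max_runs):
            return runs
        sleep(interval_minutes * 60)
```

Because the config is loaded inside the loop, a change to the file is picked up on the next interval.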

Clouds

Currently implemented Clouds:

  • DigitalOcean
  • Vultr
  • Hetzner Cloud
  • Proxmox KVM
  • CloudStack
  • Hashicorp Nomad
  • Exoscale
  • cloudscale.ch

Install

pip3 install -U chaotic-ngine

Configure

Create a file named config.yaml, or use the env var CHAOTIC_CONFIG to point to a config file (see also the examples directory):

export CHAOTIC_CONFIG=config_nomad.yaml
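
As a rough sketch of the lookup described above: the helper `resolve_config_path` is a hypothetical name, and the assumption that the env var takes precedence over the default config.yaml is not confirmed by the text.

```python
import os


def resolve_config_path(environ=None, default="config.yaml"):
    """Return the config file path: CHAOTIC_CONFIG if set, else the default."""
    environ = os.environ if environ is None else environ
    # Assumption: a non-empty CHAOTIC_CONFIG wins over the default file name.
    return environ.get("CHAOTIC_CONFIG") or default
```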

Exclude times

Define times when the bot should not perform real actions (it will run in dry-run mode instead):

---
kind: ...
excludes:
  weekdays:
    - Sun
    - Sat
  times_of_day:
    - 22:00-08:00
    - 11:00-14:00
  days_of_year:
    - Jan01
    - Apr01
    - May01
    - Aug01
    - Dec24
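
How such an excludes block might be evaluated can be sketched in Python. This is a minimal illustration under stated assumptions, not Chaotic's implementation: `is_excluded` is a hypothetical function, the config is passed as an already-parsed dict, time ranges are treated as end-exclusive, and a range whose start is later than its end (such as 22:00-08:00) is assumed to wrap past midnight.

```python
from datetime import datetime

WEEKDAYS = ("Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun")
MONTHS = ("Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec")


def _in_time_range(current, spec):
    """Check whether a time falls into an "HH:MM-HH:MM" range."""
    start_text, end_text = spec.split("-")
    start = datetime.strptime(start_text, "%H:%M").time()
    end = datetime.strptime(end_text, "%H:%M").time()
    if start <= end:
        return start <= current < end
    # The range wraps past midnight, e.g. 22:00-08:00.
    return current >= start or current < end


def is_excluded(now, excludes):
    """Return True when `now` falls into any configured exclude window."""
    if WEEKDAYS[now.weekday()] in excludes.get("weekdays", []):
        return True
    # Build a key such as "Dec24" to match the days_of_year entries.
    day_key = f"{MONTHS[now.month - 1]}{now.day:02d}"
    if day_key in excludes.get("days_of_year", []):
        return True
    return any(_in_time_range(now.time(), spec)
               for spec in excludes.get("times_of_day", []))
```

A caller would then force dry-run whenever `is_excluded(datetime.now(), config["excludes"])` is true.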

Exoscale

Chaotic will stop a server, selected by an optional filter tag, and start it again after a configurable delay (default 60s).

export EXOSCALE_API_KEY="..."
export EXOSCALE_API_SECRET="..."

---
kind: exoscale
dry_run: false
configs:

  # Optional, filter tag
  tag:
    key: chaos
    value: enabled

  # Optional, 60 seconds is the default
  wait_before_restart: 60
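
The stop, wait, start cycle shared by the VM-based clouds can be sketched as follows. Everything here is illustrative: `chaos_restart`, the server dicts, and the `stop`/`start` callables are hypothetical stand-ins for a real cloud API client, and picking the victim at random is an assumption.

```python
import random
import time


def chaos_restart(servers, stop, start, tag=None, wait_before_restart=60,
                  dry_run=False, sleep=time.sleep, choose=random.choice):
    """Pick one matching server, stop it, wait, and start it again."""
    if tag is not None:
        # Keep only servers that carry the configured filter tag.
        candidates = [s for s in servers
                      if s.get("tags", {}).get(tag["key"]) == tag["value"]]
    else:
        candidates = list(servers)
    if not candidates:
        return None
    victim = choose(candidates)
    if dry_run:
        # In dry-run mode the selection is reported but nothing is touched.
        return victim
    stop(victim)
    sleep(wait_before_restart)
    start(victim)
    return victim
```

Passing `sleep` and `choose` as parameters keeps the sketch easy to exercise without real delays or randomness.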

CloudStack

Chaotic will stop a server, selected by an optional filter tag, and start it again after a configurable delay (default 60s).

export CLOUDSTACK_API_KEY="..."
export CLOUDSTACK_API_SECRET="..."
export CLOUDSTACK_API_ENDPOINT="..."

---
kind: cloudstack
dry_run: false
configs:

  # Optional, filter tag
  tag:
    key: chaos
    value: enabled

  # Optional, 60 seconds is the default
  wait_before_restart: 60

Vultr

Chaotic will stop a server, selected by an optional filter tag, and start it again after a configurable delay (default 60s).

export VULTR_API_KEY="..."

---
kind: vultr
dry_run: true
configs:

  # Optional instance tag filter
  tag: "chaos=opt-in"

  # Optional, 60 seconds is the default
  wait_before_restart: 60

Cloudscale.ch

Chaotic will stop a server, selected by an optional filter tag, and start it again after a configurable delay (default 60s).

Config

export CLOUDSCALE_API_TOKEN="..."

---
kind: cloudscale_ch
dry_run: true
configs:

  # Optional server tag filter
  filter_tag: "chaos=opt-in"

  # Optional, 60 seconds is the default
  wait_before_restart: 60

Hetzner Cloud

Chaotic will stop a server, selected by an optional filter label, and start it again after a configurable delay (default 60s).

Config

export HCLOUD_API_TOKEN=...

---
kind: hcloud
dry_run: false
configs:

  # Optional server label filter
  label: "chaos=enabled"

  # Optional, 60 seconds is the default
  wait_before_restart: 60

DigitalOcean Cloud

Chaotic will stop a droplet, selected by an optional filter tag, and start it again after a configurable delay (default 60s).

Config

export DIGITALOCEAN_ACCESS_TOKEN=...

---
kind: digitalocean
dry_run: false
configs:

  # Optional droplet tag filter
  tag: "chaos:enabled"

  # Optional, 60 seconds is the default
  wait_before_restart: 60

Nomad Job

Chaotic will send a signal to an allocation in one of the available namespaces, which are selected by an allow list.

Config

export NOMAD_ADDR=http://nomad.example.com:4646

---
kind: nomad
dry_run: true
configs:
  experiments:
    - job

  # Signals to choose from
  signals:
    - SIGKILL

  # Optional: namespace allowlist
  namespace_allowlist:
    - example-prod
    - foobar-prod

  # Optional: namespace denylist
  namespace_denylist:
    - default

  # Optional: job type skip list
  job_type_skiplist:
    - system
    - batch
    - sysbatch

  # Optional: job name skip list
  job_skiplist:
    - my-job-name

  # Optional: Add a meta tag in your nomad job "chaotic" = False to opt-out
  job_meta_opt_key: chaotic
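
One way to read the filters above is as a chain of checks that a job must pass before it becomes a candidate. The sketch below is an interpretation, not Chaotic's code: `is_job_eligible` and the job dict layout (`namespace`, `type`, `name`, `meta`) are hypothetical, and treating only the string "false" (case-insensitive) as an opt-out is an assumption.

```python
def is_job_eligible(job, configs):
    """Return True when a job passes every configured filter."""
    allowlist = configs.get("namespace_allowlist")
    # An empty or missing allowlist is assumed to allow every namespace.
    if allowlist and job["namespace"] not in allowlist:
        return False
    if job["namespace"] in configs.get("namespace_denylist", []):
        return False
    if job["type"] in configs.get("job_type_skiplist", []):
        return False
    if job["name"] in configs.get("job_skiplist", []):
        return False
    meta_key = configs.get("job_meta_opt_key")
    if meta_key:
        # A job opts out by setting the meta key to False.
        value = str(job.get("meta", {}).get(meta_key, "true"))
        if value.strip().lower() == "false":
            return False
    return True
```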

Nomad Node

Chaotic will drain a node and set it to be ineligible for some time.

Config

export NOMAD_ADDR=http://nomad.example.com:4646

---
kind: nomad
dry_run: true
configs:
  experiments:
    - node

  # Optional: Node drain deadline in seconds, default 10
  node_drain_deadline_seconds: 15

  # Optional: Skip nodes in these classes
  node_class_skiplist:
    - storage

  # Optional: Skip nodes with these names
  node_skiplist:
    - node1
    - node5

  # Optional: Wait this many seconds before setting the node to be eligible again, default 60
  node_wait_for: 100

  # Optional: Also drain system jobs, default false
  node_drain_system_jobs: true

  # Optional: Drain multiple nodes in one run in percent, fallback 1 node
  node_drain_amount_in_percent: 30
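
The percentage setting with its one-node fallback could translate into a node count roughly like this. The helper `nodes_to_drain` is hypothetical, and rounding down is an assumption; the source does not say how fractions are handled.

```python
def nodes_to_drain(total_nodes, percent=None):
    """Return how many nodes to drain in one run."""
    if total_nodes <= 0:
        return 0
    if not percent:
        # No percentage configured: fall back to a single node.
        return 1
    # Assumption: round down, but always drain at least one node.
    count = total_nodes * percent // 100
    return max(1, min(count, total_nodes))
```

With 10 candidate nodes and 30 percent this yields 3 nodes; with 2 nodes the rounded-down result would be 0, so the fallback of 1 applies.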

Proxmox KVM

Chaotic will stop a VM and start it again after a configurable delay (default 60s).

export PROXMOX_API_HOST="pve1.example.com"
export PROXMOX_API_USER="root@pam"
export PROXMOX_API_PASSWORD="..."

---
kind: proxmox_kvm
dry_run: false
configs:

  # Optional: Do not shutdown VMs having a lower uptime in minutes
  min_uptime: 60

  # Optional: Do not shutdown VMs in this name list
  denylist:
    - my-single-vm

  # Optional: 60 seconds is the default
  wait_before_restart: 60
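
The two optional guards, min_uptime and denylist, amount to a filter over the VM list before one is picked. In this sketch `eligible_vms` and the VM dicts with `name` and `uptime_seconds` keys are hypothetical; it assumes uptime is available in seconds while min_uptime is given in minutes, as in the config comment.

```python
def eligible_vms(vms, min_uptime=0, denylist=()):
    """Return the VMs that may be shut down under the configured guards."""
    return [
        vm for vm in vms
        # Skip VMs named in the denylist.
        if vm["name"] not in denylist
        # Skip VMs that have been up for less than min_uptime minutes.
        and vm["uptime_seconds"] >= min_uptime * 60
    ]
```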

Run

CLI

chaos-ngine

Docker

One shot:

docker run -ti --rm -v $PWD/examples/config_nomad.yaml:/app/config.yaml -e TZ=Europe/Zurich -e NOMAD_ADDR=$NOMAD_ADDR --name chaotic ghcr.io/ngine-io/chaotic:latest

As service:

docker run -ti --rm -v $PWD/examples/config_nomad.yaml:/app/config.yaml -e TZ=Europe/Zurich -e NOMAD_ADDR=$NOMAD_ADDR --name chaotic ghcr.io/ngine-io/chaotic:latest --periodic

Logs

What you should see (e.g. for kind cloudscale_ch):

2021-06-09 09:01:25,433 - cloudscale.log:INFO:Started, version: 0.6.2
2021-06-09 09:01:25,433 - cloudscale.log:INFO:Using profile default
2021-06-09 09:01:25,433 - cloudscale.log:INFO:API Token used: xyz...
2021-06-09 09:01:25,433 - chatic:INFO:Querying with filter_tag: None
2021-06-09 09:01:25,433 - cloudscale.log:INFO:HTTP GET to https://api.cloudscale.ch/v1/servers
2021-06-09 09:01:25,651 - cloudscale.log:INFO:HTTP status code 200
2021-06-09 09:01:25,652 - chatic:INFO:Choose server app3
2021-06-09 09:01:25,653 - chatic:INFO:Stopping server app3
2021-06-09 09:01:25,653 - cloudscale.log:INFO:HTTP POST to https://api.cloudscale.ch/v1/servers/d5628484-a6eb-4ea9-b3ef-ba8da2bb9fe0/stop
2021-06-09 09:01:26,336 - cloudscale.log:INFO:HTTP status code 204
2021-06-09 09:01:26,336 - chatic:INFO:Sleeping for server 60
2021-06-09 09:02:26,393 - cloudscale.log:INFO:HTTP POST to https://api.cloudscale.ch/v1/servers/d5628484-a6eb-4ea9-b3ef-ba8da2bb9fe0/start
2021-06-09 09:02:26,955 - cloudscale.log:INFO:HTTP status code 204
2021-06-09 09:02:26,956 - chatic:INFO:done

chaotic's People

Contributors

dependabot[bot], resmo

chaotic's Issues

Allow listing for system nomad jobs

Right now I can see that chaos is performed on all jobs, but it seems chaos should only happen for service Nomad jobs. Whether system jobs should keep running depends on how people perform chaos. I think we need a feature to skip chaos testing for system jobs by adding an allowlist for Nomad job types.

feature: config dir

What about a config dir with various configs, from which the bot randomly chooses one?

Stress On workload machine in nomad

It would be nice if we could generate load on the client. This is a starting point, but I think if we generate load and the Nomad clients become highly utilized, then Nomad should place the allocation on another workload machine.

feature: exclude times

Implement exclude times

---
kind: nomad
dry_run: true
timezone: Europe/Berlin
excludes:
  weekdays:
    - Sun
    - Sat
  times_of_day:
    - 22:00-08:00
    - 11:00-14:00
  days_of_year:
    - Jan1-3
    - Apr1
    - May1
    - Aug1
    - Dec24-31
configs:
  namespace_allowlist:
    - example-prod
  signals:
    - SIGKILL
