Formal verification of Elastic-Agent and more using BDD

e2e-testing's Introduction

End-2-End tests for the Observability projects

This repository contains:

  1. A CI infrastructure to provision the VMs where the tests are executed at CI time.

  2. A Go library to provision services as Docker containers, using Docker Compose files.

  3. A test framework to execute e2e tests for certain Observability projects.

  4. A collection of utilities and helpers used in tests.

If you want to start creating a new test suite, please read the quickstart guide, but don't forget to come back here to better understand the framework.

If you want to start running the tests, please read the "running the tests" guide.

The E2E test project uses BDD (Behavior-Driven Development), which means the tests are defined as test scenarios (or simply scenarios). A scenario is written in plain English, using business language and hiding any implementation details. Therefore, words like "click", "input", "browse" or "API call" are NOT allowed. We do care about having well-expressed language in the feature files. Why? Because we want to hide the implementation details in the tests, so that whoever reads the feature files is able to understand the expected behavior of each scenario. For us, that is the key when talking about real E2E tests: exercising user journeys (scenarios) instead of specific parts of the UI (graphical or API).

Behaviour-Driven Development and this test framework

We need a way to describe the functionality to be implemented in a functional manner: it would be great if we were able to use plain English to specify how our software behaves, instead of using code, and even better if the execution of that specification could be automated. To be consistent, these behaviours of our software must be implemented following certain BDD (Behaviour-Driven Development) principles, where:

BDD aims to narrow the communication gaps between team members, foster better understanding of the customer and promote continuous communication with real world examples.

The most widely accepted way to achieve this executable specification in the software industry, using a high-level approach that everybody in the team can understand, backed by a testing framework to automate it, is Cucumber. So we will use Cucumber to define the behaviours (use cases) of our software. From its website:

Cucumber is a tool that supports Behaviour-Driven Development (BDD), and it reads executable specifications written in plain text and validates that the software does what those specifications say. The specifications consist of multiple examples, or scenarios.

The way we are going to specify our software behaviours is using Gherkin:

Gherkin uses a set of special keywords to give structure and meaning to executable specifications. Each keyword is translated to many spoken languages. Most lines in a Gherkin document start with one of the keywords.

The key part here is executable specifications: we will be able to automate the verification of the specifications and potentially get coverage of these specs.

Then we need a way to connect that plain English feature specification with code. Fortunately, Cucumber has a wide range of implementations (Java, Ruby, NodeJS, Go...), so we can choose one of them to implement our tests. For this test framework, we have chosen Godog, the Go implementation for Cucumber. From its website:

Package godog is the official Cucumber BDD framework for Go, it merges specification and test documentation into one cohesive whole.

In this test framework, we are running Godog with go test, as explained here.
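To make this concrete, here is a minimal, hypothetical sketch of how Godog plugs into go test; the feature path and the step text are illustrative, not necessarily the ones used in this repository.

package e2e_test

import (
	"testing"

	"github.com/cucumber/godog"
)

// metricbeatIsInstalled is a hypothetical step implementation; a real suite
// would install and configure the service under test here.
func metricbeatIsInstalled() error {
	return nil
}

// TestFeatures runs the Gherkin scenarios through go test.
func TestFeatures(t *testing.T) {
	suite := godog.TestSuite{
		ScenarioInitializer: func(sc *godog.ScenarioContext) {
			// Each plain-English step is glued to a Go function.
			sc.Step(`^metricbeat is installed$`, metricbeatIsInstalled)
		},
		Options: &godog.Options{
			Format:   "pretty",
			Paths:    []string{"features"},
			TestingT: t, // fail the go test run if any scenario fails
		},
	}

	if suite.Run() != 0 {
		t.Fatal("non-zero status returned, failed to run feature tests")
	}
}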

Statements about the test framework

  • It uses the Given/When/Then pattern to define scenarios.
    • Given a state is present (optional)
    • When an action happens (mandatory)
    • Then an outcome is expected (mandatory)
    • And/But clauses are allowed to provide continuation of the above ones (optional)
  • Because it uses BDD, it's possible to combine existing scenario steps (the given-when-then clauses) forming new scenarios, so with no code it's possible to have new tests. For that reason, the steps must be atomic and decoupled (as any piece of software: low coupling and high cohesion).
  • It does not use the GUI at all, so there is no Selenium, Cypress or any other test library having expectations on graphical components. It uses Go as the programming language and Cucumber to create the glue code between the code and the plain English scenarios. Over time, we have demonstrated that the APIs are not changing as fast as the GUI components.
  • APIs not changing does not mean zero flakiness. Because there are so many moving pieces (stack versions, Beats versions, elastic-agent versions, cloud machines, network access, etc.), there could be situations where the tests fail, but those failures are rarely caused by test flakiness. Instead, they are usually caused by: 1) instability of the all-together system, 2) a real bug, or 3) Gherkin steps that are not consistent.
  • Kibana is basically at the core of the tests, because we hit certain endpoints and wait for the responses.
    • Yes, the e2e tests are first-class consumers of Kibana APIs, so they could break the moment an API changes in Kibana. We have explored the idea of implementing contract testing with pact.io (not implemented yet, but on the wish list).
    • A PoC was submitted to Kibana and to this repo demonstrating the benefits of contract testing.
  • The project usually checks JSON responses, OS process state, and Elasticsearch query responses (using the ES client), to name a few, so the majority of the assertions rely on checking those entities and their internal state: process state, JSON responses, HTTP codes, document count in an Elasticsearch query, etc. A minimal sketch of that assertion style follows this list.
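To illustrate that assertion style, here is a minimal, hypothetical sketch (function and index names are made up) of the kind of check a step performs against Elasticsearch with the official Go client:

package assertions

import (
	"encoding/json"
	"fmt"

	"github.com/elastic/go-elasticsearch/v8"
)

// assertDocumentCount is a hypothetical helper: it asks Elasticsearch how many
// documents an index holds and fails when the count is below the expected minimum.
func assertDocumentCount(es *elasticsearch.Client, index string, min int) error {
	res, err := es.Count(es.Count.WithIndex(index))
	if err != nil {
		return fmt.Errorf("could not send query to Elasticsearch: %w", err)
	}
	defer res.Body.Close()

	if res.IsError() {
		return fmt.Errorf("unexpected HTTP status from Elasticsearch: %s", res.Status())
	}

	var body struct {
		Count int `json:"count"`
	}
	if err := json.NewDecoder(res.Body).Decode(&body); err != nil {
		return fmt.Errorf("could not decode the count response: %w", err)
	}

	if body.Count < min {
		return fmt.Errorf("expected at least %d documents in %q, got %d", min, index, body.Count)
	}
	return nil
}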

Building

This project uses goreleaser to build the CLI binaries for all supported platforms. Please see the goreleaser installation instructions to make it available on your machine.

Once goreleaser is installed, building the CLI is as simple as:

$ make build

This will place the built distribution inside the dist directory in the current working directory.

Contributing

pre-commit

This project uses pre-commit. After installing it, please install the already-configured hooks we support to enable pre-commit in your local Git repository:

$ pre-commit install
pre-commit installed at .git/hooks/pre-commit

To understand more about the hooks we use, please take a look at pre-commit's configuration file.

Backports

This project requires backports to the existing active branches. Those branches are defined in the .backportrc.json and .mergify.yml files. In order to do so, there are two different approaches:

Mergify 🥇

This is the preferred approach. Backports are created automatically as long as the rules defined in .mergify.yml are fulfilled. From the user's point of view, it's only required to attach the proper labels to the pull request that should be backported; once it gets merged, the automation happens under the hood.

Backportrc 👴

This is the traditional approach, where the backports are created by the author of the original pull request. To do so, install backport and run the following command in your terminal:

$ backport  --label <YOUR_LABELS> --auto-assign --pr <YOUR_PR>

Generating documentation about the specifications

If you want to transform your feature files into a nicer representation using HTML, please run this command from the root e2e directory to build a website for all test suites:

$ make build-docs

It will generate the website under the ./docs directory (which is ignored in Git). You'll be able to navigate through any feature file and test scenario in the website.

Noticing the test framework

To generate the notice files for this project:

  1. Execute make notice to generate the NOTICE.txt file.

Contributors and maintenance

We have received contributions from multiple teams in different aspects of the e2e tests project, and we are ecstatic to receive them:

  • Adam Stokes and Manuel de la Peña, from the Observability Robots team, created the test framework.
  • Julia Bardi and Nicolas Chaulet, frontend engineers from the Fleet team, have contributed a few scenarios for Fleet.
  • Eric Davis, QA engineer, has helped in the definition of the scenarios for Fleet.
  • Igor Guz, QA engineer in the Security team, has contributed scenarios for the security-related integrations, such as Endpoint, Linux and System.
  • Christos Markou and Jaime Soriano have contributed the k8s-autodiscover test suite, which is maintained by @elastic/obs-cloudnative-monitoring.
  • Julien Lind, from Fleet, has helped define the support matrix in terms of which OSs and architectures need to be covered for the Fleet test suite.
  • Julien Mailleret, from Infra, has contributed to the Helm charts test suite.
  • Anderson Queiroz (Elastic Agent) and Víctor Martínez (Observability Robots) are currently working on macOS support for running the tests on real Apple machines using Elastic's Orka provisioner.

Licensing

This project is licensed under the Elastic License: https://www.elastic.co/licensing/elastic-license


e2e-testing's Issues

Check services status when running them

I can think of improving the runMetricbeat method to check/wait for the liveness of the service and, if it's not ready, throw an error, failing the test at that step.

Originally posted by @mdelapenya in #103 (comment)

That would also apply to any other service run by the framework
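A minimal sketch of what that liveness wait could look like, assuming a hypothetical isHealthy probe (for example an HTTP ping or a Docker healthcheck inspection):

package services

import (
	"fmt"
	"time"
)

// waitForService polls a liveness probe until the service is ready or the
// timeout expires, so the step fails early with a clear error instead of
// timing out later in the scenario.
func waitForService(name string, isHealthy func() bool, timeout time.Duration) error {
	deadline := time.Now().Add(timeout)
	for time.Now().Before(deadline) {
		if isHealthy() {
			return nil
		}
		time.Sleep(2 * time.Second)
	}
	return fmt.Errorf("service %q was not ready after %s", name, timeout)
}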

Rename repo to e2e-testing

It will require a big refactor of the Go code, as the packages must be renamed accordingly, affecting open PRs.

Enhance jUnit output

The current jUnit output returns one single line for the command execution. We'd like to have more elaborate context about the test execution, so that Jenkins' traditional UI shows more information to the developer.

Support for vsphere

As we found a Docker image that mocks the vSphere APIs, we could use it here for running the end-to-end tests.

Generate the feature files that are related to the beats integration modules

We have fixed feature files for apache, mysql, redis and vsphere, where we manually update the supported versions as examples in the Gherkin table. We need to keep them in sync with the real supported versions, which are defined in the Beats repo, in each module's supported-versions.yml file.

We could generate the files at build time using templates: one for modules without variants (e.g. apache) and one for modules with variants (e.g. mysql). The feature files are exactly equal except for the module name and the examples, as the diff below shows.

-@apache
+@redis
-Feature: As a Metricbeat developer I want to check that the Apache module works as expected
+Feature: As a Metricbeat developer I want to check that the Redis module works as expected

Scenario Outline: Check module is sending metrics to Elasticsearch without errors
-  Given Apache "<apache_version>" is running for metricbeat
+  Given Redis "<redis_version>" is running for metricbeat
-    And metricbeat is installed and configured for Apache module
+    And metricbeat is installed and configured for Redis module
    And metricbeat waits "20" seconds for the service
  When metricbeat runs for "20" seconds 
-  Then there are "Apache" events in the index
+  Then there are "Redis" events in the index
    And there are no errors in the index
Examples:
-| apache_version |
+| redis_version |
-| 2.4.12         |
-| 2.4.20         |
+| 3.2.12         |
+| 4.0.11         |
+| 5.0.5          |

I think we can commit the templates and git-ignore the generated feature files, which will provide an automated way of testing both new and existing integrations.
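A minimal sketch of that build-time generation with Go's text/template; the template body and module data below are illustrative placeholders, not the repository's real templates:

package main

import (
	"os"
	"text/template"
)

// module describes, hypothetically, the data each integration would feed into the template.
type module struct {
	Name     string   // e.g. "redis"
	Title    string   // e.g. "Redis"
	Versions []string // taken from the module's supported-versions.yml
}

// featureTmpl is an illustrative, trimmed-down template for modules without variants.
const featureTmpl = `@{{ .Name }}
Feature: As a Metricbeat developer I want to check that the {{ .Title }} module works as expected

Scenario Outline: Check module is sending metrics to Elasticsearch without errors
  Given {{ .Title }} "<{{ .Name }}_version>" is running for metricbeat
    And metricbeat is installed and configured for {{ .Title }} module
  When metricbeat runs for "20" seconds
  Then there are "{{ .Title }}" events in the index
    And there are no errors in the index
Examples:
| {{ .Name }}_version |
{{- range .Versions }}
| {{ . }} |
{{- end }}
`

func main() {
	tmpl := template.Must(template.New("feature").Parse(featureTmpl))
	redis := module{Name: "redis", Title: "Redis", Versions: []string{"3.2.12", "4.0.11", "5.0.5"}}

	out, err := os.Create("redis.feature")
	if err != nil {
		panic(err)
	}
	defer out.Close()

	// Render the module's feature file from the shared template.
	if err := tmpl.Execute(out, redis); err != nil {
		panic(err)
	}
}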

@jsoriano what are your thoughts about this?

Release artefacts are not properly archived on Jenkins

The build process is storing the binaries here:

[2019-08-20T09:12:32.051Z] + docker run --rm -v /var/lib/jenkins/workspace/_metricbeat-tests-poc-mbp_master/src/github.com/elastic/metricbeat-tests-poc/cli:/usr/local/go/src/github.com/elastic/op -w /usr/local/go/src/github.com/elastic/op -e GOOS=darwin -e GOARCH=amd64 golang:1.12.7 go build -v -o /usr/local/go/src/github.com/elastic/op/.github/releases/download/0.1.0-rc1/darwin64-op

But the Jenkins' archiving step is trying to find them here:

[2019-08-20T09:13:47.173Z] ‘src/github.com/elastic/metricbeat-tests-poc/.github/releases/download/**’ doesn’t match anything, but ‘**’ does. Perhaps that’s what you mean?

More information at this build: https://apm-ci.elastic.co/blue/organizations/jenkins/beats%2Fpoc-metricbeat-tests-poc-mbp/detail/master/47/pipeline/154

Reuse beats' docker-compose files in the PoC

We want a way to get the latest docker compose files until this is merged into the Beats repo

Because supported versions are hardcoded in the PoC (see the Gherkin tables), we have to think about a way to keep those versions in sync with what comes from an integration's supported-versions file. We'd like to avoid the extra work of updating the feature files each time a version appears, disappears or is updated.

Exception when running the CLI if the configuration folder exists but the configuration file doesn't

Steps to reproduce:

  • Remove configuration file rm ~/.op/config.yml
  • Run the tool go run main.go -h

Expected behaviour: the config file is created
Actual behaviour:

panic: assignment to entry in nil map

goroutine 1 [running]:
github.com/elastic/metricbeat-tests-poc/cli/config.checkServices(0x0, 0x0)
        /Users/mdelapenya/sourcecode/src/github.com/elastic/metricbeat-tests-poc/cli/config/config.go:362 +0x1b2
github.com/elastic/metricbeat-tests-poc/cli/config.readConfig(0xc000243dc0, 0x15, 0x11d747b, 0x1b64d20, 0x1c90460, 0x0)
        /Users/mdelapenya/sourcecode/src/github.com/elastic/metricbeat-tests-poc/cli/config/config.go:444 +0x190
github.com/elastic/metricbeat-tests-poc/cli/config.newConfig(0xc000243dc0, 0x15)
        /Users/mdelapenya/sourcecode/src/github.com/elastic/metricbeat-tests-poc/cli/config/config.go:303 +0x78
github.com/elastic/metricbeat-tests-poc/cli/config.InitConfig()
        /Users/mdelapenya/sourcecode/src/github.com/elastic/metricbeat-tests-poc/cli/config/config.go:266 +0xab
github.com/elastic/metricbeat-tests-poc/cli/cmd.init.0()
        /Users/mdelapenya/sourcecode/src/github.com/elastic/metricbeat-tests-poc/cli/cmd/run.go:16 +0x37
exit status 2
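A minimal, hypothetical sketch of the kind of fix, assuming the configuration struct holds a services map that must be initialised before checkServices assigns to it (type and field names are placeholders):

package config

// Service is a hypothetical stand-in for the CLI's service definition.
type Service struct {
	Name string
}

// Config is a hypothetical stand-in for the CLI's configuration struct.
type Config struct {
	Services map[string]Service
}

// checkServices fills in default service definitions. Initialising the map
// when it is nil avoids the "assignment to entry in nil map" panic seen when
// the configuration file does not exist yet.
func (c *Config) checkServices(defaults map[string]Service) {
	if c.Services == nil {
		c.Services = map[string]Service{}
	}
	for name, svc := range defaults {
		if _, found := c.Services[name]; !found {
			c.Services[name] = svc
		}
	}
}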

Rephrase existing feature files to include a When

This When clause indicates to a casual reader the action that triggers the assertion performed in the Then one, so it's important to have one.

For the existing features, our current action (the implicit When) is to leave metricbeat running for 20 seconds (configurable). So we could rephrase the feature files like this:

Before:

  Given Apache "<apache_version>" is running for metricbeat
    And metricbeat is installed and configured for Apache module
  Then there are "Apache" events in the index
    And there are no errors in the index

After:

  Given Apache "<apache_version>" is running for metricbeat
    And metricbeat is installed and configured for Apache module
  When metricbeat is running for "20" seconds
  Then there are "Apache" events in the index
    And there are no errors in the index

Another way of achieving this could be a document-based approach, polling Elasticsearch until it contains a certain number of documents (see the sketch after the example):

  Given Apache "<apache_version>" is running for metricbeat
    And metricbeat is installed and configured for Apache module
//  When metricbeat has sent "20" documents
//  When the index contains at least "20" documents
  Then there are "Apache" events in the index
    And there are no errors in the index
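A minimal sketch of that document-based polling, assuming a hypothetical countDocuments helper that wraps the Elasticsearch count query:

package steps

import (
	"fmt"
	"time"
)

// waitForDocuments backs a step like `the index contains at least "20" documents`:
// it polls Elasticsearch until the index holds at least min documents or the timeout expires.
func waitForDocuments(countDocuments func(index string) (int, error), index string, min int, timeout time.Duration) error {
	deadline := time.Now().Add(timeout)
	for time.Now().Before(deadline) {
		count, err := countDocuments(index)
		if err == nil && count >= min {
			return nil
		}
		time.Sleep(2 * time.Second)
	}
	return fmt.Errorf("index %q did not reach %d documents within %s", index, min, timeout)
}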

Related issues

Run smoke functional tests per PR

It would be great if we could run a set of smoke tests for each PR in this repo, leveraging the functional-test.sh script, which supports tags.

It would also be great to run each subset in parallel 🤔

Define matrix compatibility for metricbeat and the elastic stack outside the test framework

We want the automation tool to define the different combinations of input parameters that generate the different scenarios:

Stack 7.4 + metricbeat 7.4 + tests for apache (2.2, 2.4)
Stack 7.5-SNAPSHOT + metricbeat 7.5-SNAPSHOT + tests for apache (2.2, 2.4)
Stack 7.5-SNAPSHOT + metricbeat 7.4 + tests for apache (2.2, 2.4)

The test framework could read the stack and metricbeat versions from the environment (test config file, environment variables, etc.), keeping just the versions of the integration module in the feature file.

For local development, we must ensure a default value for those two arguments too.
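A minimal sketch of reading those versions from the environment with defaults for local development (the variable names are illustrative, not necessarily the ones the framework uses):

package config

import "os"

// getEnvOrDefault returns the value of an environment variable, or a fallback
// for local development runs where the variable is not set.
func getEnvOrDefault(key, fallback string) string {
	if value, found := os.LookupEnv(key); found && value != "" {
		return value
	}
	return fallback
}

// Illustrative defaults; CI would override these per matrix entry,
// e.g. stack 7.5-SNAPSHOT combined with metricbeat 7.4.
var (
	stackVersion      = getEnvOrDefault("STACK_VERSION", "7.4.0")
	metricbeatVersion = getEnvOrDefault("METRICBEAT_VERSION", "7.4.0")
)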

cc/ @elastic/observablt-robots

Validate Helm charts

Let's use a BDD approach to validate the official Helm charts for elastic. Something like this:

@helm
@k8s
@metricbeat
Feature: The Helm chart is following product recommended configuration for Kubernetes

Scenario: The Metricbeat chart will create recommended K8S resources
  Given a cluster is running
  When the "metricbeat" Elastic's helm chart is installed
  Then a pod will be deployed on each node of the cluster by a DaemonSet
    And a "Deployment" will manage additional pods for metricsets querying internal services
    And a "kube-state-metrics" chart will retrieve specific Kubernetes metrics
    And a "ConfigMap" resource contains the "metricbeat.yml" content
    And a "ConfigMap" resource contains the "kube-state-metrics-metricbeat.yml" content
    And a "ServiceAccount" resource manages RBAC
    And a "ClusterRole" resource manages RBAC
    And a "ClusterRoleBinding" resource manages RBAC

Port over one of the stack monitoring parity tests

Currently we have stack monitoring parity tests running in the Elastic Stack Testing Framework (ESTF). You can read more about these tests, particularly why they were created and how they work conceptually, over here: https://github.com/elastic/elastic-stack-testing/blob/master/playbooks/monitoring/README.md.

These parity tests are broken up by each stack product that shows up in the Stack Monitoring UI, viz. elasticsearch, logstash, kibana, and beats. This division is reflected in the sub-folders seen under https://github.com/elastic/elastic-stack-testing/tree/master/playbooks/monitoring.

It would be great to try and port one of these products' parity tests over to the e2e-testing framework, mainly to see if there are any performance or maintainability gains in the process. To that end, I'd suggest porting the parity tests for beats or logstash, as these are the least complex ones.

Fix golangci-lint

It's failing on CI because it uses the root directory of the project to compile and look for dependencies.

As we have the CLI source code in a subdirectory, the pre-commit checks fail. The same happens for the metricbeat-tests.

Inverted logic when using testcontainers' reaper

After the latest upgrade, where they fixed a bug in the resource reaper, running services with the CLI creates a reaper container to kill them, while the test runner does not, so the logic must be inverted.
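A minimal sketch of how that inversion could be expressed with an explicit flag decided by the caller; the SkipReaper field is how older testcontainers-go releases exposed this toggle, so treat its presence as an assumption about the version in use:

package services

import "github.com/testcontainers/testcontainers-go"

// newServiceRequest builds the container request and decides who cleans up:
// services started from the CLI should stay alive (skip the reaper), while
// containers started by the test runner should be reaped when the session ends.
func newServiceRequest(image string, runByCLI bool) testcontainers.ContainerRequest {
	return testcontainers.ContainerRequest{
		Image:      image,
		SkipReaper: runByCLI, // assumption: field available in the testcontainers-go version in use
	}
}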

Add test for included default configuration

Beats packages include generated configurations; we should add some tests to check that these configurations work, to avoid regressions like elastic/beats#13705.

A way to check it for metricbeat would be to start metricbeat with the default configuration (the one in metricbeat/metricbeat.yml, overriding the outputs with -E) and check that metricbeat starts and collects events from the system module.
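A minimal, hypothetical sketch of launching metricbeat with its packaged configuration while overriding only the output via -E (the configuration path, credentials and host are placeholders):

package metricbeat_test

import "os/exec"

// runWithDefaultConfig starts metricbeat with the packaged metricbeat.yml,
// overriding only the Elasticsearch output so events reach the test cluster.
// A real test would then assert that system-module events arrive in the index.
func runWithDefaultConfig(esHost string) error {
	cmd := exec.Command("metricbeat",
		"-e",
		"-c", "/etc/metricbeat/metricbeat.yml", // packaged default configuration
		"-E", "output.elasticsearch.hosts=["+esHost+"]",
		"-E", "output.elasticsearch.username=elastic",
		"-E", "output.elasticsearch.password=changeme",
	)
	return cmd.Start()
}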

metricbeat configuration tests are failing (not running in the CI)

--- Failed steps:

  Scenario Outline: Check metricbeat configuration is sending metrics to Elasticsearch without errors # features/metricbeat/metricbeat.feature:3
    Then there are "system" events in the index # features/metricbeat/metricbeat.feature:6
      Error: There aren't documents for - on Metricbeat index

  Scenario Outline: Check metricbeat.docker configuration is sending metrics to Elasticsearch without errors # features/metricbeat/metricbeat.feature:3
    Then there are "system" events in the index # features/metricbeat/metricbeat.feature:6
      Error: There aren't documents for - on Metricbeat index

  Scenario Outline: Check metricbeat.reference configuration is sending metrics to Elasticsearch without errors # features/metricbeat/metricbeat.feature:3
    Then there are "system" events in the index # features/metricbeat/metricbeat.feature:6
      Error: Could not send query to Elasticsearch in the specified time (15 seconds)

Support for Kubernetes provisioning

Nowadays, the only provisioner is docker-compose.

This will allow running tests against a k8s cluster, i.e. the test clusters, and the Helm charts used to create each piece in the cluster: filebeat, apm-server, etc.

cc/ @jsoriano @elastic/observablt-robots

Fix generation of Go binaries

We noticed that the binaries generated by the current build process do not include the static assets, a feature that was leveraged by GoBuffalo's packr v2.

In the build script we are currently using a Docker image that includes packr2 as the build tool, but after debugging the build process we found that no packr boxes were detected.

It's important to include the static assets, because they provide default values for the services and stacks to be run using both the provisioner tool and the Godog tests.

For that reason, we need to embed the assets into the final Go binary.
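For reference, a minimal sketch of how packr v2 boxes embed static assets so that they survive in the final binary (the box name and file path are illustrative):

package config

import "github.com/gobuffalo/packr/v2"

// configurations is a packr box: the packr2 tool scans for packr.New calls at
// build time and embeds the referenced directory into the generated binary.
var configurations = packr.New("configurations", "./_config")

// defaultServiceConfig reads an embedded default, e.g. a service's compose file.
func defaultServiceConfig(service string) (string, error) {
	return configurations.FindString(service + "/docker-compose.yml")
}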

Here we have a successful build: https://apm-ci.elastic.co/blue/organizations/jenkins/beats%2Fpoc-metricbeat-tests-poc-mbp/detail/v0.1.0-rc7/1/pipeline/151 and its artifacts https://apm-ci.elastic.co/blue/organizations/jenkins/beats%2Fpoc-metricbeat-tests-poc-mbp/detail/v0.1.0-rc7/1/artifacts

If you download any of the binaries (e.g. the Darwin one), make it executable and run it, it should include the assets.

Steps to reproduce (for Mac)

  1. Download the binary: curl https://apm-ci.elastic.co/job/beats/job/poc-metricbeat-tests-poc-mbp/job/v0.1.0-rc7/1/artifact/src/github.com/elastic/metricbeat-tests-poc/cli/.github/releases/download/0.1.0-rc7/darwin64-op -o /tmp/op
  2. Change permissions: chmod +x /tmp/op
  3. Move: cd /tmp
  4. Execute: ./op run service -h

Expected behavior: All included services are present

./op run service -h
Allows to run a service, defined as subcommands, spinning up Docker containers for them and exposing their internal configuration so that you are able to connect to them in an easy manner

Usage:
  op run service [flags]
  op run service [command]

Available Commands:
  apache        Runs a apache service
  apm-server    Runs a apm-server service
  elasticsearch Runs a elasticsearch service
  kafka         Runs a kafka service
  kibana        Runs a kibana service
  metricbeat    Runs a metricbeat service
  mongodb       Runs a mongodb service
  mysql         Runs a mysql service
  opbeans-go    Runs a opbeans-go service
  opbeans-java  Runs a opbeans-java service
  redis         Runs a redis service
  vsphere       Runs a vsphere service

Flags:
  -h, --help   help for service

Use "op run service [command] --help" for more information about a command.

Actual behavior: No service is present

./op run service -h
Allows to run a service, defined as subcommands, spinning up Docker containers for them and exposing their internal configuration so that you are able to connect to them in an easy manner

Usage:
  op run service [flags]
  op run service [command]

Flags:
  -h, --help   help for service

Use "op run service [command] --help" for more information about a command.
