e2e

A Go module providing a robust framework for running complex workload scenarios in isolation, using Go and Docker. For integration tests, e2e tests, benchmarks and more! 💪

What are the goals?

  • Ability to schedule isolated processes programmatically from a single process on a single machine.
  • Focus on cluster workloads, cloud native services and microservices.
  • Developer scenarios in mind - e.g. preserving scenario readability and integration with the Go test ecosystem.
  • Metric monitoring as a first-class citizen. Assert on Prometheus metric values during test scenarios or check overall performance characteristics.

Usage Models

There are three main use cases envisioned for this Go module:

  • e2e test use (see example). Use e2e in e2e tests to quickly run complex test scenarios involving many container services. This was the main reason we created this module. You can see how it is used in the Cortex and Thanos projects.
  • Standalone use (see example). Use e2e to run setups in interactive mode, where you spin up workloads programmatically as you want and poke at them on your own using your browser or other tools. There is no longer a need to deploy a full Kubernetes cluster or external machines.
  • Benchmark use. Easily use e2e in local Go benchmarks when your code depends on external services.

Getting Started

Let's go through an example leveraging the go test flow:

  1. Add the e2e Go module to your go.mod using go get github.com/efficientgo/e2e.

  2. Implement a test. Start by creating an environment. Currently, e2e supports the Docker environment only. Use a unique name for each of your tests. It's recommended to keep it stable so resources are consistently cleaned up.

    	// Start an isolated environment.
    	e, err := e2e.New()
    	testutil.Ok(t, err)
    	// Make sure resources (e.g. Docker containers, network, dir) are cleaned up.
    	t.Cleanup(e.Close)
  3. Implement the workload by creating an e2e.Runnable, or use the existing runnables in the e2edb package. For example, a function that schedules Jaeger with our desired configuration could look like this (the sketch after this list shows how to start it and use its endpoints):

    	j := e.Runnable("tracing").
    		WithPorts(
    			map[string]int{
    				"http.front":    16686,
    				"jaeger.thrift": 14268,
    			}).
    		Init(e2e.StartOptions{Image: "jaegertracing/all-in-one:1.25"})
  4. Use e2emon.AsInstrumented if you want to be able to query your service for metrics, which is a great way to assess its internal state in tests! For example, see the following etcd definition:

    	return e2emon.AsInstrumented(env.Runnable(name).WithPorts(map[string]int{AccessPortName: 2379, "metrics": 9000}).Init(
    		e2e.StartOptions{
    			Image: o.image,
    			Command: e2e.NewCommand(
    				"/usr/local/bin/etcd",
    				"--listen-client-urls=http://0.0.0.0:2379",
    				"--advertise-client-urls=http://0.0.0.0:2379",
    				"--listen-metrics-urls=http://0.0.0.0:9000",
    				"--log-level=error",
    			),
    			Readiness: e2e.NewHTTPReadinessProbe("metrics", "/health", 200, 204),
    		},
    	), "metrics")
  5. Program your scenario as you want. You can start workloads, wait for their readiness, stop them, check their metrics, and use their network endpoints both from the test itself (Endpoint) and from within each workload (InternalEndpoint). You can also access each workload's directory; there is also a shared directory across all workloads. Check the Dir and InternalDir runnable methods.

    	// Create structs for Prometheus containers scraping themselves.
    	p1 := e2edb.NewPrometheus(e, "prometheus-1")
    	s1 := e2edb.NewThanosSidecar(e, "sidecar-1", p1)
    
    	p2 := e2edb.NewPrometheus(e, "prometheus-2")
    	s2 := e2edb.NewThanosSidecar(e, "sidecar-2", p2)
    
    	// Create Thanos Query container. We can point it to the peer network addresses of both Prometheus instances
    	// using the InternalEndpoint method, even before they are started.
    	t1 := e2edb.NewThanosQuerier(e, "query-1", []string{s1.InternalEndpoint("grpc"), s2.InternalEndpoint("grpc")})
    
    	// Start them.
    	testutil.Ok(t, e2e.StartAndWaitReady(p1, s1, p2, s2, t1))
    
    	// To ensure the querier has access to both stores, we can check its Prometheus metrics using the WaitSumMetricsWithOptions method.
    	// Since the metric we are looking for only appears after init, we add an option to wait for it.
    	testutil.Ok(t, t1.WaitSumMetricsWithOptions(e2emon.Equals(2), []string{"thanos_store_nodes_grpc_connections"}, e2emon.WaitMissingMetrics()))
    
    	// To ensure Prometheus has already scraped something, check the number of appended samples.
    	testutil.Ok(t, p1.WaitSumMetrics(e2emon.Greater(50), "prometheus_tsdb_head_samples_appended_total"))
    	testutil.Ok(t, p2.WaitSumMetrics(e2emon.Greater(50), "prometheus_tsdb_head_samples_appended_total"))
    
    	// We can now query Thanos Querier directly from here, using its host address thanks to the Endpoint method.
    	a, err := api.NewClient(api.Config{Address: "http://" + t1.Endpoint("http")})
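
For completeness, here is a minimal sketch of how the Jaeger runnable from step 3 could be started and addressed, using only methods shown above (the port names and image come from step 3; the usage itself is illustrative):

	// Start Jaeger and wait until it is ready (per its readiness probe, if any).
	testutil.Ok(t, e2e.StartAndWaitReady(j))

	// Host-reachable address, e.g. for an HTTP client running in the test itself.
	uiAddr := "http://" + j.Endpoint("http.front")

	// Docker-network address, e.g. for configuring other workloads to send spans to Jaeger.
	collectorAddr := "http://" + j.InternalEndpoint("jaeger.thrift") + "/api/traces"

	t.Log(uiAddr, collectorAddr)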

Interactive

It is often the case that we want to pause an e2e test at a desired moment, so we can manually play with the scenario in progress. This is as easy as using the e2einteractive package to pause the setup until you open the printed address in your browser. Use the following code to print the address and pause until it is hit.

err := e2einteractive.RunUntilEndpointHit()
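
In standalone mode this typically comes at the end of your setup function. A minimal sketch based only on APIs shown in this README (the "http" port name of the Prometheus runnable is an assumption; error handling is shortened):

	e, err := e2e.New()
	if err != nil {
		return err
	}
	defer e.Close()

	// Spin up any workloads you want to poke at manually.
	p := e2edb.NewPrometheus(e, "prometheus")
	if err := e2e.StartAndWaitReady(p); err != nil {
		return err
	}
	fmt.Println("Prometheus UI:", "http://"+p.Endpoint("http"))

	// Block until the printed e2einteractive endpoint is hit in the browser.
	return e2einteractive.RunUntilEndpointHit()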

Monitoring

Each instrumented workload (a runnable wrapped with e2emon.AsInstrumented) has programmatic access to the latest metrics via the WaitSumMetricsWithOptions family of methods. Yet, especially in standalone mode, it is often useful to query and visualise all metrics provided by your services/runnables using PromQL. To do so, just start monitoring from the e2emon package:

mon, err := e2emon.Start(e)

This will start Prometheus with automatic discovery of every new and existing instrumented runnable. If a DockerEnvironment is started, it also runs cadvisor, which monitors Docker itself and exposes generic per-container performance metrics (e.g. container_memory_rss). Run OpenUserInterfaceInBrowser() to open the Prometheus UI in the browser:

	// Open monitoring page with all metrics.
	if err := mon.OpenUserInterfaceInBrowser(); err != nil {
		return errors.Wrap(err, "open monitoring UI in browser")
	}

To see how it works in practice, run our example code in standalone.go by running make run-example. At the end, four UIs should open in your browser:

  • Thanos
  • Monitoring (Prometheus)
  • Profiling (Parca)
  • Tracing (Jaeger)

In the monitoring UI you can then, for example, query Docker container metrics using the container_memory_working_set_bytes{id!="/"} metric:

(Screenshot: memory metric queried in the monitoring UI.)

NOTE: Due to cgroup modifications and the use of advanced Docker features, this might behave differently on non-Linux platforms. Let us know in an issue if you encounter any problems on macOS or Windows and help us add support for those operating systems!

Bonus: Monitoring the performance of the e2e process itself.

It's a common pattern to schedule some containers while also wanting to monitor local code you just wrote. For this, you can run your local code in an ad-hoc container using e2e.Containerize():

	l, err := e2e.Containerize(e, "run", Run)
	testutil.Ok(t, err)

	testutil.Ok(t, e2e.StartAndWaitReady(l))

The Run function must live in a separate non-test file and must be exported, for example:

func Run(ctx context.Context) error {
	// Do something.

	<-ctx.Done()
	return nil
}

This will run your code in a container, allowing you to use the same monitoring methods thanks to cadvisor.

Continuous Profiling

Similarly to monitoring, you can wrap your runnable (or instrumented runnable) with e2eprof.AsProfiled if your service exposes HTTP pprof handlers (common in Go). Once wrapped, you can start a continuous profiler using the e2eprof package:

prof, err := e2eprof.Start(e)
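
The wrapping itself mirrors e2emon.AsInstrumented. A hedged sketch, assuming e2eprof.AsProfiled takes the runnable plus the name of the port serving the pprof handlers (the service name, image and port below are illustrative):

	// "http" is assumed to expose the /debug/pprof/ handlers of the service.
	profiled := e2eprof.AsProfiled(
		e.Runnable("my-service").
			WithPorts(map[string]int{"http": 8080}).
			Init(e2e.StartOptions{Image: "my-service:latest"}), // hypothetical image
		"http",
	)
	testutil.Ok(t, e2e.StartAndWaitReady(profiled))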

This will start Parca with automatic discovery of every new and existing profiled runnable. Run OpenUserInterfaceInBrowser() to open the Parca UI in the browser:

	// Open profiling page with all profiles.
	if err := prof.OpenUserInterfaceInBrowser(); err != nil {
		return errors.Wrap(err, "open profiling UI in browser")
	}

To see how it works in practice, run our example code in standalone.go by running make run-example. At the end, four UIs should open in your browser:

  • Thanos
  • Monitoring (Prometheus)
  • Profiling (Parca)
  • Tracing (Jaeger)

In the profiling UI, choose a profile type, filter by instance (autocompleted), and select the profile:

(Screenshot: Parca profiling UI.)

Monitoring + Profiling

For runnables that are both instrumented and profiled, you can use e2eobs.AsObservable.
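
A hedged sketch, assuming e2eobs.AsObservable mirrors the AsInstrumented/AsProfiled helpers above and takes the runnable plus the name of the port exposing both the /metrics and pprof handlers (the service name, image and port are illustrative):

	o := e2eobs.AsObservable(
		e.Runnable("my-service").
			WithPorts(map[string]int{"http": 8080}).
			Init(e2e.StartOptions{Image: "my-service:latest"}), // hypothetical image
		"http",
	)
	testutil.Ok(t, e2e.StartAndWaitReady(o))
	// Assuming the result also supports the WaitSumMetrics* helpers shown earlier,
	// the workload can then be asserted on and shows up in both the Prometheus and Parca UIs.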

Debugging flaky tests

Sometimes tests fail due to timing problems on highly CPU-constrained systems such as GitHub Actions runners. To help fix these issues, e2e supports limiting the CPU time allocated to Docker containers through the E2E_DOCKER_CPUS environment variable:

	dockerCPUsEnv := os.Getenv(dockerCPUEnvName)
	if dockerCPUsEnv != "" {
		dockerCPUsParam = dockerCPUsEnv
	}

You can set it either on the command line or via t.Setenv("E2E_DOCKER_CPUS", "...").
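
For example, in a test (a minimal sketch; the CPU value is illustrative):

	func TestSomething(t *testing.T) {
		// Limit every container scheduled by this test to roughly two CPUs.
		t.Setenv("E2E_DOCKER_CPUS", "2")

		e, err := e2e.New()
		testutil.Ok(t, err)
		t.Cleanup(e.Close)

		// ... schedule and assert on workloads as usual.
	}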

Alternatively, you can pass the WithCPUs environment option so that a given e2e test always runs with reduced CPU time.

See the Docker documentation for the values you can pass to the --cpus flag.

Troubleshooting

Can't create docker network

If you see an output like the one below:

18:09:11 dockerEnv: [docker ps -a --quiet --filter network=kubelet]
18:09:11 dockerEnv: [docker network ls --quiet --filter name=kubelet]
18:09:11 dockerEnv: [docker network create -d bridge kubelet]
18:09:11 Error response from daemon: could not find an available, non-overlapping IPv4 address pool among the defaults to assign to the network

The first potential reason is that this command often does not work if you have a VPN client running, such as openvpn, expressvpn, nordvpn, etc. Unfortunately, the fastest solution is to turn off the VPN for the duration of the test.

If that is not the reason, consider pruning your Docker networks. You might have leftovers from previous runs (although on successful runs, e2e cleans those up).

Use docker network prune -f to clean those.

Credits

Contributors

bwplotka, clyang82, douglascamata, giedriuss, jessicalins, matej-g, michahoffmann, philipgough, rasek91, saswatamcode, squat, utukj, yeya24


e2e's Issues

Interrupting in standalone mode propagates to docker containers (?)

Repro:

  • make run-example
  • Ctrl+C

Logs (after interrupt):

^C14:48:03 Killing query-1
14:48:03 sidecar-2: level=info name=sidecar-2 ts=2021-07-24T11:48:03.676445174Z caller=main.go:167 msg="caught signal. Exiting." signal=interrupt
14:48:03 sidecar-2: level=warn name=sidecar-2 ts=2021-07-24T11:48:03.676527331Z caller=intrumentation.go:54 msg="changing probe status" status=not-ready reason=null
14:48:03 sidecar-2: level=info name=sidecar-2 ts=2021-07-24T11:48:03.676541775Z caller=http.go:74 service=http/server component=sidecar msg="internal server is shutting down" err=null
14:48:03 sidecar-1: level=info name=sidecar-1 ts=2021-07-24T11:48:03.676619483Z caller=main.go:167 msg="caught signal. Exiting." signal=interrupt
14:48:03 sidecar-1: level=warn name=sidecar-1 ts=2021-07-24T11:48:03.676682445Z caller=intrumentation.go:54 msg="changing probe status" status=not-ready reason=null
14:48:03 sidecar-1: level=info name=sidecar-1 ts=2021-07-24T11:48:03.676695729Z caller=http.go:74 service=http/server component=sidecar msg="internal server is shutting down" err=null
14:48:03 sidecar-2: level=info name=sidecar-2 ts=2021-07-24T11:48:03.677752224Z caller=http.go:93 service=http/server component=sidecar msg="internal server is shutdown gracefully" err=null
14:48:03 sidecar-2: level=info name=sidecar-2 ts=2021-07-24T11:48:03.677809395Z caller=intrumentation.go:66 msg="changing probe status" status=not-healthy reason=null
14:48:03 sidecar-2: level=warn name=sidecar-2 ts=2021-07-24T11:48:03.677847401Z caller=intrumentation.go:54 msg="changing probe status" status=not-ready reason=null
14:48:03 sidecar-2: level=info name=sidecar-2 ts=2021-07-24T11:48:03.677857689Z caller=grpc.go:130 service=gRPC/server component=sidecar msg="internal server is shutting down" err=null
14:48:03 sidecar-2: level=info name=sidecar-2 ts=2021-07-24T11:48:03.677875199Z caller=grpc.go:143 service=gRPC/server component=sidecar msg="gracefully stopping internal server"
14:48:03 sidecar-1: level=info name=sidecar-1 ts=2021-07-24T11:48:03.677912421Z caller=http.go:93 service=http/server component=sidecar msg="internal server is shutdown gracefully" err=null
14:48:03 sidecar-1: level=info name=sidecar-1 ts=2021-07-24T11:48:03.677972702Z caller=intrumentation.go:66 msg="changing probe status" status=not-healthy reason=null
14:48:03 sidecar-1: level=warn name=sidecar-1 ts=2021-07-24T11:48:03.67801026Z caller=intrumentation.go:54 msg="changing probe status" status=not-ready reason=null
14:48:03 sidecar-1: level=info name=sidecar-1 ts=2021-07-24T11:48:03.678022172Z caller=grpc.go:130 service=gRPC/server component=sidecar msg="internal server is shutting down" err=null
14:48:03 sidecar-1: level=info name=sidecar-1 ts=2021-07-24T11:48:03.678038023Z caller=grpc.go:143 service=gRPC/server component=sidecar msg="gracefully stopping internal server"
14:48:03 sidecar-1: level=info name=sidecar-1 ts=2021-07-24T11:48:03.678369251Z caller=grpc.go:156 service=gRPC/server component=sidecar msg="internal server is shutdown gracefully" err=null
14:48:03 sidecar-1: level=info name=sidecar-1 ts=2021-07-24T11:48:03.678437559Z caller=main.go:159 msg=exiting
14:48:03 sidecar-2: level=info name=sidecar-2 ts=2021-07-24T11:48:03.678758319Z caller=grpc.go:156 service=gRPC/server component=sidecar msg="internal server is shutdown gracefully" err=null
14:48:03 sidecar-2: level=info name=sidecar-2 ts=2021-07-24T11:48:03.678797963Z caller=main.go:159 msg=exiting
14:48:03 prometheus-1: level=warn ts=2021-07-24T11:48:03.695Z caller=main.go:653 msg="Received SIGTERM, exiting gracefully..."
14:48:03 prometheus-1: level=info ts=2021-07-24T11:48:03.696Z caller=main.go:676 msg="Stopping scrape discovery manager..."
14:48:03 prometheus-1: level=info ts=2021-07-24T11:48:03.696Z caller=main.go:690 msg="Stopping notify discovery manager..."
14:48:03 prometheus-1: level=info ts=2021-07-24T11:48:03.696Z caller=main.go:712 msg="Stopping scrape manager..."
14:48:03 prometheus-1: level=info ts=2021-07-24T11:48:03.696Z caller=main.go:686 msg="Notify discovery manager stopped"
14:48:03 prometheus-1: level=info ts=2021-07-24T11:48:03.696Z caller=main.go:672 msg="Scrape discovery manager stopped"
14:48:03 prometheus-1: level=info ts=2021-07-24T11:48:03.696Z caller=main.go:706 msg="Scrape manager stopped"
14:48:03 prometheus-1: level=info ts=2021-07-24T11:48:03.696Z caller=manager.go:934 component="rule manager" msg="Stopping rule manager..."
14:48:03 prometheus-1: level=info ts=2021-07-24T11:48:03.696Z caller=manager.go:944 component="rule manager" msg="Rule manager stopped"
14:48:03 query-1: level=info name=query-1 ts=2021-07-24T11:48:03.697417989Z caller=main.go:167 msg="caught signal. Exiting." signal=interrupt
14:48:03 query-1: level=warn name=query-1 ts=2021-07-24T11:48:03.697765153Z caller=intrumentation.go:54 msg="changing probe status" status=not-ready reason=null
14:48:03 query-1: level=info name=query-1 ts=2021-07-24T11:48:03.697813969Z caller=http.go:74 service=http/server component=query msg="internal server is shutting down" err=null
14:48:03 prometheus-2: level=warn ts=2021-07-24T11:48:03.697Z caller=main.go:653 msg="Received SIGTERM, exiting gracefully..."
14:48:03 prometheus-2: level=info ts=2021-07-24T11:48:03.697Z caller=main.go:676 msg="Stopping scrape discovery manager..."
14:48:03 prometheus-2: level=info ts=2021-07-24T11:48:03.697Z caller=main.go:690 msg="Stopping notify discovery manager..."
14:48:03 prometheus-2: level=info ts=2021-07-24T11:48:03.697Z caller=main.go:712 msg="Stopping scrape manager..."
14:48:03 prometheus-2: level=info ts=2021-07-24T11:48:03.697Z caller=main.go:672 msg="Scrape discovery manager stopped"
14:48:03 prometheus-2: level=info ts=2021-07-24T11:48:03.697Z caller=main.go:686 msg="Notify discovery manager stopped"
14:48:03 prometheus-2: level=info ts=2021-07-24T11:48:03.697Z caller=manager.go:934 component="rule manager" msg="Stopping rule manager..."
14:48:03 prometheus-2: level=info ts=2021-07-24T11:48:03.697Z caller=manager.go:944 component="rule manager" msg="Rule manager stopped"
14:48:03 prometheus-2: level=info ts=2021-07-24T11:48:03.697Z caller=main.go:706 msg="Scrape manager stopped"
14:48:03 query-1: level=info name=query-1 ts=2021-07-24T11:48:03.699077457Z caller=http.go:93 service=http/server component=query msg="internal server is shutdown gracefully" err=null
14:48:03 query-1: level=info name=query-1 ts=2021-07-24T11:48:03.699157713Z caller=intrumentation.go:66 msg="changing probe status" status=not-healthy reason=null
14:48:03 query-1: level=warn name=query-1 ts=2021-07-24T11:48:03.699192767Z caller=intrumentation.go:54 msg="changing probe status" status=not-ready reason=null
14:48:03 query-1: level=info name=query-1 ts=2021-07-24T11:48:03.699204094Z caller=grpc.go:130 service=gRPC/server component=query msg="internal server is shutting down" err=null
14:48:03 query-1: level=info name=query-1 ts=2021-07-24T11:48:03.699233377Z caller=grpc.go:143 service=gRPC/server component=query msg="gracefully stopping internal server"
14:48:03 query-1: level=info name=query-1 ts=2021-07-24T11:48:03.699338349Z caller=grpc.go:156 service=gRPC/server component=query msg="internal server is shutdown gracefully" err=null
14:48:03 query-1: level=info name=query-1 ts=2021-07-24T11:48:03.699371953Z caller=main.go:159 msg=exiting
14:48:03 prometheus-1: level=info ts=2021-07-24T11:48:03.703Z caller=notifier.go:601 component=notifier msg="Stopping notification manager..."
14:48:03 prometheus-1: level=info ts=2021-07-24T11:48:03.703Z caller=main.go:885 msg="Notifier manager stopped"
14:48:03 prometheus-1: level=info ts=2021-07-24T11:48:03.703Z caller=main.go:897 msg="See you next time!"
14:48:03 prometheus-2: level=info ts=2021-07-24T11:48:03.710Z caller=notifier.go:601 component=notifier msg="Stopping notification manager..."
14:48:03 prometheus-2: level=info ts=2021-07-24T11:48:03.710Z caller=main.go:885 msg="Notifier manager stopped"
14:48:03 prometheus-2: level=info ts=2021-07-24T11:48:03.711Z caller=main.go:897 msg="See you next time!"
14:48:04 Killing sidecar-2
14:48:04 Error response from daemon: Cannot kill container: e2e_example-sidecar-2: No such container: e2e_example-sidecar-2

14:48:04 Unable to kill service sidecar-2 : exit status 1
14:48:04 Killing prometheus-2
14:48:04 Error response from daemon: Cannot kill container: e2e_example-prometheus-2: No such container: e2e_example-prometheus-2

14:48:04 Unable to kill service prometheus-2 : exit status 1
14:48:04 Killing sidecar-1
14:48:04 Error response from daemon: Cannot kill container: e2e_example-sidecar-1: No such container: e2e_example-sidecar-1

14:48:04 Unable to kill service sidecar-1 : exit status 1
14:48:04 Killing prometheus-1
14:48:04 Error response from daemon: Cannot kill container: e2e_example-prometheus-1: No such container: e2e_example-prometheus-1

14:48:04 Unable to kill service prometheus-1 : exit status 1
2021/07/24 14:48:04 received signal interrupt
exit status 1
make: *** [Makefile:78: run-example] Interrupt

BuildArgs should support repeating arguments

It is not uncommon that some programs support repeating arguments to provide multiple values, i.e. in the following format:
example -p "first argument" -p "second one" -p "third one"

It is currently not possible to use BuildArgs to build arguments in such a way, since it depends on map[string]string and thus does not allow repeated values.
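
For illustration, a sketch of the limitation under the assumption (stated above) that e2e.BuildArgs takes a map[string]string:

	// A Go map cannot hold the same key twice, so there is no way to express
	// `example -p "first argument" -p "second one" -p "third one"` here;
	// only a single "-p" entry can exist.
	args := e2e.BuildArgs(map[string]string{
		"-p": "first argument",
	})
	_ = args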

idea: Declarative K8s API as the API for docker env.

Just an idea, but it would be amazing to describe a service like e2e.Runnable or an instrumented e2e.Runnable in a declarative, mutable state. Ideally, something that speaks a common language like the K8s APIs. Then have the Docker engine support an important subset of the K8s API for local use. There would be a few benefits to this:

  • We would be able to compose adjustments (e.g. of flags) for different tests together better, as Jsonnet allows (though this also potentially adds huge cognitive load!). The current approach has similar issues to the https://github.com/bwplotka/mimic initial deployment at Improbable - the input for adjusting services is getting out of control (check the ruler or querier helpers, e.g. thanos-io/thanos#5348).
  • We could REUSE some Infrastructure as Go code (e.g. https://github.com/bwplotka/mimic) for production, staging, testing, etc. K8s clusters AS WELL AS simplified local e2e Docker environments!

Matchers package cannot be used since it is internal

I would like to use the metrics option WithLabelMatchers; however, I am unable to construct the matcher since the compiler complains about the package being internal.

Is this intentional for some reason or just an oversight?

monitoring: Add option to disable cadvisor

Cadvisor is important for getting container metrics. However, it requires various directories, which might differ across OSes, causing it to fail - for example when using WSL (google/cadvisor#2648 (comment)).

Without cadvisor we can still get a lot of metrics from the runtimes running in containers (e.g. a Go app), so we could add an option to disable cadvisor so users are unblocked if they can't get cadvisor running.

Getting Dir & InternalDir mixed up - is there a better way?

Knowing when to use Dir & InternalDir is confusing, and getting them mixed up can lead to file permission issues when your containers start up.

For example, when trying to create a dir called test in the container:

if err := os.MkdirAll(filepath.Join(demo.InternalDir(), "test"), os.ModePerm); err != nil {
	return e2e.NewErrInstrumentedRunnable(name, errors.Wrap(err, "create test dir failed"))
}

leads to the following when run

   unexpected error: create logs dir failed: mkdir /shared: permission denied     

You receive that error message when the test is running and the containers have started up, so you naturally think the error is coming from within the container, when in fact it is failing because the test process can't create the /shared directory on your local machine.

Is there a better way of doing this, or of preventing this kind of confusing error message for callers?
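
For reference, a hedged sketch of the intended split, inferred from this issue and the Dir/InternalDir description earlier in this README: Dir() is the host-side path (use it when the test process itself creates files), while InternalDir() is the same location as seen from inside the container (use it in flags and configs passed to the workload):

	// Create the directory from the test process using the host-side path...
	if err := os.MkdirAll(filepath.Join(demo.Dir(), "test"), os.ModePerm); err != nil {
		return err
	}

	// ...and reference it from inside the container via the container-side path,
	// e.g. when building the workload's command line.
	testDirInContainer := filepath.Join(demo.InternalDir(), "test")
	_ = testDirInContainer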

Permissions of DockerEnvironment.SharedDir()

I had several hours of confusion and difficulty because on my test machine the Docker instances received a /shared directory (holding /shared/config etc) with permissions rwxr-xr-x but on a CircleCI machine running a PR the Docker instances saw permissions rwx------ for /shared.

(This affects test containers that don't run as root.)

It is unclear to me whether the problem is that I am using Docker on a Mac, that I am using Go 1.17, or that I have a different umask than the CircleCI machine. I tried setting my umask to 000 but was unable to get my builds to fail the same way as the CircleCI builds.

Dependency on tools.git/core is for a detached commit; this breaks builds

https://github.com/efficientgo/e2e/blob/main/go.mod#L5-L6 says:

require (
	github.com/efficientgo/tools/core v0.0.0-20210129205121-421d0828c9a6

efficientgo/tools@421d0828c9a6 is a commit that does not belong to any branch on that repository and may belong to a fork outside of the repository, and it seems I'm unable to build https://github.com/observatorium/obsctl because of this:

go: github.com/efficientgo/[email protected] requires
        github.com/efficientgo/tools/[email protected]: invalid version: unknown revision 421d0828c9a6
make: *** [Makefile:54: deps] Błąd 1

Ref: thanos-io/thanos#4806

Object does not exist error without pre-pulling

Sometimes I was getting weird "object does not exist" errors for new Docker images. What always worked was:

  • run e2e with the WithVerbose option,
  • copy the docker run ... command for the problematic image,
  • run it manually locally.

After that, everything runs 100% of the time.

Leaving this here as a known issue to debug (: I suspect some permissions issue on my machines? 🤔 Let's see if others can repro!

Consider adding HTTPS readiness probe

On occasion, I use the framework to run services which listen only on an HTTPS port (thus the HTTP probe won't work). In such cases I tend to do a simple command readiness check using curl --insecure ... https://<readiness-endpoint> or a similar command. However, this has overhead, since I 1) have to have a utility capable of probing available inside the container; 2) need to craft my own command with arguments each time.

It could be beneficial to have an HTTPS readiness probe working on a similar principle (e.g. it could skip TLS verification, which should be fine for purely testing purposes).
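
As a stop-gap, the command-based check described above can be sketched roughly like this (hedged: it assumes curl is available in the image, uses only the Exec/NewCommand APIs shown elsewhere in this README, and svc, the port and the path are illustrative):

	// Poll the HTTPS endpoint from inside the already-started container until it
	// answers, skipping TLS verification for test purposes.
	var err error
	for i := 0; i < 30; i++ {
		if _, _, err = svc.Exec(e2e.NewCommand(
			"curl", "--insecure", "--fail", "--silent", "https://localhost:8443/ready",
		)); err == nil {
			break
		}
		time.Sleep(time.Second)
	}
	testutil.Ok(t, err)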

Minio is not ready even after `StartAndWaitReady` completes

Issue description

Trying to start Minio on the latest version of main, the server is not ready to handle requests despite StartAndWaitReady having already completed successfully. Any immediate requests afterwards result in the error response Server not initialized, please try again.

I suspect this could be an issue with the readiness probe upstream, since when setting up the same scenario with the code version from before the Minio image update in #4, everything works correctly. However, I haven't confirmed the exact cause yet.

Minimal setup to reproduce

Run this test:

import (
	"context"
	"io/ioutil"
	"testing"

	"github.com/efficientgo/e2e"
        e2edb "github.com/efficientgo/e2e/db"
	"github.com/efficientgo/tools/core/pkg/testutil"
	"github.com/minio/minio-go/v7"
	"github.com/minio/minio-go/v7/pkg/credentials"
)

func TestMinio(t *testing.T) {
	e, err := e2e.NewDockerEnvironment("minio_test", e2e.WithVerbose())
	testutil.Ok(t, err)
	t.Cleanup(e.Close)

	const bucket = "minoiotest"
	m := e2edb.NewMinio(e, "minio", bucket)
	testutil.Ok(t, e2e.StartAndWaitReady(m))

	mc, err := minio.New(m.Endpoint("http"), &minio.Options{
		Creds: credentials.NewStaticV4(e2edb.MinioAccessKey, e2edb.MinioSecretKey, ""),
	})
	testutil.Ok(t, err)
	testutil.Ok(t, ioutil.WriteFile("test.txt", []byte("just a test"), 0755))

	_, err = mc.FPutObject(context.Background(), bucket, "obj", "./test.txt", minio.PutObjectOptions{})
	testutil.Ok(t, err)
}

Remove `RunOnce`

Hm, so I created the RunOnce API, but I actually forgot that I had managed to solve my use case without it in https://github.com/thanos-io/thanos/blob/main/test/e2e/compatibility_test.go#L62

It's as easy as creating a noop container and doing execs...

// Start noop promql-compliance-tester. See https://github.com/prometheus/compliance/tree/main/promql on how to build a local docker image.
compliance := e.Runnable("promql-compliance-tester").Init(e2e.StartOptions{
	Image:   "promql-compliance-tester:latest",
	Command: e2e.NewCommandWithoutEntrypoint("tail", "-f", "/dev/null"),
})
testutil.Ok(t, e2e.StartAndWaitReady(compliance))

// ...
stdout, stderr, err := compliance.Exec(e2e.NewCommand("/promql-compliance-tester", "-config-file", filepath.Join(compliance.InternalDir(), "receive.yaml")))
t.Log(stdout, stderr)
testutil.Ok(t, err)

I think we should kill the RunOnce API to simplify things - and put the above into the examples? 🤔

cc @saswatamcode @philipgough @matej-g ?
