kubewarden / policy-server

Webhook server that evaluates WebAssembly policies to validate Kubernetes requests

Home Page: https://kubewarden.io

License: Apache License 2.0

Rust 97.92% Dockerfile 0.66% Makefile 1.42%
rust kubernetes webassembly policy kubernetes-webhook kubernetes-security policy-as-code hacktoberfest

policy-server's Introduction


Note well: don't forget to check out Kubewarden's documentation for more information

policy-server

policy-server is a Kubernetes dynamic admission controller that uses Kubewarden Policies to validate admission requests.

Kubewarden Policies are simple WebAssembly modules.

Deployment

We recommend relying on the kubewarden-controller and the Kubernetes Custom Resources it provides to deploy the Kubewarden stack.

Configuring policies

A single instance of policy-server can load multiple Kubewarden policies. The list of policies to load, how to expose them and their runtime settings are handled through a policies file.

By default policy-server will load the policies.yml file, unless the user provides a different value via the --policies flag.

This is an example of the policies file:

psp-apparmor:
  url: registry://ghcr.io/kubewarden/policies/psp-apparmor:v0.1.3
psp-capabilities:
  url: registry://ghcr.io/kubewarden/policies/psp-capabilities:v0.1.3
namespace_simple:
  url: file:///tmp/namespace-validate-policy.wasm
  settings:
    valid_namespace: kubewarden-approved

The YAML file contains a dictionary with strings as keys, and policy objects as values.

The key that identifies a policy is used by policy-server to expose the policy through its web interface. Policies are exposed under `/validate/<policy key>`.

For example, given the configuration file from above, the following API endpoints would be created:

  • /validate/psp-apparmor: this exposes the psp-apparmor:v0.1.3 policy. The Wasm module is downloaded from GitHub's OCI registry.
  • /validate/psp-capabilities: this exposes the psp-capabilities:v0.1.3 policy. The Wasm module is downloaded from GitHub's OCI registry.
  • /validate/namespace_simple: this exposes the namespace-validate-policy policy. The Wasm module is loaded from the local file located at /tmp/namespace-validate-policy.wasm.

It's common for policies to allow users to tune their behaviour via ad-hoc settings. These customization parameters are provided via the settings dictionary.

For example, given the configuration file from above, the namespace_simple policy will be invoked with the valid_namespace parameter set to kubewarden-approved.

Note well: it's possible to expose the same policy multiple times, each time with a different set of parameters.

The Wasm file providing the Kubewarden Policy can be either loaded from the local filesystem or it can be fetched from a remote location. The behaviour depends on the URL format provided by the user:

  • file:///some/local/program.wasm: load the policy from the local filesystem
  • https://some-host.com/some/remote/program.wasm: download the policy from the remote http(s) server
  • registry://localhost:5000/project/artifact:some-version: download the policy from an OCI registry. The policy must have been pushed as an OCI artifact

Logging and distributed tracing

The verbosity of policy-server can be configured via the --log-level flag. The default log level used is info, but trace, debug, warn and error levels are available too.

Policy server can produce log events using different formats. The --log-fmt flag is used to choose the format to be used.

Standard output

By default, log messages are printed on the standard output using the text format. Logs can be printed as JSON objects using the json format type.
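For example (assuming the policy-server binary is invoked directly; the flags are the ones documented above), debug-level logs in JSON format can be requested like this:

$ policy-server --policies policies.yml --log-level debug --log-fmt json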

OpenTelemetry Collector

The OpenTelemetry project provides a collector component that can be used to receive, process and export telemetry data in a vendor-agnostic way.

Policy server can send trace events to the OpenTelemetry Collector using the --log-fmt otlp flag.

Current limitations:

  • Traces can be sent to the collector only via gRPC. The HTTP transport layer is not supported.
  • The OpenTelemetry Collector must be listening on localhost. When deployed on Kubernetes, policy-server must have the OpenTelemetry Collector running as a sidecar.
  • Policy server doesn't expose any configuration setting for OpenTelemetry (e.g. endpoint URL, encryption, authentication, ...). All of the tuning has to be done on the collector process that runs as a sidecar.

More details about OpenTelemetry and tracing can be found inside of our official docs.

Building

You can use the container image we maintain inside of our GitHub Container Registry.

Alternatively, the policy-server binary can be built in this way:

$ make build

Software bill of materials

Policy server has its software bill of materials (SBOM) published with every release. It follows the SPDX version 2.2 format, and it can be found in the release assets together with the signature and certificate used to sign it.

Security

The Kubewarden team is security conscious. You can find our threat model assessment and responsible disclosure approach in our Kubewarden docs.


policy-server's Issues

Provide a way to print debug statements from the policies

As a policy author I want to debug why my policy is not working as expected. I would like to introduce some logging statements inside of the source code of my policy and have these messages:

  • be printed to stdout when I run the policy via policy-testdrive
  • be part of the logs produced by policy-server

We could rely on the __console_log function provided by waPC guest libraries, but I fear we would lose the context of the policy (pretty relevant with policy-server).

Maybe we should instead implement a more flexible logging (also with levels) via a custom host callback.

WASI and waPC guest coexistence

Consider if it makes sense for the policy-server to detect whether a guest module is WASI or waPC and load one or the other. This has implications on our side at the policy-server, but it virtually enables any kind of language that is able to use either WASI or waPC.

Allow to run policies with different runtimes

The policy-server has to be extended so that each policy defines the execution model it should use. There are three options:

  • kubewarden-wapc
  • opa
  • opa-gatekeeper

Based on the runtime chosen by the policy, this should be forwarded to the policy-evaluator.

Use MsgPack instead of Json to move data back and forth between host and Wasm

Currently the data exchanged between the policy-server and the Wasm "world" is encoded using JSON. The waPC upstream project instead decided to use MsgPack as an encoding format. Their choice doesn't force us to change our approach, but it guarantees that all the languages supported by waPC also have libraries to deal with MsgPack.

Why should we move to MsgPack?

  • MsgPack is an optimized (in terms of size) version of JSON encoding. That would reduce the amount of data sent back and forth between policy-server and Wasm policies, hopefully leading to faster evaluation times.

  • Support more languages. TinyGo, for example, has only limited JSON support.

Refactoring of the main function

The main function is getting a bit crowded, partly because of all the code that deals with the flags. We should probably split that into a dedicated function, and maybe even into a dedicated file.

Allow policies to pull information from the Kubernetes cluster

Allow guest policies to pull information from the Kubernetes cluster. This information would be a list of well-known resource types, which could be cached by the policy server and served from the cache to the guest policy. This allows a guest policy to take more complex decisions, based on the current status of the cluster.

For example, if a resource is namespaced, we can add a way to retrieve the namespace it belongs to, so the policy can read the annotations and labels.

In general, we could allow the policy to read arbitrary information (a fixed list of well known resources).

Allow setting plain (non-TLS) and insecure (no certificate checking) per policy host

It is now possible to configure a number of policies reading the policies.yml file.

There are several options where a policy module can come from nowadays:

  • file: not interesting for this case.
  • http: not interesting for this case.
  • https
  • registry

A couple of scenarios:

  1. The https case by default will check that the presented server certificate is valid, but it should be possible to override this check for some hosts if the user desires so.

  2. The registry could potentially point to either a non-TLS or a TLS registry, since the scheme offers no differentiation as http/https do.

Adding a sources.yml file that can be provided as an argument to the policy-server is desirable. The sources.yml file will be composed of a top level map with two optional entries:

  • insecure_sources: an array of host:port entries, similar to Docker's insecure-registries behavior. In this case, the policy-server will first try to download the artifact using HTTPS with no certificate validation, and if that connection fails, it will use plain HTTP.

  • sources_authorities: a map with host:port entries as keys, and as values an object containing ca-path, which points to the PEM-encoded CA certificate or chain (if subordinate).

Example:

insecure_sources: ["my-registry.local.lan", "other-registry.local.lan:5001"]
sources_authorities:
  "self-signed.svc.lan:5001":
    ca-path: "/foo/bar.pem"
  other-self-signed.svc.lan:
    ca-path: "/foo/bar.pem"

When using the registry:// scheme as the origin for a Wasm module, if the host or host:port is not present in either insecure_sources or sources_authorities, then the connection will always be TLS with certificate verification using the system CA certificates -- if this connection fails, it won't be retried over non-TLS.
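The resolution logic described above could be sketched like this (a minimal, hypothetical Rust sketch; type and function names are illustrative, not the actual policy-server code):

use std::collections::{HashMap, HashSet};

/// Hypothetical in-memory representation of sources.yml.
struct Sources {
    insecure_sources: HashSet<String>,
    sources_authorities: HashMap<String, String>, // host[:port] -> ca-path
}

/// How to connect to a given host, following the rules described above.
enum ConnectionPolicy {
    /// https with no certificate validation, falling back to plain http on failure
    InsecureWithHttpFallback,
    /// https validated against the custom CA stored at the given path
    CustomAuthority(String),
    /// https validated against the system CA store, never retried over plain http
    SystemTrust,
}

fn connection_policy(sources: &Sources, host: &str) -> ConnectionPolicy {
    if sources.insecure_sources.contains(host) {
        ConnectionPolicy::InsecureWithHttpFallback
    } else if let Some(ca_path) = sources.sources_authorities.get(host) {
        ConnectionPolicy::CustomAuthority(ca_path.clone())
    } else {
        ConnectionPolicy::SystemTrust
    }
}

fn main() {
    let sources = Sources {
        insecure_sources: ["my-registry.local.lan".to_string()].into_iter().collect(),
        sources_authorities: HashMap::from([(
            "self-signed.svc.lan:5001".to_string(),
            "/foo/bar.pem".to_string(),
        )]),
    };
    match connection_policy(&sources, "ghcr.io") {
        ConnectionPolicy::SystemTrust => println!("TLS with system CA certificates"),
        ConnectionPolicy::CustomAuthority(ca) => println!("TLS with custom CA {}", ca),
        ConnectionPolicy::InsecureWithHttpFallback => println!("insecure host, may fall back to plain http"),
    }
}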

Error downloading module: "host is not an insecure source"

I tried to deploy the scaffold project on GCR/GKE, and ran into this error:

May 08 17:43:05.828 ERROR policy_server: error while fetching policy name-check from registry://us.gcr.io/dlorenc-vmtest2/wasm: could not download Wasm module: error decoding response body: missing field `detail` at line 1 column 137; host is not an insecure source

The registry itself is public and does not require auth to pull. It looks like from reading the code I might need to set something in a sources.yml, but I'm not sure what or where to do that.

Allow policy authors to write policies that perform image verification using sigstore

sigstore is a new standard that allows the signature and verification of container images and OCI artifacts.

It would be great to allow policy authors to write policies that perform verification of container images using sigstore.

There's however one caveat: the sigstore code requires some data to be fetched from an OCI registry. Currently WASM code cannot make network calls.

To solve this problem, kubewarden's policy-evaluator should expose a waPC function that the WASM guest can invoke. This function would run on the host, and it would take care of making the network requests and performing the verification.

RUSTSEC-2021-0067: Memory access due to code generation flaw in Cranelift module

Memory access due to code generation flaw in Cranelift module

Details
Package cranelift-codegen
Version 0.66.0
URL GHSA-hpqh-2wqx-7qp5
Date 2021-05-21
Patched versions >=0.73.1

There is a bug in 0.73.0 of the Cranelift x64 backend that can create a
scenario that could result in a potential sandbox escape in a WebAssembly
module. Users of versions 0.73.0 of Cranelift should upgrade to either 0.73.1
or 0.74 to remediate this vulnerability. Users of Cranelift prior to 0.73.0
should update to 0.73.1 or 0.74 if they were not using the old default backend.

More details can be found in the GitHub Security Advisory at:

<GHSA-hpqh-2wqx-7qp5>

See advisory page for additional details.

Allow policies to produce log messages

This doesn't strictly belong to policy-server, but it's obviously related.

We should allow policies to produce log messages when being executed. These log messages should be shown inside of policy-server logs.

Acceptance criteria

  • Policy authors have a logging API that allows them to produce log messages with different levels (e.g. debug, info, warning, error)
  • The policy SDKs are extended to provide easy access to "log creation"
  • policy-server shows these logs

Linked cards

Research which prometheus metrics should be exported

We want operators to be able to understand how kubewarden is behaving. To achieve this goal, we will instruct the PolicyServer to export a series of prometheus metrics.

This card is about researching what kind of metrics we will have to export.

Acceptance criteria

  • Have a list of metrics that have to be exported
  • Are Prometheus metrics enough, or should we also write some data into the actual ClusterAdmissionPolicy resources?

RUSTSEC-2021-0073: Conversion from `prost_types::Timestamp` to `SystemTime` can cause an overflow and panic

Conversion from prost_types::Timestamp to SystemTime can cause an overflow and panic

Details
Package prost-types
Version 0.7.0
URL tokio-rs/prost#438
Date 2021-07-08
Patched versions >=0.8.0

Affected versions of this crate contained a bug in which untrusted input could cause an overflow and panic when converting a Timestamp to SystemTime.

It is recommended to upgrade to prost-types v0.8 and switch the usage of From<Timestamp> for SystemTime to TryFrom<Timestamp> for SystemTime.

See #438 for more information.

See advisory page for additional details.

Allow verification of policies using sigstore

Kubewarden policies are WebAssembly modules stored inside of OCI registries. Users can reference them by shasum if they want, but it would be way better to provide a stronger way to verify the policies before using them.

sigstore is a new standard that allows the signature and verification of container images and OCI artifacts.

We should leverage sigstore to provide a way to verify policies before their execution.

Use case

As an administrator,
I want all the Kubewarden policies loaded in production to be signed with a certain key,
so that unverified policies are not loaded.

How that should work

  • Policies are signed using cosign or some equivalent tool.
  • Policy Server has a configuration flag that enforces that all the policies mentioned inside of its configuration file are signed
  • PolicyServer reads a list of keys and annotations from a verification.yaml file. This file will be mounted as a Secret for PolicyServer to use it. Example:
verification-keys:
  key-name-irrelevant: |
        -----BEGIN PUBLIC KEY-----
        MFkwEwYHKoZIzj0CAQYIKoZIzj0DAQcDQgAEX0HFTtCfTtPmkx5p1RbtwDE1EVzu
        wjQs1cCRKb5Pz/yUspkQsN3FO4iyWodCy5j3o0CdIJD/1gvq98pf4IG9tA==
        -----END PUBLIC KEY-----
verification-annotations: # optional
  env: prod
  • Policy Server exits with an error if one or more policies fail the verification

Note well: this is a rough idea about how this feature should work. We definitely need to spend more time discussing how that should work. Feedback is welcome!

Implement pull WASM modules from OCI registry

The CLI flag to pull from registry:// is currently not implemented. We have to write a new struct implementing the fetcher trait.

Pulling from an OCI registry can be done using this crate, which is the same crate used by the krustlet project.

[Epic] Add metrics endpoint

Adding metrics can help to improve observability.

For example:

  • Number of policies loaded into the policy server
  • Mean response time of the policies, generally speaking but also at a policy level (eg: the privileged-pods policy has a mean response time of XXX)
  • Number of failures (globally, per policy)
  • Number of rejections (globally, per policy)
  • many more...

Why this is needed

Users can leverage tools like Prometheus to scrape these metrics and then:

  • Plot statistics about the policy server (eg: memory/cpu used)
  • Set alerts when something bad happens (eg: no instances running, all instances slowed down)

This can also help users to better define the deployment details of the server (number of replicas, resource limits,...)

Related issues

This is just an epic; the work of implementing it is done inside of these epics:

Do not download policies that have already been downloaded

Right now policy-server will always download all the policies defined inside of the policies.yml file, even if they have already been downloaded.

This is sub-optimal: changing a policy setting requires restarting the policy-server, which then causes all the policies to be downloaded again.

Acceptance criteria

  • policies.yml file is extended to allow something similar to Kubernetes' imagePullOptions
  • policy-server follows the policy pullOption and downloads the wasm file only when needed

Fix known vulnerabilities

There are several known vulnerabilities we can get rid of as of today.

Binary

~/projects/kubewarden/policy-server(a0fb44c) » cargo audit
    Fetching advisory database from `https://github.com/RustSec/advisory-db.git`
      Loaded 323 security advisories (from /home/ereslibre/.cargo/advisory-db)
    Updating crates.io index
    Scanning Cargo.lock for vulnerabilities (372 crate dependencies)
Crate:         prost-types
Version:       0.7.0
Title:         Conversion from `prost_types::Timestamp` to `SystemTime` can cause an overflow and panic
Date:          2021-07-08
ID:            RUSTSEC-2021-0073
URL:           https://rustsec.org/advisories/RUSTSEC-2021-0073
Solution:      Upgrade to >=0.8.0
Dependency tree:
prost-types 0.7.0
└── prost-build 0.7.0
    └── tonic-build 0.4.2
        └── opentelemetry-otlp 0.7.0
            └── policy-server 0.1.7

Crate:         crossbeam-deque
Version:       0.8.0
Warning:       yanked
Dependency tree:
crossbeam-deque 0.8.0
├── rayon-core 1.9.1
│   └── rayon 1.5.1
│       └── wasmtime-jit 0.27.0
│           └── wasmtime 0.27.0
│               ├── wasmtime-wiggle 0.27.0
│               │   └── wasmtime-wasi 0.27.0
│               │       └── wasmtime-provider 0.0.3
│               │           └── policy-evaluator 0.1.19
│               │               └── policy-server 0.1.7
│               ├── wasmtime-wasi 0.27.0
│               └── wasmtime-provider 0.0.3
└── rayon 1.5.1

error: 1 vulnerability found!
warning: 1 allowed warning found

Image

~/projects/kubewarden/policy-server(a0fb44c) » trivy i ghcr.io/kubewarden/policy-server:v0.1.7
2021-08-19T12:12:29.974+0200	INFO	Detected OS: opensuse.leap
2021-08-19T12:12:29.974+0200	INFO	Detecting SUSE vulnerabilities...
2021-08-19T12:12:29.975+0200	INFO	Number of language-specific files: 0

ghcr.io/kubewarden/policy-server:v0.1.7 (opensuse.leap 15.3)
============================================================
Total: 7 (UNKNOWN: 0, LOW: 0, MEDIUM: 5, HIGH: 2, CRITICAL: 0)

+--------------+-------------------------+----------+-------------------+---------------+-----------------------------+
|   LIBRARY    |    VULNERABILITY ID     | SEVERITY | INSTALLED VERSION | FIXED VERSION |            TITLE            |
+--------------+-------------------------+----------+-------------------+---------------+-----------------------------+
| cpio         | openSUSE-SU-2021:2689-1 | HIGH     | 2.12-3.3.1        | 2.12-3.6.1    | Security update for cpio    |
+--------------+-------------------------+----------+-------------------+---------------+-----------------------------+
| libcurl4     | openSUSE-SU-2021:2439-1 | MEDIUM   | 7.66.0-4.17.1     | 7.66.0-4.22.1 | Security update for curl    |
+--------------+-------------------------+          +-------------------+---------------+-----------------------------+
| liblua5_3-5  | openSUSE-SU-2021:2196-1 |          | 5.3.4-3.3.2       | 5.3.6-3.6.1   | Security update for lua53   |
+--------------+-------------------------+----------+-------------------+---------------+-----------------------------+
| libsqlite3-0 | openSUSE-SU-2021:2320-1 | HIGH     | 3.28.0-3.9.2      | 3.36.0-3.12.1 | Security update for sqlite3 |
+--------------+-------------------------+----------+-------------------+---------------+-----------------------------+
| libsystemd0  | openSUSE-SU-2021:2410-1 | MEDIUM   | 246.13-5.1        | 246.13-7.8.1  | Security update for systemd |
+--------------+                         +          +                   +               +                             +
| libudev1     |                         |          |                   |               |                             |
+--------------+-------------------------+          +-------------------+---------------+-----------------------------+
| rpm          | openSUSE-SU-2021:2682-1 |          | 4.14.1-29.46      | 4.14.3-37.2   | Security update for rpm     |
+--------------+-------------------------+----------+-------------------+---------------+-----------------------------+

We can get rid of some of the image vulnerabilities, but not all, since the current latest container image already has known vulnerabilities:

~/projects/kubewarden/policy-server(a0fb44c) » trivy i registry.opensuse.org/opensuse/leap:15.3
2021-08-19T12:14:10.335+0200	INFO	Detected OS: opensuse.leap
2021-08-19T12:14:10.335+0200	INFO	Detecting SUSE vulnerabilities...
2021-08-19T12:14:10.335+0200	INFO	Number of language-specific files: 0

registry.opensuse.org/opensuse/leap:15.3 (opensuse.leap 15.3)
=============================================================
Total: 2 (UNKNOWN: 0, LOW: 0, MEDIUM: 1, HIGH: 1, CRITICAL: 0)

+---------+-------------------------+----------+-------------------+---------------+--------------------------+
| LIBRARY |    VULNERABILITY ID     | SEVERITY | INSTALLED VERSION | FIXED VERSION |          TITLE           |
+---------+-------------------------+----------+-------------------+---------------+--------------------------+
| cpio    | openSUSE-SU-2021:2689-1 | HIGH     | 2.12-3.3.1        | 2.12-3.6.1    | Security update for cpio |
+---------+-------------------------+----------+-------------------+---------------+--------------------------+
| rpm     | openSUSE-SU-2021:2682-1 | MEDIUM   | 4.14.1-29.46      | 4.14.3-37.2   | Security update for rpm  |
+---------+-------------------------+----------+-------------------+---------------+--------------------------+

Perform policy settings validation

Policies can expose a validate_settings hook that can be used to validate the policy settings provided by the user. We should leverage that to validate policies.

Acceptance criteria

  • Attempt to load a policy with an invalid setting causes the policy-server to exit with an error
  • The check is done at boot time

Support mutation policies

Mutation policies allow incoming requests to be altered to fit user requirements.

This is for example needed by the PSP capabilities policy:

AllowedCapabilities - Provides a list of capabilities that are allowed to be added to a container. The default set of capabilities are implicitly allowed. The empty set means that no additional capabilities may be added beyond the default set. * can be used to allow all capabilities.

RequiredDropCapabilities - The capabilities which must be dropped from containers. These capabilities are removed from the default set, and must not be added. Capabilities listed in RequiredDropCapabilities must not be included in AllowedCapabilities or DefaultAddCapabilities.

DefaultAddCapabilities - The capabilities which are added to containers by default, in addition to the runtime defaults. See the Docker documentation for the default list of capabilities when using the Docker runtime.

A mutating policy is needed to implement the DefaultAddCapabilities behaviour

Handle OPA/Gatekeeper policies

This card is part of kubewarden/policy-evaluator#14.

Policy server should be able to load and evaluate Wasm modules that have been originated by opa build and have been then annotated via kwctl (see kubewarden/kwctl#55).

Acceptance criteria

  • policy server will load Rego-based policies that have been annotated by kwctl
  • A Rego-based policy that has NOT been annotated by kwctl will not be loaded, and will cause the server to exit with an error
  • policy server will prepare the right values expected by OPA and Gatekeeper policies

Server: perform tls termination

Right now the server started by policy-server listens only over HTTP. We should make it easy to start an HTTPS server.

The cli app has the flags required to provide the key and cert to be used by the http server, but there's no code handling that.

We are using hyper to implement our web server; the TLS termination must be done using hyper-tls. This is already used by the WASM HTTPS fetcher.

Some extra information: the WASM HTTPS fetcher is using hyper-tls instead of hyper-rustls because the latter is not flexible enough for its use case. The hyper-rustls crate doesn't allow creating an HttpsConnector that accepts certificates signed by unknown CAs, plus it doesn't have a way to add custom CAs to verify them.
Now, for the HTTPS server these features are not relevant, but we don't want to have two TLS crates as dependencies.

Enrich policies with metadata

What

We want to enrich each policy by adding some metadata to it. This information will be provided by the policy author.

Why

Adding metadata to the policy will allow us to do things such as:

  • Version policy files. We've already changed the communication protocol between the host (e.g. the policy server) and the guest (the actual policy). By versioning the policy we can, for example, prevent policies that are too old from being loaded by the policy-server/kwctl
  • Provide a better UX for the end users. Right now the end users have to provide quite some information when creating a ClusterAdmissionPolicy resource. If the user enters the wrong information (e.g. stating that a policy designed to process Pod resources can also handle Deployment resources), the policy evaluation can be totally broken or provide wrong results. The policy metadata will provide this information, and kwctl can then be used to scaffold correct definitions of the ClusterAdmissionPolicy resource

How

The metadata of a single policy would look like this:

{
  "policyVersion": "v1alpha1",
  "rules": [
    {
      "apiGroups":[""],
      "apiVersions":["v1"],
      "resources":["pods"],
      "operations":["CREATE"]
    }
  ]
}

The rules section has all the required fields of the admission registration configuration.

Each policy would have to expose a metadata guest call, that will return this object. This metadata will also be duplicated inside of the custom section of the resulting Wasm file.

We want to have the metadata in the custom section of the Wasm file too, because this allows inspecting a Wasm module without evaluating it. The evaluation process of a Wasm module can take significant time, depending on how the originating compiler built the Wasm file.

Linked cards

Policy server:

kwctl

sdks

  • Rust
  • Go
  • Swift
  • AssemblyScript: update the existing policy

Workflows

Docs

Implement structured logging

Right now the code is filled with println! statements to perform logging. This is not working well, especially because of async.

Given we're using the tokio async ecosystem (hyper + tokio itself), we should probably use tokio tracing for that.
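A minimal sketch of what that could look like with the tracing / tracing-subscriber crates (illustrative only, not the actual policy-server code):

use tracing::{info, instrument};

// #[instrument] opens a span for every call and records the arguments as structured fields.
#[instrument]
fn validate(policy_id: &str) {
    // Structured key=value fields instead of println!; they stay attached to the span
    // even when many async validations are interleaved.
    info!(policy = policy_id, "validation request received");
}

fn main() {
    // Emit human-readable events on stdout; tracing-subscriber also offers a JSON formatter.
    tracing_subscriber::fmt::init();
    validate("psp-apparmor");
}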

Distributed tracing: talk with jaeger via gRPC

By default the opentelemetry-jaeger library talks with the Jaeger collector using UDP. UDP packets have a fixed size; if a trace is bigger than that, it will simply be dropped.

The trace events currently generated by policy-server do not fit into a single UDP packet. As a result, no trace is received by the Jaeger collector.

We should instead communicate with Jaeger over gRPC, which can fragment trace events over multiple packets.

Abstract the policy downloader to a subcrate

In the spirit of https://github.com/chimera-kube/policy-server/tree/aac1a0ee3382ebb32a15baa090065702693ab5d9/crates, add a new crate that contains the logic of downloading a Wasm module from an OCI artifact, regular HTTP server or from the local filesystem, currently inside the policy-server main crate: https://github.com/chimera-kube/policy-server/tree/aac1a0ee3382ebb32a15baa090065702693ab5d9/src/wasm_fetcher.

This would allow both the policy-server and the policy-testdrive (and optionally other components) to consume this crate and pull OCI Wasm artifacts. As an immediate result, users of policy-testdrive wouldn't need to pre-fetch the Wasm module locally: they could just provide an http(s)://, registry:// or file:// scheme to an existing Wasm artifact.

Provide /healthz endpoint

Provide an endpoint that could be used by Kubernetes to figure out if the application is healthy.

Why this is needed

Kubernetes can monitor the health status of the policy-server instances by querying this /healthz endpoint. When a policy-server instance is unhealthy, Kubernetes will take care of starting a new Pod and killing the unhealthy one.
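A minimal sketch of such an endpoint using hyper 0.14 (illustrative only; the real policy-server routing may differ):

use std::convert::Infallible;

use hyper::service::{make_service_fn, service_fn};
use hyper::{Body, Request, Response, Server, StatusCode};

async fn route(req: Request<Body>) -> Result<Response<Body>, Infallible> {
    match req.uri().path() {
        // Liveness probe: report healthy as long as the process can serve requests.
        "/healthz" => Ok(Response::new(Body::from("ok"))),
        _ => Ok(Response::builder()
            .status(StatusCode::NOT_FOUND)
            .body(Body::empty())
            .unwrap()),
    }
}

#[tokio::main]
async fn main() {
    let addr = ([0, 0, 0, 0], 3000).into();
    let make_svc = make_service_fn(|_conn| async { Ok::<_, Infallible>(service_fn(route)) });
    Server::bind(&addr).serve(make_svc).await.unwrap();
}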

Expose policy metrics via a prometheus endpoint

The goal is to have each PolicyServer instance expose metrics about the policies currently loaded.

The metrics should be exported as a prometheus endpoint.

TODO:

  • Define the metrics to be exposed
  • Understand the implications of exposing metrics via OpenTelemetry vs using a prometheus library

Improve CHANGELOG.md generation

The current CHANGELOG.md looks like the following:

<a name="unreleased"></a>
## [Unreleased]


<a name="v0.1.3"></a>
## [v0.1.3] - 2021-04-14
### Features
- add changelog generation tooling

### Pull Requests
- Merge pull request [#52](https://github.com/kubewarden/policy-server/issues/52) from ereslibre/changelog-generation
- Merge pull request [#51](https://github.com/kubewarden/policy-server/issues/51) from ereslibre/remove-threaded-panic


<a name="v0.1.2"></a>
## [v0.1.2] - 2021-04-07
### Pull Requests
- Merge pull request [#48](https://github.com/kubewarden/policy-server/issues/48) from flavio/enforce_settings_validation
- Merge pull request [#49](https://github.com/kubewarden/policy-server/issues/49) from flavio/compress-policy-testdrive-artifact


<a name="v0.1.1"></a>
## [v0.1.1] - 2021-04-06

<a name="v0.1.0"></a>
## v0.1.0 - 2021-04-02
### Pull Requests
- Merge pull request [#45](https://github.com/kubewarden/policy-server/issues/45) from kubewarden/remove-pat
- Merge pull request [#44](https://github.com/kubewarden/policy-server/issues/44) from flavio/update-sdk-dep
- Merge pull request [#43](https://github.com/kubewarden/policy-server/issues/43) from flavio/rename
- Merge pull request [#39](https://github.com/kubewarden/policy-server/issues/39) from ereslibre/context-aware
- Merge pull request [#38](https://github.com/kubewarden/policy-server/issues/38) from flavio/logging
- Merge pull request [#35](https://github.com/kubewarden/policy-server/issues/35) from flavio/settings-validation
- Merge pull request [#36](https://github.com/kubewarden/policy-server/issues/36) from flavio/testdrive-release
- Merge pull request [#34](https://github.com/kubewarden/policy-server/issues/34) from flavio/mutating-policies
- Merge pull request [#26](https://github.com/kubewarden/policy-server/issues/26) from ereslibre/sources-yaml
- Merge pull request [#25](https://github.com/kubewarden/policy-server/issues/25) from flavio/policies-download-dir
- Merge pull request [#24](https://github.com/kubewarden/policy-server/issues/24) from flavio/unwrap-cleanup
- Merge pull request [#16](https://github.com/kubewarden/policy-server/issues/16) from ereslibre/registry-authentication
- Merge pull request [#22](https://github.com/kubewarden/policy-server/issues/22) from flavio/wait-for-workers-to-be-ready
- Merge pull request [#17](https://github.com/kubewarden/policy-server/issues/17) from chimera-kube/policies-settings
- Merge pull request [#15](https://github.com/kubewarden/policy-server/issues/15) from cmurphy/tls
- Merge pull request [#18](https://github.com/kubewarden/policy-server/issues/18) from cmurphy/build-image
- Merge pull request [#11](https://github.com/kubewarden/policy-server/issues/11) from ereslibre/oci-artifacts
- Merge pull request [#10](https://github.com/kubewarden/policy-server/issues/10) from cmurphy/error-handling
- Merge pull request [#9](https://github.com/kubewarden/policy-server/issues/9) from cmurphy/fix-uid


[Unreleased]: https://github.com/kubewarden/policy-server/compare/v0.1.3...HEAD
[v0.1.3]: https://github.com/kubewarden/policy-server/compare/v0.1.2...v0.1.3
[v0.1.2]: https://github.com/kubewarden/policy-server/compare/v0.1.1...v0.1.2
[v0.1.1]: https://github.com/kubewarden/policy-server/compare/v0.1.0...v0.1.1

Improve the template and the generated contents in the following ways:

  • Do not include an unreleased section
  • Do not include "Merge pull request" entries as they are: include the title of the PR, or something that carries more information than the current content

Partially based on feedback provided on kubewarden/kubewarden-controller#29

Distributed tracing: enrich traces with extra tags

Some extra tags should be added to our trace events. This would make it easier to find a trace inside of tools like Jaeger (they search based on tags).

It should be enough to add these tags only to the parent trace event named validation. This is generated inside of src/api.rs.

Looking at the fields of AdmissionReview, we have to expose as searchable tags things as:

  • kind: each attribute should be broken down into a dedicated tag
  • resource: each attribute should be broken down into a dedicated tag
  • subresource
  • name
  • namespace
  • operation

Also the trace should be enriched with the final outcome of the policy evaluation:

  • allowed: true or false
  • mutated: true or false

UPDATE: we uncovered some technical debt while reviewing the PR, adding a new checklist:

  • Define the AdmissionReview resource via a better Rust structure

Tip for the end developer

Each field must be declared ahead of time inside of the tracing macro. In the beginning it will be an empty field; it will be populated later on inside of the code.

As an example, look at how the request_uid field is handled.

Declaration:

policy-server/src/api.rs, lines 44 to 48 (commit 1f68c06):

fields(
request_uid=tracing::field::Empty,
host=crate::cli::HOSTNAME.as_str(),
policy_id=policy_id.as_str(),
),

Setting the actual value:

policy-server/src/api.rs, lines 77 to 78 (commit 1f68c06):

// add request UID to the span context as one of its fields
Span::current().record("request_uid", &adm_rev.uid.as_str());

Add some way to version our policy wasm files

The policy format already changed once (when we moved from WASI to waPC). There are chances this will happen again in the future (see #28).

When the policy format changes, we have to find a way to not load policies written with the old format into policy-server.

Acceptance criteria

  • It's possible to query the spec version of a Kubewarden policy
  • Extra: the policy-server refuses to load older policies and exits with an error at boot time

Better error handling

Right now the main function is full of .unwrap() calls. Every time something minor goes wrong (like the user trying to pull a Wasm module from a wrong HTTP endpoint), the policy-server panics and prints a message.

We should do better error handling:

  • Do not cause the main program to panic
  • Catch the error, print it to STDERR and then exit with 1

Tip: take a look at the Rust book for CLI apps for inspiration.
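A minimal sketch of that pattern (the read_to_string call is a stand-in for whatever startup step is currently unwrap()-ed):

use std::process;

fn run() -> Result<(), Box<dyn std::error::Error>> {
    // Any failure here bubbles up as an Err instead of a panic.
    let policies = std::fs::read_to_string("policies.yml")?;
    println!("loaded {} bytes of policy configuration", policies.len());
    Ok(())
}

fn main() {
    if let Err(e) = run() {
        // Print the error to STDERR and exit with status 1 instead of panicking.
        eprintln!("error: {}", e);
        process::exit(1);
    }
}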

Read policy settings from configuration file

Read the list of policies to host from a configuration file.

The structure of the configuration file should look like this:

policies:
  - endpoint: "/toleration1"
    url: registry://ghcr.io/chimera-kube/toleration-policy:1.0.0
    settings:
      taint-key: dedicated
      taint-value: tenantA
      allowed_groups:
      - administrators
      - tenant-a-users
  - endpoint: "/toleration2"
    url: registry://ghcr.io/chimera-kube/toleration-policy:1.0.0
    settings:
      taint-key: dedicated
      taint-value: tenantB
      allowed_groups:
      - administrators
      - tenant-b-users
  - endpoint: "/toleration2"
    url: registry://ghcr.io/chimera-kube/pod-privileged:1.0.0
    settings:
      trusted_users:
      - flavio
      trusted_groups:
      - admins

This allows the same policy to be instantiated multiple times, under different endpoints, with different flags.

policy validation check: inspect wasm file

Policy settings are currently checked by invoking a waPC function exposed by the Wasm modules. However not all the policies might implement this function.

Right now the code detects the modules that do not implement this feature by looking at the error message raised by waPC host. Unfortunately there's no other way to do that, because the library doesn't offer a specific method for that.

Relying on message parsing is not reliable in the long term. This is why I'm filing this issue.

Possible solutions:

  • We inspect the Wasm binary and look for the exported function inside of it
  • We make a contribution upstream inside of waPC host, to enrich the errors type

Add testing and CI

There are not yet any tests or automated PR verification. We should add:

  • Automated linting/formatting
  • Unit tests
  • Functional and/or end-to-end tests
  • Performance benchmark tests: https://github.com/kubewarden/load-testing
  • GitHub Actions workflows to perform these tests for every pull request

RUSTSEC-2021-0013: Soundness issues in `raw-cpuid`

Soundness issues in raw-cpuid

Details
Package raw-cpuid
Version 7.0.4
URL rustsec/advisory-db#614
Date 2021-01-20
Patched versions >=9.0.0

Undefined behavior in as_string() methods

VendorInfo::as_string(), SoCVendorBrand::as_string(),
and ExtendedFunctionInfo::processor_brand_string() construct byte slices
using std::slice::from_raw_parts(), with data coming from
#[repr(Rust)] structs. This is always undefined behavior.

See gz/rust-cpuid#40.

This flaw has been fixed in v9.0.0, by making the relevant structs
#[repr(C)].

native_cpuid::cpuid_count() is unsound

native_cpuid::cpuid_count() exposes the unsafe __cpuid_count() intrinsic
from core::arch::x86 or core::arch::x86_64 as a safe function, and uses
it internally, without checking the
safety requirement:

> The CPU the program is currently running on supports the function being
> called.

CPUID is available in most, but not all, x86/x86_64 environments. The crate
compiles only on these architectures, so others are unaffected.

This issue is mitigated by the fact that affected programs are expected
to crash deterministically every time.

See gz/rust-cpuid#41.

The flaw has been fixed in v9.0.0, by intentionally breaking compilation
when targeting SGX or 32-bit x86 without SSE. This covers all affected CPUs.

See advisory page for additional details.

Allow to provide authentication information for OCI registries

Since #1 was implemented, it is possible to set the registry:// scheme in the --wasm-uri argument so the WASM module is pulled from an OCI registry.

At this time no authentication can be provided and all WASM modules are downloaded unauthenticated.

It is desired to support OCI registries that have authentication enabled, so that it's possible to provide authentication details depending on the registry that is being targeted.

The oci-distribution crate allows either anonymous access or basic auth. We need to figure out the best path forward in order to provide this feature.

  • How would this information be exposed to policy-server consumers?
  • Is oci-distribution looking to improve its UX/features in this regard? In that case, can changes be contributed to the oci-distribution crate (krustlet project) and consumed by policy-server?

Distributed tracing: investigate trace distribution

Right now policy-server can send trace events to the following collectors: Jaeger and OpenTelemetry.

Talking directly with a Jaeger collector reduces the number of moving parts: there's just policy-server and a Jaeger collector.

The OpenTelemetry collector, on the other hand, leads to more components being deployed. That's because OpenTelemetry is just a "translator" of traces/logs/metrics. policy-server sends traces to the OpenTelemetry collector using the otlp format, then it's up to the collector to forward them to another collector.
The main advantage is that OpenTelemetry can forward trace events to different collectors, such as Jaeger, Zipkin, AWS X-Ray, ...

This means that, by using OpenTelemetry, our users can integrate policy-server into whatever infrastructure they already have in place. All of that without us having to maintain the "send trace code" inside of policy-server.

Open questions

  • Should we leave around the code that sends trace events straight to Jaeger?
  • Should we support only sending logs to a collector that understands the otlp format (such as the OpenTelemetry collector)?

More food for thought

OpenTelemetry can also handle metrics and logs. We want policy-server to expose metrics. Should we do that by integrating with OpenTelemetry? The main advantage would be, again, the possibility to expose our metrics to Prometheus, Lightstep and other solutions.

The point is: if we decide to commit to OpenTelemetry to expose our metrics, then it would make sense to handle traces only via OpenTelemetry, hence we could drop the "direct Jaeger integration".

policy-testdrive: remove confusing panic message

Since the introduction of context-aware policies, policy-testdrive can produce a confusing message when there's no connection available to the k8s cluster defined inside of ~/.kube/config.

For example, this is the output produced on my machine when my minikube server is down:

policy-testdrive --policy module.wasm --request-file pod-req-no-specific-apparmor-profile.json 
thread '<unnamed>' panicked at 'could not initialize Kubernetes client: Error loading kubeconfig: Failed to infer config.. cluster env: (Error loading kubeconfig: Unable to load in cluster config, KUBERNETES_SERVICE_HOST and KUBERNETES_SERVICE_PORT must be defined), kubeconfig: (Error loading kubeconfig: Unable to load current context: )', crates/policy-evaluator/src/cluster_context.rs:74:31
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
Settings validation result: SettingsValidationResponse { valid: true, message: None }
Policy evaluation results:
ValidationResponse { uid: "", allowed: true, patch_type: None, patch: None, status: None }

Acceptance criteria

No panic message should be printed; instead a nicer warning message should be shown, something that says:

  1. cannot reach the Kubernetes cluster defined inside of your configuration
  2. context-aware policies are not going to be working

Report loaded and updated policies in the log

When I update the pod-privileged policy (because of kubewarden/kubewarden.io#26), helm just reports

clusteradmissionpolicy.policies.kubewarden.io/privileged-pods configured

So I looked at the policy-server logs (using k9s) to get some more information but it only shows

Sep 17 08:56:33.432 INFO policy_server: policies download download_dir="/tmp/" policies_count=1 status="init"
pulling policy...
Sep 17 08:56:35.216 INFO policy_server: policies download status="done"
Sep 17 08:56:35.216 INFO policy_server: kubernetes poller bootstrap status="init"
Sep 17 08:56:35.216 INFO policy_server: kubernetes poller bootstrap status="done"
Sep 17 08:56:35.216 INFO policy_server: worker pool bootstrap status="init"
Sep 17 08:56:35.216 INFO policy_server::kube_poller: spawning cluster context refresh loop
Sep 17 08:56:35.229 INFO policy_server::worker_pool: spawning worker spawned=2 total=2
Sep 17 08:56:35.230 INFO policy_server::worker_pool: spawning worker spawned=1 total=2
Sep 17 08:56:35.328 INFO policy_server: worker pool bootstrap status="done"
Sep 17 08:56:35.333 INFO policy_server::server: started HTTPS server address="0.0.0.0:8443"

which doesn't tell me anything about

  • which policy was pulled
  • which version of this policy
  • was it running before (is it an update) or not (is it a new policy)
  • which other policies are running (in which version)

While the current log is helpful for developers, it doesn't contain information relevant for operators.

Add common attributes to policies

Many policies have to implement concepts like allowed_users and allowed_groups inside of their code. We should instead move these attributes, and their handling, inside of the Policy Server. By doing that we will DRY policies and save policy authors quite some time.

The idea is to introduce these shared attributes that can be enabled on each policy:

  • allowed_users: list of users who are not affected by the policy
  • allowed_groups: list of groups who are not affected by the policy
  • target_users: reduce the scope of the policy only to requests made by these users
  • target_groups: reduce the scope of the policy only to requests made by users who belong to these groups

Each one of these fields would be optional.

This is how the policies would behave, given the following scenarios:

  • Both allowed_ attributes are not set: the policy will evaluate all requests, regardless of the author
  • Both target_ attributes are not set: the policy will evaluate all requests, regardless of the author
  • allowed_ and target_ are set at the same time: only requests originated from target_ users/groups are going to be evaluated by the policy. If a request comes from an allowed_users or allowed_groups entry, the request will be immediately approved.

Examples

Apply the policy to all the users of the clusters except for alice and all the users who belong to the administrators group:

privileged_pods:
  url: registry://ghcr.io/chimera-kube/policies/psp-apparmor:v0.1.0
  settings:
    allowed_profiles:
    - runtime/default
    - localhost/my-special-workload
  allowed_users:
  - alice
  allowed_groups:
  - administrators

Apply the policy only to the requests coming from the user bob and all the users who belong to the group tenant-a:

privileged_pods:
  url: registry://ghcr.io/chimera-kube/policies/psp-apparmor:v0.1.0
  settings:
    allowed_profiles:
    - runtime/default
    - localhost/my-special-workload
  target_users:
  - bob
  target_groups:
  - tenant-a

Apply the policy only to the requests made by users who belong to the mortals group. At the same time exempt the user bob from this policy:

privileged_pods:
  url: registry://ghcr.io/chimera-kube/policies/psp-apparmor:v0.1.0
  settings:
    allowed_profiles:
    - runtime/default
    - localhost/my-special-workload
  allowed_users:
  - bob
  target_groups:
  - mortals

Let's assume user bob belongs to the mortals group. With the configuration shown above the requests coming from user bob would not be processed by the policy.
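A minimal sketch of how the exemption/targeting logic described above could be evaluated (field and function names are hypothetical, mirroring the proposal):

/// Hypothetical representation of the proposed shared attributes.
#[derive(Default)]
struct CommonAttributes {
    allowed_users: Vec<String>,
    allowed_groups: Vec<String>,
    target_users: Vec<String>,
    target_groups: Vec<String>,
}

/// Returns true when the policy has to evaluate the request.
fn policy_applies(attrs: &CommonAttributes, user: &str, groups: &[String]) -> bool {
    // Requests coming from an allowed user/group are approved right away.
    if attrs.allowed_users.iter().any(|u| u == user)
        || attrs.allowed_groups.iter().any(|g| groups.contains(g))
    {
        return false;
    }
    // When no target_ attribute is set, every remaining request is evaluated.
    if attrs.target_users.is_empty() && attrs.target_groups.is_empty() {
        return true;
    }
    // Otherwise, only requests originated by the targeted users/groups are evaluated.
    attrs.target_users.iter().any(|u| u == user)
        || attrs.target_groups.iter().any(|g| groups.contains(g))
}

fn main() {
    // bob belongs to the mortals group, but is explicitly exempted.
    let attrs = CommonAttributes {
        allowed_users: vec!["bob".to_string()],
        target_groups: vec!["mortals".to_string()],
        ..Default::default()
    };
    let groups = vec!["mortals".to_string()];
    println!("evaluate bob's request? {}", policy_applies(&attrs, "bob", &groups));
}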

Revisit context-aware deployment on Kubernetes

Right now, all tests with context-aware features have been done manually or by reusing a local kubeconfig (usually an admin kubeconfig). In the real deployment, we have to take care of RBAC permissions, so that when a user deploys the policy-server, it has permissions to list the resources it is currently hardcoded to list:

  • Namespaces
  • Services
  • Ingresses

Of course, when we generalize these API types (and also allow CRDs to be listed and watched), we will have to make it easy for the user to grant permissions when deploying/redeploying the policy-server with the helm chart.
