
zalando-incubator / kopf

A Python framework to write Kubernetes operators in just a few lines of code.

Home Page: https://kopf.readthedocs.io

License: MIT License

Python 99.87% Shell 0.13%
kubernetes kubernetes-operator python python3 framework asyncio operator domain-driven-design

kopf's Introduction

This repository is suspended and not maintained.

It is kept in place for historic reference, so that all links remain valid and the discussions in issues & PRs are preserved for debugging & investigation.

Kopf's development currently happens here:

Please send new issues and pull requests there.


Kubernetes Operator Pythonic Framework (Kopf)


Kopf (Kubernetes Operator Pythonic Framework) is a framework and a library that make Kubernetes operator development easier, in just a few lines of Python code.

The main goal is to bring the Domain-Driven Design to the infrastructure level, with Kubernetes being an orchestrator/database of the domain objects (custom resources), and the operators containing the domain logic (with no or minimal infrastructure logic).

Documentation

Features

  • A full-featured operator in just 2 files: Dockerfile + a Python module.
  • Implicit updates of the object's status with the values returned from the Python functions.
  • Multiple creation/update/deletion handlers to track the object handling process.
  • Update handlers for the selected fields with automatic value diffs.
  • Dynamically generated sub-handlers using the same handling tracking feature.
  • Retries of the handlers in case of failures or exceptions.
  • Easy object hierarchy building with the labels/naming propagation.
  • Built-in events for the objects to reflect their state (as seen in kubectl describe).
  • Automatic logging/reporting of the handling process (as logs + events).
  • Handling of multiple CRDs in one process.
  • The development instance temporarily suppresses the deployed ones.

Examples

See the examples directory for typical use-cases.

A minimal operator can look like this:

import kopf

@kopf.on.create('zalando.org', 'v1', 'kopfexamples')
def create_fn(spec, meta, status, **kwargs):
    print(f"And here we are! Creating: {spec}")

The keyword arguments available to the handlers:

  • body for the whole body of the handled objects.
  • spec as an alias for body['spec'].
  • meta as an alias for body['metadata'].
  • status as an alias for body['status'].
  • patch is a dict with the object changes to be applied after the handler.
  • retry (int) is the sequential retry number of this handler.
  • started (datetime.datetime) is the start time of the handler, in case of retries & errors.
  • runtime (datetime.timedelta) is the duration of the handler run, in case of retries & errors.
  • diff is a list of changes of the object (only for the update events).
  • old is the old state of the object or a field (only for the update events).
  • new is the new state of the object or a field (only for the update events).
  • logger is a per-object logger, with the messages prefixed with the object's namespace/name.
  • event is the raw event as received from the Kubernetes API.
  • cause is the processed cause of the handler as detected by the framework (create/update/delete).

**kwargs (or **_ to stop lint warnings) is required for forward compatibility: the framework can add new keyword arguments in the future, and the existing handlers should accept them.
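For illustration, a hedged sketch (the field names are invented) of an update handler that uses several of these kwargs:

import kopf

@kopf.on.update('zalando.org', 'v1', 'kopfexamples')
def update_fn(spec, patch, diff, logger, retry, **kwargs):
    # `diff` is a list of (op, field, old, new) tuples for the changed fields.
    logger.info(f"Changes detected (attempt #{retry}): {diff}")

    # Anything put into `patch` is applied to the object after the handler.
    patch.setdefault('status', {})['observed-field'] = spec.get('field')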

Usage

We assume that when the operator is executed in the cluster, it is packaged into a Docker image with a CI/CD tool of your preference.

FROM python:3.7
ADD . /src
RUN pip install kopf
CMD kopf run /src/handlers.py

Where handlers.py is your Python script with the handlers (see examples/*/example.py for the examples).

See kopf run --help for other ways of attaching the handlers.

Contributing

Please read CONTRIBUTING.md for details on our process for submitting pull requests to us, and please ensure you follow the CODE_OF_CONDUCT.md.

To install the environment for the local development, read DEVELOPMENT.md.

Versioning

We use SemVer for versioning. For the versions available, see the releases on this repository.

License

This project is licensed under the MIT License — see the LICENSE file for details.

Acknowledgments

kopf's People

Contributors

0xflotus, damc-dev, dbazhal, dlmiddlecote, gautamp8, hjacobs, jc2k, nolar, parking52, pawelkopka, perploug, prakashkl88, pshchelo, s-soroosh, smileisak, thevennamaneni, trondhindenes


kopf's Issues

Test the behaviour in real cluster (minikube)

As a continuation of #13 (unit-test), add the functional tests with the real cluster, real custom resource, etc.

This is needed to verify that the whole thing actually works against a real Kubernetes API (the unit tests only cover the internals, against the assumed behaviour of Kubernetes via mocks).

Use Minikube in Travis CI, but perhaps move the minikube start/stop operations to fixtures (module-scoped?), so that the cluster can be reset when needed (see the fixture sketch after the list below). Read more:

Tests to implement:

  • Starting the operator (cluster-wide and namespaced).
    • Example's/Documentation's RBAC sufficient?
    • Resource watching works?
    • Peering watching works?
  • Custom resource operations.
    • Creation.
    • Update.
    • Deletion.
  • Peering with 2+ operators:
    • Same priority.
    • Different priorities.
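A rough sketch of the module-scoped fixture idea mentioned above, assuming minikube is available on the CI worker's PATH (the commands and scoping are illustrative only):

import subprocess

import pytest

@pytest.fixture(scope='module')
def minikube_cluster():
    # Start a throw-away cluster for this test module.
    subprocess.run(['minikube', 'start'], check=True)
    try:
        yield
    finally:
        # Tear it down so that the next module gets a clean cluster.
        subprocess.run(['minikube', 'delete'], check=True)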

Use it? Tell us.

We are happy to know that you like Kopf, and can apply it in your work.

We will be even more happy if you share with us how you use it: operators created, patterns invented, problems solved, so on. Repos? Posts? Demos? Talks? Presentations? — Everything counts.

In addition to the emotional reward of creating a useful tool, this will help us decide on the feature roadmap and adjust Kopf to the real needs of real users.

For bugs, ideas, feature requests, or just questions, please open a new issue — so that other developers and users can find it later.

Tests automation

The project is now in the proof-of-concept stage, and lacks tests (besides the examples, manually executed during development).

It needs some good unit tests to freeze its current state, so that we could continue to add new features safely.

Separate the tests into the internal tests and the external promises of the interface.

Add the coverage measurements.


An estimation of topics to cover with the tests:

  • Library's public interfaces (what is exported via the main module).
  • #28 Object hierarchies and name/namespace/label parent-to-child propagation.
  • #29 Registries and kopf.on decorators.
  • #91 (+#82) Last-seen state manipulation.
  • #40 Diff calculation.
  • #34 Handler function invocation.
  • #63 Event-loop running, task spawning.
  • #63 Watching & queueing & per-object queues/workers.
  • #82 Handling: cause detection and proper reaction.
  • #90 (+#82) Handling: finalizers adding/removing.
  • #61 Handling: handler progress storage.
  • #35 Handling: lifecycles and handler selection.
  • Sub-handler invocation and context-vars setting.
  • Peering and neighbourhood awareness.
  • #39 CLI module loading/importing.
  • #39 CLI commands and options, including cluster authentication.
  • #71 Mocked K8s client or API calls.
  • #53 #54 Real-cluster execution and smoke tests.
  • #72 Coverage MOVED TO #99

Label the sub-templates in the container resources (e.g. pods in jobs)

Actual Behavior

When I create a job, and label it with some parent-referencing labels, I expect the labels to be also applied to the pod templates:

kopf.label(job, {'parent-name': name})

The pod template lacks the labels; only the parent job object gets them.

One way to work around this is to explicitly label the job and its template:

kopf.label(job, {'parent-name': name})
kopf.label(job['spec']['template'], {'parent-name': name})

Expected Behavior

However, it would be better to do it in one call:

# Explicitly:
kopf.label(job, {'parent-name': name}, nested=['spec.template'])

# Auto-guessing:
kopf.label(job, {'parent-name': name}, recursive=True)

Project status

Hi all,

First of all, thanks for kopf, it's refreshing to see Python can be considered as legitimate as Go in this arena.

I wondered, however, what's the status of the project? I see a 0.11 version, so I can assume it's still in progress, but I also see a few open enhancements that I feel will help the overall reliability of running kopf. So, what I mean is, what is the status of the project in general? Is there a roadmap for this year by any chance?

Knowing this could help in deciding whether to use it in production now or not :)

Cheers,

Code linting on build

Code should be automatically linted on every push, as part of the building process.

The coding guidelines should be the defaults of pylint, flake8, or both, with one exception:

  • Line length is 100 chars.

No linting scripts, no CLI options: the standard CLI tools should already take all of this into account, i.e. the auto-detectable configs of these tools should be used (the same ones the IDE picks up).

Documentation

The project is now in the proof-of-concept stage, and lacks the documentation (besides the docstrings and examples).

It needs some normal hosted HTML docs (e.g. on ReadTheDocs).

  • Hosting configured.
  • Builds configured.

Specifically, some topics to not forget:

  • Typical patterns to solve typical problems (similar to examples). E.g., children object creation.
  • Deployment patterns: namespace isolation, deployment and RBAC-based service accounts
    (minimum rules needed for the framework, not the general recommendations), etc. See #17.
  • Getting started (or quick-start) guide (isn't that examples/01-minimal?).
  • The requirement for **kwargs or **_ (for no-linting) for the forward-compatibility with the new keywords passed to the existing handlers by the new versions of the framework (part of the DSL).
  • The full list of all the available keywords now: body, meta, spec, status, patch, logger, retry, diff, old, new.
  • How the state is persisted in K8s itself; point out that the framework is stateless (same as web servers).
  • What happens when the operator is killed during the handler, and how to avoid the duplicated side-effects (i.e. idempotent handlers).
  • What happens when the changes are applied when the operator is dead or fails permanently due to the bugs (the last-seen state preservation and comparison), i.e. that no changes are left behind unnoticed.
  • Strategies to isolate the operators to the namespaces; but still to have the cluster-wide operators (cluster CRDs or all-namespace monitoring, as e.g. Postgres operators, i.e. not already domain-specific ad-hoc solutions).
  • Alternatives (CoreOS Operator Framework, all the Go-based solutions, etc.): differences and similarities, and the self-positioning of Kopf.

Implicitly map kinds<=>plurals<=>singulars

Currently, the resources' plural names must be used as the only acceptable reference to the resources in the handlers:

@kopf.on.create('zalando.org', 'v1', 'kopfexamples')
def create_fn(**_):
    pass

It would be more convenient to refer to the resource the same way as the YAML files do: by their "kind":

@kopf.on.create('zalando.org', 'v1', 'KopfExample')
def create_fn(**_):
    pass

Or, even more conveniently, as in kubectl: by any of their names, without group/version references:

@kopf.on.create('KopfExample')  # kind
def create_fn(**_):
    pass

@kopf.on.create('kopfexample')  # singular
def create_fn(**_):
    pass

@kopf.on.create('kopfexamples')  # plural
def create_fn(**_):
    pass

Short names could also be supported, though it is questionable whether they should be used to reference the CRD in the operator.

Handler for the operator start/stop

To work properly with different Kubernetes client libraries, such as PyKube/PyKubeNG (#15), authentication is needed.

One way to do this is on the module import level. Yet, this is wrong from the code structuring point of view.

It would be better to have the explicit event handler for the operator-wide (not just object-wide) events:

import kopf
import kubernetes
import pykube

config = None

@kopf.on.startup()
def auth_pykube(**_):
    global config
    try:
        config = pykube.KubeConfig.from_service_account()  # cluster env vars
    except FileNotFoundError:
        config = pykube.KubeConfig.from_file()  # developer's config files

@kopf.on.startup()
def auth_client(logger, **_):
    try:
        kubernetes.config.load_incluster_config()  # cluster env vars
        logger.debug("configured in cluster with service account")
    except kubernetes.config.ConfigException as e1:
        try:
            kubernetes.config.load_kube_config()  # developer's config files
            logger.debug("configured via kubeconfig file")
        except kubernetes.config.ConfigException as e2:
            raise Exception("Cannot authenticate either in-cluster or via kubeconfig.")

@kopf.on.cleanup()
def purge_caches(**_):
    pass

It could also be used to start background threads or async coroutines for monitoring unrelated systems (e.g. API polling).

The handler should be called before any implicit login() or other API calls, so that nothing is implicitly configured by the framework, but after the logging is configured and the registry is populated, so that the handler itself is logged, yet can still adjust the logging or the registry.

For the exit-handlers, perhaps the atexit stdlib can be used.
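A minimal sketch of that atexit-based fallback (the cleanup function itself is a placeholder):

import atexit

def purge_caches_on_exit():
    # Placeholder: release any resources held by the operator process.
    pass

# Registered functions are called on normal interpreter exit, in reverse order.
atexit.register(purge_caches_on_exit)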

Support for built-in resources (pods, jobs, etc)

Expected Behaviour

Kopf should be able to handle the built-in resources, such as pods, jobs, PVCs, etc., both in the spy-handlers (#30) and in the regular handlers (if decided so).

Actual Behaviour

Currently, Kopf is only able to serve the custom resources — because it uses the kubernetes.client.CustomObjectsApi API (for watching & for patching).

Suggestions

Maybe pykube-ng can be beneficial here, as it has a nice syntax to operate on all resources, built-in and custom (see #15). With the official Kubernetes client, we would have to implement our own branching on the object kind and use the proper class and methods for every kind, basically turning Kopf into a Kubernetes client library, which is not the goal.
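For reference, a hedged sketch of the kind of uniform syntax pykube-ng offers for the built-in resources (authentication as in the startup-handler examples above; this is not how Kopf currently works):

import pykube

api = pykube.HTTPClient(pykube.KubeConfig.from_file())

# Built-in resources are plain classes with a uniform query interface.
for pod in pykube.Pod.objects(api).filter(namespace='default'):
    print(pod.name, pod.obj.get('status', {}).get('phase'))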

Maybe, a pre-scan of the cluster would be needed to identify all the existing resources. See #57.

Specifications

kopf==0.13

Configurable field for the status storage

Actual Behavior

Currently, Kopf stores the internal status of the handlers in status.kopf (hard-coded). It is used for exchanging information across the event cycles.

This is done in the assumption that there is only one operator/controller per resource kind.

If two or more different Kopf-based operators/controllers handle the same resource, especially a reusable resource such as Pods, they can collide.

Expected Behavior

The field of the internal status must be configurable. For example, in the handler declaration:

@kopf.on.update('', 'v1', 'pods', status='status.kopf.some-other-field')
def pod_updated(**_):
    pass

The controlling parts of custom resource A can have a convention to use its own resource kind as the field:

@kopf.on.update('', 'v1', 'pods', status='status.kopf-for-resource-a')
def pod_updated(**_):
    pass

If explicitly set to None, the status is not persisted. Which implies that a different flow should be used (all-at-once lifecycle, errors ignored):

@kopf.on.create('', 'v1', 'pods', status=None)
@kopf.on.update('', 'v1', 'pods', status=None)
@kopf.on.delete('', 'v1', 'pods', status=None)
def pod_event(**_):
    pass

PS: There is also the metadata.annotations.last-seen-state. It should be turned off when the status is turned off. It makes no sense to store the last-seen-state and to calculate the diff, since with the status not persisted, there will be no multiple handler calls.

Automated OLM generation and verification

Kopf-based operators can benefit from being compatible with Operator Lifecycle Manager (OLM). Quick intro video: https://youtu.be/nGM2s4-Qr74

For that, Kopf can help by generating the YAML definitions based on the source code of the operator (e.g. the utilised resources, the operator's docstrings, setup.py if present).

Sample YAML files: https://github.com/operator-framework/operator-sdk-samples/tree/master/memcached-operator/deploy/olm-catalog/memcached-operator

This is also related to #49 (RBAC templates generation/verification).

This task needs research on how OLM works, and some hands-on practice, before any specific suggestions can be made. The EphemeralVolumeClaim operator looks like a good candidate, once finished in #97.

`metadata.generation` is reported in the diffs on post-creation

Expected Behavior

metadata.generation field is ignored in the last-seen state and diff calculation.

Actual Behavior

metadata.generation is reported as a normally changed field and triggers the update handlers immediately on creation — in Minikube demos, where the generation is increased on every patch.

[2019-05-17 12:32:07,152] kopf.reactor.handlin [INFO    ] [default/kopf-example-1] All handlers succeeded for creation.
[2019-05-17 12:32:07,173] kopf.reactor.handlin [DEBUG   ] [default/kopf-example-1] Patching with: {'status': {'kopf': {'progress': None}, 'create': {'field': 'item1, item2'}}, 'metadata': {'annotations': {'kopf.zalando.org/last-handled-configuration': '{"apiVersion": "zalando.org/v1", "kind": "KopfExample", "metadata": {"annotations": {}, "generation": 1, "labels": {"somelabel": "somevalue"}, "name": "kopf-example-1", "namespace": "default"}, "spec": {"duration": "1m", "field": "value", "items": ["item1", "item2"]}}'}}}
[2019-05-17 12:32:07,296] kopf.reactor.handlin [DEBUG   ] [default/kopf-example-1] Update event: (('change', ('metadata', 'generation'), 1, 2),)
[2019-05-17 12:32:07,297] kopf.reactor.handlin [INFO    ] [default/kopf-example-1] All handlers succeeded for update.
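One possible shape of the fix, sketched here only for illustration (not the actual framework code): drop the volatile system fields from the body before storing the last-seen state and diffing it.

import copy

def sanitize_last_seen(body):
    # Keep only the user-relevant parts; drop the volatile system fields
    # such as metadata.generation and metadata.resourceVersion.
    essence = copy.deepcopy(body)
    essence.get('metadata', {}).pop('generation', None)
    essence.get('metadata', {}).pop('resourceVersion', None)
    essence.pop('status', None)
    return essence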

Specifications

  • Kubernetes version: Minikube + K8s 1.14.1
Client Version: version.Info{Major:"1", Minor:"12", GitVersion:"v1.12.0", GitCommit:"0ed33881dc4355495f623c6f22e7dd0b7632b7c0", GitTreeState:"clean", BuildDate:"2018-09-28T15:20:58Z", GoVersion:"go1.11", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"14", GitVersion:"v1.14.1", GitCommit:"b7394102d6ef778017f2ca4046abbaa23b88c290", GitTreeState:"clean", BuildDate:"2019-04-08T17:02:58Z", GoVersion:"go1.12.1", Compiler:"gc", Platform:"linux/amd64"}
  • Python packages installed: (use pip freeze --all)
kopf==0.10

Freeze for 2+ operators of the same priority

Current Behaviour

When 2+ operators with the same priority run (e.g. the default of 0), both keep working, which defeats the purpose of peering: preventing duplicated side-effects.

Expected Behaviour

When 2+ operators with the same priority run, with peering configured and functional, only one should be functioning, others should freeze.

The best case would be for all operators to issue a warning and freeze, so that the cluster is not served anymore; this will bring people's attention sooner than if the operators negotiate which one should work. 2+ operators with the same priority is obviously a deployment error.

Note: If the operators make a decision on which one runs and which one freezes (e.g. by the most recent pod start time, or by id, or anything else comparable), then it could happen that an old deployment in another namespace serves the cluster when a new operator is expected to work. It is better to stop serving and bring people's attention to resolve the situation.

Suggestion

See the module where it all happens.

Checklist

  • Implementation.
  • Tests — manually (there is no auto-tests for peering yet).
  • Documentation (in docs/peering.rst)

Ignore the events from the past

Actual Behaviour

In some cases, a Kopf-based operator reacts to object creation events from the past. It is described in more detail in kubernetes-client/python#819.

Briefly: it is caused by how a kubernetes client library is implemented: it remembers the last seen resource version among all objects as they are listed on the initial call. Kubernetes lists them in arbitrary order, so the old ones can be the latest in the list. Then, the client library uses that old resource version to re-establish the watch connection, which replays all the old events since that moment in time when this resource version was the latest. This also includes the creation, modification, and even the deletion events for the objects that do not exist anymore.

In practice, it means that the operator will call the handlers, which can potentially create children objects and cause other side effects. In our case, it happened every day when some cluster events were executed; but it could happen any time the existing watch connection is re-established.

Expected Behaviour

The operator framework should follow the "eventual consistency" principle, which means that only the last state (the latest resource version, the latest event) should be handled.

Since the events are streaming, the "batch of events" can be defined as a time-window of e.g. 0.1s — fast enough to not delay the reaction in normal cases, but slow enough to process all events happening in a row.
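A rough sketch of such time-window batching, assuming an asyncio queue of raw watch-events per object (the names and timings are illustrative):

import asyncio

BATCH_WINDOW = 0.1  # seconds

async def drain_to_latest(queue):
    # Take the first event, then keep replacing it with newer ones
    # for as long as they keep arriving within the time window.
    event = await queue.get()
    while True:
        try:
            event = await asyncio.wait_for(queue.get(), timeout=BATCH_WINDOW)
        except asyncio.TimeoutError:
            break
    return event  # only the latest state gets handled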

Steps to Reproduce the Problem

Create some amount (10-20) of objects.

Example for my custom resource kind:

In [59]: kubernetes.config.load_kube_config()  # developer's config files
In [60]: api = kubernetes.client.CustomObjectsApi()
In [61]: api_fn = api.list_cluster_custom_object
In [62]: w = kubernetes.watch.Watch()
In [63]: stream = w.stream(api_fn, 'example.com', 'v1', 'mycrds')
In [64]: for ev in stream: print((ev['type'], ev['object'].get('metadata', {}).get('name'), ev['object'].get('metadata', {}).get('resourceVersion'), ev['object'] if ev['type'] == 'ERROR' else None))

('ADDED', 'mycrd-20190328073027', '213646032', None)
('ADDED', 'mycrd-20190404073027', '222002640', None)
('ADDED', 'mycrd-20190408065731', '222002770', None)
('ADDED', 'mycrd-20190409073007', '222002799', None)
('ADDED', 'mycrd-20190410073012', '222070110', None)
('ADDED', 'mycrd-20190412073005', '223458915', None)
('ADDED', 'mycrd-20190416073028', '226128256', None)
('ADDED', 'mycrd-20190314165455', '233262799', None)
('ADDED', 'mycrd-20190315073002', '205552290', None)
('ADDED', 'mycrd-20190321073022', '209509389', None)
('ADDED', 'mycrd-20190322073027', '209915543', None)
('ADDED', 'mycrd-20190326073030', '212318823', None)
('ADDED', 'mycrd-20190402073005', '222002561', None)
('ADDED', 'mycrd-20190415154942', '225660142', None)
('ADDED', 'mycrd-20190419073010', '228579290', None)
('ADDED', 'mycrd-20190423073032', '232894099', None)
('ADDED', 'mycrd-20190424073015', '232894129', None)
('ADDED', 'mycrd-20190319073031', '207954735', None)
('ADDED', 'mycrd-20190403073019', '222002615', None)
('ADDED', 'mycrd-20190405073040', '222002719', None)
('ADDED', 'mycrd-20190415070301', '225374502', None)
('ADDED', 'mycrd-20190417073005', '226917625', None)
('ADDED', 'mycrd-20190418073023', '227736631', None)
('ADDED', 'mycrd-20190327073030', '212984265', None)
('ADDED', 'mycrd-20190422061326', '230661413', None)
('ADDED', 'mycrd-20190318070654', '207313230', None)
('ADDED', 'mycrd-20190401101414', '216222726', None)
('ADDED', 'mycrd-20190320073041', '208884644', None)
('ADDED', 'mycrd-20190326165718', '212611027', None)
('ADDED', 'mycrd-20190329073007', '214304201', None)
('ADDED', 'mycrd-20190325095839', '211712843', None)
('ADDED', 'mycrd-20190411073018', '223394843', None)
^C

Please note the random order of resource_versions. Depending on your luck and the current state of the cluster, you can get either a new-enough or the oldest resource in the last line.

Let's use the latest resource_version 223394843 with a new watch object:

In [76]: w = kubernetes.watch.Watch()
In [79]: stream = w.stream(api_fn, 'example.com', 'v1', 'mycrds', resource_version='223394843')
In [80]: for ev in stream: print((ev['type'], ev['object'].get('metadata', {}).get('name'), ev['object'].get('metadata', {}).get('resourceVersion'), ev['object'] if ev['type'] == 'ERROR' else None))

('ERROR', None, None, {'kind': 'Status', 'apiVersion': 'v1', 'metadata': {}, 'status': 'Failure', 'message': 'too old resource version: 223394843 (226210031)', 'reason': 'Gone', 'code': 410})
('ERROR', None, None, {'kind': 'Status', 'apiVersion': 'v1', 'metadata': {}, 'status': 'Failure', 'message': 'too old resource version: 223394843 (226210031)', 'reason': 'Gone', 'code': 410})
('ERROR', None, None, {'kind': 'Status', 'apiVersion': 'v1', 'metadata': {}, 'status': 'Failure', 'message': 'too old resource version: 223394843 (226210031)', 'reason': 'Gone', 'code': 410})
('ERROR', None, None, {'kind': 'Status', 'apiVersion': 'v1', 'metadata': {}, 'status': 'Failure', 'message': 'too old resource version: 223394843 (226210031)', 'reason': 'Gone', 'code': 410})

……… repeated infinitely ………

Well, okay, let's try the recommended resource_version, which is at least known to the API:

In [83]: w = kubernetes.watch.Watch()
In [84]: stream = w.stream(api_fn, 'example.com', 'v1', 'mycrds', resource_version='226210031')
In [85]: for ev in stream: print((ev['type'], ev['object'].get('metadata', {}).get('name'), ev['object'].get('metadata', {}).get('resourceVersion'), ev['object'] if ev['type'] == 'ERROR' else None))

('ADDED', 'mycrd-expr1', '226370109', None)
('MODIFIED', 'mycrd-expr1', '226370111', None)
('MODIFIED', 'mycrd-expr1', '226370116', None)
('MODIFIED', 'mycrd-expr1', '226370127', None)
('MODIFIED', 'mycrd-expr1', '226370549', None)
('DELETED', 'mycrd-expr1', '226370553', None)
('ADDED', 'mycrd-20190417073005', '226917595', None)
('MODIFIED', 'mycrd-20190417073005', '226917597', None)
('MODIFIED', 'mycrd-20190417073005', '226917605', None)
('MODIFIED', 'mycrd-20190417073005', '226917614', None)
('MODIFIED', 'mycrd-20190417073005', '226917625', None)
('ADDED', 'mycrd-20190418073023', '227736612', None)
('MODIFIED', 'mycrd-20190418073023', '227736613', None)
('MODIFIED', 'mycrd-20190418073023', '227736618', None)
('MODIFIED', 'mycrd-20190418073023', '227736629', None)
('MODIFIED', 'mycrd-20190418073023', '227736631', None)
('ADDED', 'mycrd-20190419073010', '228579268', None)
('MODIFIED', 'mycrd-20190419073010', '228579269', None)
('MODIFIED', 'mycrd-20190419073010', '228579276', None)
('MODIFIED', 'mycrd-20190419073010', '228579286', None)
('MODIFIED', 'mycrd-20190419073010', '228579290', None)
('ADDED', 'mycrd-20190422061326', '230661394', None)
('MODIFIED', 'mycrd-20190422061326', '230661395', None)
('MODIFIED', 'mycrd-20190422061326', '230661399', None)
('MODIFIED', 'mycrd-20190422061326', '230661411', None)
('MODIFIED', 'mycrd-20190422061326', '230661413', None)
('ADDED', 'mycrd-20190423073032', '231459008', None)
('MODIFIED', 'mycrd-20190423073032', '231459009', None)
('MODIFIED', 'mycrd-20190423073032', '231459013', None)
('MODIFIED', 'mycrd-20190423073032', '231459025', None)
('MODIFIED', 'mycrd-20190423073032', '231459027', None)
('MODIFIED', 'mycrd-20190423073032', '232128498', None)
('MODIFIED', 'mycrd-20190423073032', '232128514', None)
('MODIFIED', 'mycrd-20190423073032', '232128518', None)
('ADDED', 'mycrd-20190424073015', '232198227', None)
('MODIFIED', 'mycrd-20190424073015', '232198228', None)
('MODIFIED', 'mycrd-20190424073015', '232198235', None)
('MODIFIED', 'mycrd-20190424073015', '232198247', None)
('MODIFIED', 'mycrd-20190424073015', '232198249', None)
('MODIFIED', 'mycrd-20190423073032', '232894049', None)
('MODIFIED', 'mycrd-20190423073032', '232894089', None)
('MODIFIED', 'mycrd-20190424073015', '232894093', None)
('MODIFIED', 'mycrd-20190423073032', '232894099', None)
('MODIFIED', 'mycrd-20190424073015', '232894119', None)
('MODIFIED', 'mycrd-20190424073015', '232894129', None)
('ADDED', 'mycrd-20190425073032', '232973618', None)
('MODIFIED', 'mycrd-20190425073032', '232973619', None)
('MODIFIED', 'mycrd-20190425073032', '232973624', None)
('MODIFIED', 'mycrd-20190425073032', '232973635', None)
('MODIFIED', 'mycrd-20190425073032', '232973638', None)
('MODIFIED', 'mycrd-20190314165455', '233190859', None)
('MODIFIED', 'mycrd-20190314165455', '233190861', None)
('MODIFIED', 'mycrd-20190314165455', '233254055', None)
('MODIFIED', 'mycrd-20190314165455', '233254057', None)
('MODIFIED', 'mycrd-20190314165455', '233262797', None)
('MODIFIED', 'mycrd-20190314165455', '233262799', None)
^C

All of this is dumped immediately; nothing happens in the cluster during these operations. All these changes are old, i.e. not expected, as they were already processed before the list...() call.

Please note that even the deleted, no-longer-existing resources are yielded ("expr1").

Specifications

Kubernetes version:

Client Version: version.Info{Major:"1", Minor:"12", GitVersion:"v1.12.7", GitCommit:"6f482974b76db3f1e0f5d24605a9d1d38fad9a2b", GitTreeState:"clean", BuildDate:"2019-03-25T02:52:13Z", GoVersion:"go1.10.8", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"12", GitVersion:"v1.12.7", GitCommit:"6f482974b76db3f1e0f5d24605a9d1d38fad9a2b", GitTreeState:"clean", BuildDate:"2019-03-25T02:41:57Z", GoVersion:"go1.10.8", Compiler:"gc", Platform:"linux/amd64"}

Python version:

Python 3.6.4

Python packages installed: (use pip freeze --all)

kubernetes==9.0.0
kopf==0.7

Log handling from pods

Having the silent handlers (spies) on the built-in Kubernetes objects (#30), the next step would be to silently watch over the pods' logs.

An example use-case: monitor the logs for specific lines (by pattern), and extract the process's KPIs or status from them, which can then be put onto the Kubernetes object's status:

import kopf
import kubernetes

@kopf.on.log('', 'v1', 'pods',
             regex=r'model accuracy is (\d+\.\d+)%')
def accuracy_log(namespace, meta, patch, log, match, **kwargs):
    model_name = meta.get('labels', {}).get('model')
    accuracy = float(match.group(1))
    accuracy_str = f'{accuracy:2f}%'

    api = kubernetes.client.CustomObjectsApi()
    api.patch_namespaced_custom_object(
        group='zalando.org', 
        version='v1',
        plural='trainingjobs',
        namespace=namespace,
        name=model_name, 
        body={'status': {'accuracy': accuracy_str}},
    )

@kopf.on.log('', 'v1', 'pods',
             regex=r'Traceback (most recent call last):')
def error_log(namespace, meta, patch, log, match, **kwargs):
    model_name = meta.get('labels', {}).get('model')
    api = kubernetes.client.CustomObjectsApi()
    api.patch_namespaced_custom_object(
        group='zalando.org', 
        version='v1',
        plural='trainingjobs',
        namespace=namespace,
        name=model_name, 
        body={'status': {'training': 'FAILED'}},
    )

Important: Perhaps, some filtering by the labels is needed, so that we do not watch over all the pods (there can be a lot of them), but only those of our interest. E.g., by the presence of model label in the examples above, so that only the model-pods are taken into account. See #45.

Such a TrainingJob custom resource can then be defined as follows:

spec:
  ………
  additionalPrinterColumns:
    - name: Accuracy
      type: string
      priority: 0
      JSONPath: .status.accuracy

When listed, the objects will print their accuracy:

$ kubectl get TrainingJob
NAME             ACCURACY
model-1          87.23%

Call handlers by time

Expected Behavior

Some of the handlers must be called regularly on the schedule (e.g. cron-like).

For example, to reconcile the actual state of the system (as changed outside the scope of the operator) with the declared state (as in the YAML files).

Actual Behavior

Only object changes cause handler execution.

The objects are not always changed when something else (unmonitored) happens in the cluster.
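A hypothetical shape of such a scheduled handler (the decorator name and its arguments are invented here for illustration; nothing like this exists at the time of this issue):

import kopf

@kopf.on.timer('zalando.org', 'v1', 'kopfexamples', interval=60)
def reconcile_fn(spec, status, logger, **kwargs):
    # Compare the actual state of the external system with `spec`
    # and correct the drift, even if the object itself did not change.
    logger.info("Periodic reconciliation tick.")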

Do not show the stacktraces on the retry/fatal exceptions

The retry-/fatal-exceptions are the signals for the framework to stop handling or to retry the handler. They are not the regular "unexpected" errors, since they are expected.

As such, the stacktraces should not be printed. Yet the messages of the exceptions should be printed/logged.

For other regular exceptions, which are by design "unexpected", the stacktraces should be printed/logged as usual (similar to when they happen inside a thread/greenlet and the main process continues).
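A minimal sketch of the intended logging difference, with plain Python logging (the handler invocation and the retry-exception class are placeholders, not the framework's real names):

import logging

logger = logging.getLogger(__name__)

class HandlerRetryError(Exception):
    pass  # placeholder for the framework's retry signal

def invoke_handler():
    # Placeholder for the actual handler invocation.
    raise HandlerRetryError("External resource is not ready yet.")

try:
    invoke_handler()
except HandlerRetryError as e:
    # Expected control-flow signal: log the message only, no stacktrace.
    logger.error(f"Handler asked for a retry: {e}")
except Exception:
    # Genuinely unexpected error: log with the full stacktrace.
    logger.exception("Handler failed unexpectedly.")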

Add warning to docs about finalizers.

FYI. You may want to add a warning to the documentation about problems that can occur due to the way in which finalizers are used on a custom CRD. In particular, if the operator is deployed to a namespace and set up in standalone mode to monitor just that same namespace, and the operator is deleted before the custom CRD objects, then when the namespace is deleted, it will get stuck in the terminating state.

This is especially problematic because deletion of a project doesn't really guarantee any order of deletion where owner references and blocking conditions haven't been specified. Thus deletion of a namespace can result in the operator being deleted first, without anyone having explicitly deleted the operator to cause this issue. Even if the operator is deployed in a separate project, the same issue can arise if the operator is deleted separately.

The only solution to this is to identify which custom CRD object is blocking deletion of the namespace, and manually edit it to remove the finalizer. I don't know of any better solution at the moment, or how you could rework kopf to not be affected by the issue.
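For reference, one way to unblock such a stuck object from Python, sketched with the official client (the resource coordinates are placeholders; with a JSON merge-patch, a None value removes the finalizers key):

import kubernetes

kubernetes.config.load_kube_config()
api = kubernetes.client.CustomObjectsApi()
api.patch_namespaced_custom_object(
    group='zalando.org', version='v1', plural='kopfexamples',
    namespace='default', name='stuck-object',
    body={'metadata': {'finalizers': None}},
)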

Namespace-scoped operators

Current Behaviour

Currently, the --namespace= option is used only to limit the watch-events to the objects belonging to that namespace, ignoring objects in other namespaces.

However, the framework continues to use the cluster-wide watching calls, and cluster-wide peering objects. The peering objects are now cluster-scoped by definition (in CRD), and there are no namespace-scoped peering objects at all.

Expected behaviour

As an operator developer, if I provide --namespace=something, I expect the operator to limit all its activities to that namespace only, and not even issue cluster-wide requests/queries, as they can be restricted, e.g. by permissions.

If I provide --namespace=something --peering=somepeering, I expect that the namespace-scoped peering object kind is used, not the cluster-scoped one.
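A hedged sketch of the expected branching in the watching/listing calls, using the official client's list functions (the streaming details are omitted):

def list_objects(api, group, version, plural, namespace=None):
    # `api` is a kubernetes.client.CustomObjectsApi instance.
    # With --namespace=, stay strictly within that namespace;
    # otherwise, fall back to the cluster-wide call.
    if namespace is not None:
        return api.list_namespaced_custom_object(group, version, namespace, plural)
    return api.list_cluster_custom_object(group, version, plural)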

Use-cases

Intended use-case 1: an operator is a part of the application's deployment, and a few instances of the same application are deployed with different versions, isolated by namespaces. A particular example: running production, staging, and maybe experimental operators of the same application in different namespaces.

The intended use-case 2: Running in a cluster with strict RBAC rules, with no access to the cluster objects, restricted to one namespace only.

Steps to Reproduce the Problem

  1. Create a RBAC service account with only the namespace permissions.
  2. Deploy any of the example operators with --namespace=default (or any other namespace).
  3. Observe how it fails on api.list_cluster_custom_object() (in queueing.py/watching.py).
    • for kopfexamples
    • for kopfpeerings

Acceptance Criteria

  • Served objects:
    • with --namespace=, it uses the namespace-scoped watching api.list_namespaced_custom_object()
    • with no --namespace=, it uses the cluster-scoped watching api.list_cluster_custom_object()
  • Peering objects:
    • KopfPeering is separated to ClusterKopfPeering & KopfPeering or NamespacedKopfPeering (keep backward compatibility if possible, but align with the existing convention of Role/ClusterRole, etc.)
    • with --namespace=, the namespace-scoped peering object is used.
    • with no --namespace=, the cluster-scoped peering object is used.
    • --peering= option only specifies the peering name, but not the listing/watching mechanics.
  • Documentation:
    • RBAC configuration and deployment pattern.
    • CLI reference page.
    • Peering page.
    • A new page for cluster-vs-namespace separation ("Features" section).
  • Tests.

Filter by labels/annotations

Currently, all objects are watched (either globally or by namespace, see #32), and all objects are handled. This is a normal case for the operator that "owns" the handled objects.

For the cases when an operator spies on objects that it does not "own", such as pods (#30) or the logs (#46), it should be able to filter out all objects definitely not in its scope of interest.

The easiest way is by labels, as it is supported by the Kubernetes API, and can be put into a query to be performed server-side. Also, filtering by annotations is probably possible — via the field-selectors.

Example usage:

import kopf

@kopf.on.event('', 'v1', 'pods', labels={'model': None, 'model-id': '123abc'})
def model_pod_changed(**kwargs):
    pass

The same goes for the create/update/delete/field handlers, and, when implemented, for the event & log handlers.


Additionally, the label filtering should be accepted on the command line (same semantics as kubectl):

kopf run handlers.py -l experiment=expr1

That can be useful for development and debugging purposes, when it is not desirable to put the labels into the code permanently.

Resource glob-matching

Background

In #75, it was highlighted that when the operator is dead, the custom resources cannot be deleted because of the Kopf's finalizer. This causes problems when the namespace is deleted: the operator deployment/pod can be killed before the custom resources are deleted, so there is nothing to remove the finalizers, so the resources are never released for actual deletion.

This can be partially helped by #24, where the finalizers are not added unless there are deletion handlers. But this is not a full solution.

Use-case

Make it possible to write a cluster-scoped controller of this kind:

import kopf

@kopf.on.event('*', '*', '*')
@kopf.on.event('*', '*', 'kopfexamples')
@kopf.on.event('*.zalando.org', '*', '*')
def release_deleted(patch, meta, **kwargs):
    if 'deletionTimestamp' in meta and meta.get('finalizers'):
        patch.setdefault('metadata', {})['finalizers'] = None

This will watch ALL the resources in the cluster (or namespace), and use the spy-handlers (#30) to react. The spy-handlers do not produce any implicit patching, do not post the k8s events.

Task

A few things are missing now to achieve this:

  • The support for non-CRD resources (e.g. pods). #84
  • Pre-scanning of the existing resources or CRDs. #57
  • The resource glob-matching (this issue).

Optionally, it can be implemented in two steps: 1st, for all CRD resources by globs; 2nd, for all resources by globs. The 1st step should be forward-compatible with the 2nd one, so that the transition is smooth.

Related: other filters of the resources to serve (but of one definite kind): #58 #45

Checklist:

  • Tests.
  • Documentation.

Field-handler should receive field-diffs, not the object-diffs

Expected Behavior

The field-handler (@kopf.on.field) should receive a diff object with the diffs specific to that field, and likewise field-specific old & new values. The field paths should be relative to the handled field.

Actual Behavior

The diff object contains the diffs for the whole object (all fields), with paths relative to the object's root.

Detected while writing the documentation (though was marked with a TODO in the source code).
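For illustration, in the diff format seen in the logs elsewhere (a list of (op, field, old, new) tuples; the values here are made up), the difference would look roughly like this:

# Actual: an object-wide diff, with paths relative to the object's root.
diff = (('change', ('spec', 'field'), 'old-value', 'new-value'),)

# Expected for a handler declared with field='spec.field':
# only that field's changes, with the path relative to the field itself.
diff = (('change', (), 'old-value', 'new-value'),)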

does kopf support loop-until?

This isn't a bug.

I'm experimenting with moving some code over from metacontroller (https://metacontroller.app/) to kopf. Since metacontroller is webhook-based, it supports looping (e.g. firing a request to the webhook on an interval) until some attribute of the created/modified crd object is true (resyncAfterSeconds). This has worked really well, as it allows fairly stateless logic even for things that converge slowly (our use case is using CRDs to construct and invoke custom cloudformation stacks).

I'm trying to replicate the same behavior in kopf, but as far as I can see, there is no "loop-event" implemented.

I guess I'm looking for some guidance around:

  • Is there any sort of "event loop-like" structure in kopf that I can use? It would be awesome to avoid having to implement a scheduler like Celery etc
  • What is the recommended way of implementing crd handlers that have to wait for some external "thing" to converge (slowly)?
  • Is it considered "okay" to run blocking code in the kopf.on.create/kopf.on.modify event methods? I'm suspecting this would be problematic as handling multiple events would be bogged down (or does kopf handle this using multithreading or similar?)

in any case, thanks for a great project, kopf looks really promising.

Add the RBAC examples for deployments

Kopf is just a framework, but the Kopf-based operators must be deployed to the cluster. For that, they would need the RBAC (role-based access control) templates and examples.

Add and document some common templates with the RBAC objects (roles, rolebindings, etc).

Add health checks

Expected Behavior

The operator is restarted by Kubernetes if it becomes irresponsive.

Actual Behavior

The operator can get stuck for any reason (e.g. bugs), and nobody will notice, except by the lack of reaction to the added/deleted objects.

Steps to Reproduce the Problem

  1. Put a synchronous time.sleep(300) anywhere in the async handler (async def ...).
  2. Let it run.
  3. Observe how the operator is blocked for 5 mins for all objects.
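The reproduction, sketched (any resource coordinates work; the point is the synchronous sleep inside an async handler):

import time

import kopf

@kopf.on.create('zalando.org', 'v1', 'kopfexamples')
async def create_fn(logger, **kwargs):
    # A synchronous sleep inside an async handler blocks the whole event loop:
    # no other objects are handled for the whole duration.
    # (An `await asyncio.sleep(300)` would delay only this one object.)
    time.sleep(300)
    logger.info("Done sleeping.")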

Operator freezes while exiting after an error in the watching/queueing cycle

Expected Behavior

When an error happens in the watching/queueing coroutines, the process exits, and the pod is restarted by Kubernetes (or it just exits if executed locally).

Actual Behavior

In some cases, the process freezes after the exception, and no new events are handled, nothing is logged.

Steps to Reproduce the Problem

Uncertain, but:

  1. Simulate an error in the watching cycle, e.g. such as #10

Filter by arbitrary callback function

In some cases, it might be needed that only a subset of resources is served and handled. Some such filters are suggested in #45 & #58, but it can be anything arbitrary.

Without such filtering, the logs will be cluttered by the resource events/causes that are of no interest to the operator.

For this, a callback function can be specified on the handler declaration:

KNOWN_PODS = {}

@kopf.on.event('', 'v1', 'pods',
               when=lambda event, uid, **_: event == 'ADDED' and uid not in KNOWN_PODS)
def new_pod_created(**kwargs):
    pass

The callback should be called via the same invocation protocol as the handlers (see kopf.reactor.invocation), i.e. with all the handler's kwargs — be that an event-handler or a cause-handler (different sets of kwargs).

If the callback returns true, the object is handled as normal. When it returns false, the handler is skipped, and maybe the whole event is ignored (if there are no other handlers), therefore producing not a single line of logs at all.

No additional kwargs are injected for such filtering.


Related:

  • #30 for silent handlers (no status/progress storage).
  • #84 for pods support (but can be tested with another custom resource).
  • #58 for filtering by owner references (aka parents).
  • #45 for filtering by labels/annotations.

Checklist:

  • Acceptance criteria:
    • Event handler registered with the filtering callback function/lambda/partial.
    • Callback is invoked every time an event arrives and/or a cause is detected.
    • Events/handlers are filtered by the result of this callback invocation.
    • No logs about the handler or event/cause are produced if the handler was filtered out.
    • Minimum overhead and no extra calls if the callback is not specified (the default).
  • Tests.
  • Documentation.

kopf disallow importing modules on path

Given the following app structure:

╰─ cat app.py 
import kopf
import json

from MyMod.lib1 import Lib1
l = Lib1

@kopf.on.create('testing.com', 'v1', 'mycrd')
def create_fn(body, **kwargs):
    print(f'A handler is called with body: {json.dumps(body, indent=4, sort_keys=True)}')
╰─ ls MyMod 
__init__.py  lib1.py
╰─ cat MyMod/lib1.py 
class Lib1:
    def __init__():
        pass
    def doit(what):
        pass

In other words, in the app dir, I have a regular python module

If I execute the app.py file with regular Python, it just exits (as is expected), without throwing any error, meaning that all the imports work as expected.

However, if I run pipenv run kopf run app.py, it throws ModuleNotFoundError: No module named 'MyMod'.

Expected Behavior

apps run using kopf should be able to import local modules just fine

Actual Behavior

kopf throws ModuleNotFoundError (see above).

Steps to Reproduce the Problem

in a blank dir:

pip install pipenv
PIPENV_VENV_IN_PROJECT=true pipenv install --dev --python python3.7

create files as described above
pipenv run kopf run app.py

Specifications

  • Platform:
  • Kubernetes version: (use kubectl version)
  • Python version: (use python --version)
  • Python packages installed: (use pip freeze --all)

Limit size of exception details in message field of event.

When an exception occurs, the full exception traceback is added to the message field of an event. When this is posted to Kube, it results in an error, as it can be too big to store.

HTTP response headers: HTTPHeaderDict({'Cache-Control': 'no-store', 'Content-Type': 'application/json', 'Dat
e': 'Thu, 30 May 2019 02:05:15 GMT', 'Content-Length': '392'})
HTTP response body: {"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"Event \"k
opf-event-5sqgc\" is invalid: message: Invalid value: \"\": can have at most 1024 characters","reason":"Inva
lid","details":{"name":"kopf-event-5sqgc","kind":"Event","causes":[{"reason":"FieldValueInvalid","message":"
Invalid value: \"\": can have at most 1024 characters","field":"message"}]},"code":422}

Perhaps it should be restricted to the lowest stack frame and the exception description.
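A minimal sketch of the suggested restriction, assuming the 1024-character limit quoted in the error above:

MAX_EVENT_MESSAGE = 1024

def cut_message(exc):
    # Keep only the exception description, not the whole traceback,
    # and cut it down to what the Events API accepts.
    text = f"{type(exc).__name__}: {exc}"
    return text if len(text) <= MAX_EVENT_MESSAGE else text[:MAX_EVENT_MESSAGE - 3] + '...'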

Support for Python 3.6?

What specific dependencies does the code have on Python 3.7?

Supported major Linux OS distributions such as Red Hat Enterprise Linux still ship Python 3.6 as the latest version in the most recent OS release, and will for some time. Requiring Python 3.7 will prevent organisations that need a vendor-supported operating system and language stack from using this.

Handler for starting the per-object background threads/coroutines

In some cases, it is desired to have a background thread or async coroutine for every object existing in the cluster (if the operator developer decides so).

Example use-case: polling the external services & APIs when the custom resource represents the jobs of an external system with its status field.

It can be done from the @kopf.on.create() handler. However, when the operator restarts, and there are no events on the object, no handler will ever be called, meaning the operator will not react to the external state changes.

With this handler, the problem can be solved:

import kopf

@kopf.on.seen('zalando.org', 'v1', 'kopfexamples')
def seen_fn(meta, **_):
    thread = threading.Thread(target=monitoring_fn, args=(meta['name'],))
    thread.start()

Other names: @kopf.on.regain(), @kopf.on.restore().


UNDECIDED:

Option A:

The handler must be called only in the case when the operator restarts, gets a fresh list of the objects as part of the watch request, and notices that some objects are already handled.

It must NOT be called when:

  • When the object is actually created while the operator is running ("first seen" or "create" causes).
  • When the operator starts and gets a fresh list of the objects as part of the watch request, and the objects are new and the @kopf.on.create handler should be called.

Option B:

  • When the object is fully handled (including the creation), and now needs the monitoring — both in cases of creation, and the operator restart.

JSON logging

Currently, Kopf logs in text mode, one line per event. For multi-line events (e.g. with data dumps), the output is intentionally flattened to remain on one line, to be friendly to logging systems such as Scalyr. This makes the logs difficult to read.

Instead, Kopf should log one JSON object per logging event, so that they can be consumed by logging systems such as Scalyr and delivered to log discovery tools with all the fields searchable/filterable.

kopf run --log-json ...

The fields needed:

  • Logging message.
  • Logging level as a number and as a name.
  • All other built-in fields of logging (timestamp, etc).
  • All extras, such as the namespace and name in the per-object loggers.
  • New: the uid and kind of the object.
  • New: the id of the operator (ourselves in peering).

When in the JSON logging mode, the data dumps of the objects should be made multi-line pretty-printed, so that they are readable in the logging tools (e.g. Scalyr).
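A minimal sketch of such a formatter with the standard logging module (the exact field names, including the per-object extras, are assumptions):

import json
import logging

class JSONFormatter(logging.Formatter):
    def format(self, record):
        return json.dumps({
            'message': record.getMessage(),
            'severity': record.levelname,
            'levelno': record.levelno,
            'timestamp': self.formatTime(record),
            # Per-object extras, if the per-object logger provided them.
            'k8s_namespace': record.__dict__.get('k8s_namespace'),
            'k8s_name': record.__dict__.get('k8s_name'),
        })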

Silent handlers (spies)

Currently, handlers are fully orchestrated by the framework, and their progress is stored in the object's status.kopf field. This limits the handlers to only the objects designed specifically for this operator.

Sometimes, it is needed to watch objects of a different kind, or maybe even the built-in objects, such as pods, jobs, and so on, e.g. created as the children of the main object. Putting Kopf's progress field on these objects causes a few problems:

  • Unnecessary changes & watch-events on those objects, as noticed by other controllers and operators.
  • Multiple Kopf-based operators will conflict with each other, as they use the same status subfield (see #23).

Kopf should provide a way to watch for the built-in objects silently:

import kopf

@kopf.on.event('', 'v1', 'pods')
def pod_changed(body, **kwargs):
    pass

This induces a few limitations:

  • Progress is reported only in the logs, maybe in the k8s-events, but not in the status fields.
  • If the handler fails, there will be no retries as with the normal handlers. It can even miss the actual change and needed reaction in that case until the next event happens.
  • No cause detection will be provided (i.e. field diffs detection), only the raw events as they are sent by k8s.

This functionality is already present in the Kopf's reactor (the stream between the queueing & handling modules), so it makes sense to expose it as a feature.


Also, once done, add the missing docs for the sample problem: it should track when and if the PVCs are bound and finally unbound — so that the tutorial is indeed full and complete.


  • Silent handlers implemented.
  • Tests.
  • Docs on the feature in the "Handlers" section.
  • Tutorial extended for the PVC monitoring for being bound, and activating the deletion afterwards.

Kopf-based operator fails with KeyError ['uid']

Actual Behavior

There is a stacktrace in the operator written with Kopf:

Traceback (most recent call last):
  ………
  File "/usr/local/lib/python3.7/dist-packages/kopf/cli.py", line 19, in wrapper
    return fn(*args, **kwargs)
  File "/usr/local/lib/python3.7/dist-packages/kopf/cli.py", line 50, in run
    peering=peering,
  File "/usr/local/lib/python3.7/dist-packages/kopf/reactor/queueing.py", line 248, in run
    task.result()
  File "/usr/local/lib/python3.7/dist-packages/kopf/reactor/queueing.py", line 83, in watcher
    key = (resource, event['object']['metadata']['uid'])
KeyError: 'uid'

Expected Behaviour

No sporadic errors.

Steps to Reproduce the Problem

(unknown)

Specifications

  • Version: 0.5
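One possible defensive shape for the failing line quoted above, sketched only as an illustration (not the actual fix): skip events that carry no object metadata, such as the ERROR pseudo-events seen when the resource version is too old.

def object_key(resource, event):
    # ERROR pseudo-events and malformed events carry no object metadata;
    # return None so that the caller can skip them instead of crashing.
    metadata = event.get('object', {}).get('metadata', {})
    uid = metadata.get('uid')
    return (resource, uid) if uid is not None else None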

Use --standalone option in the readme of the examples

Hi,

First of all thanks a lot for initiating this project, I think it will be a very useful project for the community.

So, about the issue: I had some difficulties running the examples, as I didn't set up the peering before jumping into them.

Firstly, I think it would be better to use the --standalone parameter in the readme of the examples, so that users who haven't set up the peering yet won't get an error on the first run.

Secondly, I guess returning a more explicit error when the peering is not set up, and referring to the relevant doc, makes sense. I spent some time debugging the code to find the reason and fixed it; then I noticed it's documented here :D
https://kopf.readthedocs.io/en/latest/install/

wdyt? I would be more than happy to contribute to it if you think these suggestions are sensible.

Admission controllers for validating & mutating the resources

Add some way to validate and mutate the resources immediately when they are created, i.e. before the Kubernetes API (kubectl) call finishes.

import kopf

@kopf.on.validate('zalando.org', 'v1', 'kopfexamples')
def check_body(spec, **_):
    if spec.get('field') is None:
        raise kopf.ValidationError("spec.field is required.")

@kopf.on.validate('zalando.org', 'v1', 'kopfexamples', field='spec.items')
def check_items(value, **_):
    if not all(item.startswith('item') for item in value):
        raise kopf.ValidationError("All items must start with 'item' text.")

If any of the validation handlers failed, the object creation should also fail.

Read more:

This implies that the operator reacts to the API calls, so the deployment pattern should be extended with kind: Service objects and a listening port on the pod. Related: #18 for the health checks, which also require a listening port.

TODO: Need more info on how the admission controllers work (besides creating a kind: ValidatingWebhookConfiguration object).

Call handlers for cluster events

Expected Behavior

Some handlers should be called when the cluster events happen, such as the cluster upgrades.

Actual Behavior

Only object changes cause handler execution.

The objects are not always changed when something else (unmonitored) happens in the cluster or with the cluster.

See also #19

Configurable/optional finalizers

Actual Behavior

Finalizers are marks that are stored as a list of arbitrary strings in metadata.finalizers. When the object is requested to be deleted, only its deletion timestamp is set. While such marks exist on the object, it will not be deleted, and the deletion request will wait until the marks are removed by the operators/controllers (which should react to the appearance of the deletion timestamp). Only when all the finalizers are removed is the object actually deleted.

Currently, when a resource is handled by the operator, the finalizers are always added to the object on its first appearance (before the handlers), and removed when it is marked for deletion (after the handlers).

If such an operator is stopped, the objects cannot be deleted, and the deletion command freezes — while there is no actual need to wait and to notify the operator (it will do nothing).

Expected Behavior

The finalizers should be optional. If there are no deletion handlers, the finalizers are not needed. If there are deletion handlers, the finalizers should be added.

Some deletion handlers can be explicitly marked as optional, thus ignored for the decision whether the finalizers are needed. The default assumption is that if the deletion handler exists, it is required.

@kopf.on.delete('', 'v1', 'pods', optional=True)
def pod_deleted(**_):
    pass

Two special cases:

  • If the object was created when there were no deletion handlers, and the finalizers were not added, but then a new operator version is started with the deletion handlers — the finalizers must be auto-added.
  • If the object was created when there were some deletion handlers, and the finalizers were added, but then a new operator version is started with no deletion handlers — the finalizers must be auto-removed.

Auto-guessing the peering mode

Current Behaviour

Currently, the peering object is needed by default, unless the --standalone option is used, which disables the peering completely.

This causes confusion in the first intro and when following the tutorial, in case the cluster is not configured yet (no peering objects created). See #31.

If standalone mode is made the default, there is a negative side-effect: if somebody runs 2+ operators —e.g. one in-cluster, another in the dev-mode on an external workstation— these operators will collide and compete for the objects without knowing this. The peering was invented exactly for the purpose of not hitting this issue in the dev-mode, and for gracefully "suppressing" other operators.

Expected Behaviour

The peering should be considered a side-feature for extra safety; it should not be a showstopper for the quick-start guides or tutorials.

It would be better to have 3 modes:

  • with --peering or --peering=something, the peering is enforced, the operator fails to start if peering is not accessible (as it is now).
  • with --standalone, the peering is ignored (as it is now).
  • with no options (the new default), the auto-detection mode is used: if the "metadata.name: default" peering object is found, use it (either cluster-scoped or namespace-scoped, depending on --namespace=); if not found, log a big-letter warning of possible conflicts and collisions, and continue as if in the standalone mode.

Relevant: #32.
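
A rough sketch of the proposed auto-detection, assuming the official kubernetes client; the KopfPeering coordinates (group/version/plural) below are assumptions for illustration, not the final implementation:

import logging
import kubernetes

def detect_default_peering(namespace: str) -> bool:
    kubernetes.config.load_kube_config()
    api = kubernetes.client.CustomObjectsApi()
    try:
        api.get_namespaced_custom_object(
            'zalando.org', 'v1', namespace, 'kopfpeerings', 'default')
        return True                      # the peering mode
    except kubernetes.client.rest.ApiException as e:
        if e.status == 404:
            logging.warning("DEFAULT PEERING OBJECT NOT FOUND: continuing as standalone; "
                            "other operators may collide on the same objects.")
            return False                 # the standalone mode
        raise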

Todos:

  • Documentation:
    • CLI options.
    • Peering page.
  • Tests.

Measure test coverage

Extracted from #13, as it will take some time and experimentation.

  • Purge the experimental data in CodeCov.
  • Re-run the code coverage measurements for historic commits.
  • Configure the automatic code coverage with the unit tests (but not e2e tests!).

Filter by parent/owner type

For cross-object orchestration, the parent object's operator needs to be able to watch and react to the events on the children objects it creates, which are produced by the children objects' main operator — so that the parent operator can update the status of its own served parent objects accordingly.


The check can be performed via metadata.ownerReferences, which generally defines the child-parent relationship. There is no reason to introduce any other ways of marking the hierarchical relations (e.g. special labels/annotations, but see #45).

It should NOT react to any other objects that it did not create, e.g. those created by other operators/controllers or manually — i.e. when there is NO ownerReference of the specified kind.

The individual objects (uids) should not be taken into account on the DSL level; they can be filtered in the handler code. Only the resource-type relationships are important.

The parent information should be used to store the handler progress separately from the default status.kopf field. Otherwise, the main operator of that resource will collide with the side-handlers of the parent operator.
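
For reference, the implied ownerReferences check could look roughly like this (a sketch, not the actual filtering code; the kind-vs-plural mapping is left out):

def has_owner_of_kind(body: dict, group: str, version: str, kind: str) -> bool:
    # True if at least one owner reference matches the declared parent resource.
    owner_refs = body.get('metadata', {}).get('ownerReferences', [])
    return any(
        ref.get('apiVersion') == f"{group}/{version}" and ref.get('kind') == kind
        for ref in owner_refs
    )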


Example syntax:

import kopf
import kubernetes.client

@kopf.on.delete('', 'v1', 'pods',
                parent=('zalando.org', 'v1', 'kopfexamples'))
def child_deleted(body, parent, **_):
    child_name = body['metadata']['name']
    parent_name = parent['metadata']['name']
    api = kubernetes.client.CustomObjectsApi()
    api.patch_namespaced_custom_object(  # the CustomObjectsApi method for patching custom objects
        ...,  # group/version/namespace/plural of the parent resource
        name=parent_name,
        body={'status': {'children': {child_name: 'DONE'}}},
    )

Or we can introduce a convention to assume the same group/version for the related resources (e.g. here, it would be zalando.org/v1/parents):

@kopf.on.delete('zalando.org', 'v1', 'children', parent='parents')
def child_deleted(body, parent, **_):
    ...

Related:

  • #30 for silent handlers (no status/progress storage).
  • #84 for pods support (but can be tested with another custom resource).
  • #57 for short notations of the resources (instead of plurals).
  • #45 for filtering by labels/annotations.
  • #98 for filtering by arbitrary callbacks.

Checklist:

  • Acceptance criteria:
    • Event handler registering with the parent resource declaration.
    • Events are filtered by the ownerReferences of the declared kinds.
    • parent kwarg injected for such handlers (can be None for regular handlers).
    • Relative resource group/version references are understood.
    • Parent information is used to separate the handler progress fields instead of status.kopf.
  • Tests.
  • Documentation.

Automated RBAC generation and verification

Background

With kopf>=0.9, the operators fail to start in clusters with RBAC configured according to the docs. Introduced by #38, where GET is used on a specific peering object (not just on a list).

The deployment docs were not updated to reflect that. And, even if updated, that would lead to these incidents anyway, as the RBAC yaml file is not auto-validated and not auto-updated in our case, so we would not notice the change.

Suggestion: RBAC verification

Kopf should allow verifying whether the RBAC yaml file matches the framework's and the operator's expectations, and explain what is missing:

kopf rbac verify script1.py script2.py -f rbac.yaml

This verification step could optionally be used either in the CI/CD testing stage or in the docker build stage, to fail the build if the source-code RBAC yaml file lacks some necessary permissions.

If no -f option is specified (OR: if --cluster is explicitly specified — TBD), then verify against the real, currently authenticated cluster:

kopf rbac verify script1.py script2.py --cluster

The output should explain what is missing:

# Kopf's internals:
KopfPeering get permission: ❌absent
KopfPeering list permission: ✅present
KopfPeering watch permission: ✅present
KopfPeering patch permission: ✅present

# Used at script1.py::create_fn():
KopfExample list permission: ✅present
KopfExample watch permission: ✅present
KopfExample patch permission: ✅present

Some permissions are missing. The operator will fail to work.
Read more at https://kopf.readthedocs.io/en/stable/deployment/
Or use `kopf rbac generate --help`

Exit status should be 0 (all is okay) or 1 (something is missing), so that it could be used in CI/CD.
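
A minimal sketch of how the --cluster verification could work, issuing one SelfSubjectAccessReview per verb via the official kubernetes client; the resource plural below is an assumption, and the kopf rbac CLI itself is only a proposal:

import kubernetes

def can_i(verb: str, group: str, resource: str, namespace: str = None) -> bool:
    api = kubernetes.client.AuthorizationV1Api()
    review = kubernetes.client.V1SelfSubjectAccessReview(
        spec=kubernetes.client.V1SelfSubjectAccessReviewSpec(
            resource_attributes=kubernetes.client.V1ResourceAttributes(
                group=group, resource=resource, verb=verb, namespace=namespace,
            ),
        ),
    )
    return api.create_self_subject_access_review(review).status.allowed

kubernetes.config.load_kube_config()
for verb in ['get', 'list', 'watch', 'patch']:
    allowed = can_i(verb, 'zalando.org', 'kopfpeerings')
    print(f"KopfPeering {verb} permission: {'present' if allowed else 'absent'}")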

Suggestion: RBAC generation

Since Kopf would already contain the RBAC parsing & analysis logic, it is also fine to generate the RBAC yaml files from the codebase of the operator — based on which resources/events/causes are registered for handling (same CLI semantics as in kopf run: -m module or file.py).

kopf rbac generate script1.py script2.py > rbac.yaml
kubectl apply -f rbac.yaml

Extra: children objects introspection

As a challenge, some introspection might be needed into the handlers' internals to determine which children objects they manipulate (e.g. pod creation) — this must also be part of the RBAC docs. Alternatively, an additional decorator could declare these objects on the handler functions.

Acceptance Criteria

  • Implementation:
    • RBAC generation to stdout.
    • RBAC generation to file (-o, --output).
    • RBAC verification of stdin.
    • RBAC verification of file (-f, --file).
    • RBAC verification of cluster (--cluster).
    • Explanation of present/absent permissions.
    • Exit status on verification.
  • Documentation.
  • Tests:
    • CLI tests.
    • RBAC parsing tests.
    • RBAC verification tests.

required namespace positional argument missing error in walkthrough

Got to this part of the walkthrough
https://kopf.readthedocs.io/en/stable/walkthrough/creation/

and saw this error in the kopf run ephemeral.py --verbose output:

[2019-05-27 18:33:56,232] kopf.reactor.handlin [DEBUG   ] [myproject/my-claim] Creation event: {'apiVersion': 'zalando.org/v1', 'kind': 'EphemeralVolumeClaim', 'kopf': {'dummy': '2019-05-27T17:33:31.599067'}, 'metadata': {'annotations': {'kubectl.kubernetes.io/last-applied-configuration': '{"apiVersion":"zalando.org/v1","kind":"EphemeralVolumeClaim","metadata":{"annotations":{},"name":"my-claim","namespace":"myproject"},"spec":{"size":"10G"}}\n'}, 'creationTimestamp': '2019-05-27T17:29:58Z', 'finalizers': ['KopfFinalizerMarker'], 'generation': 1, 'name': 'my-claim', 'namespace': 'myproject', 'resourceVersion': '776587', 'selfLink': '/apis/zalando.org/v1/namespaces/myproject/ephemeralvolumeclaims/my-claim', 'uid': '0cbcaf4b-80a5-11e9-a261-46c1de1eb0ee'}, 'spec': {'size': '10G'}, 'status': {'kopf': {'progress': {'create_fn': {'delayed': '2019-05-27T17:33:56.117633', 'retries': 4, 'started': '2019-05-27T17:29:55.703835'}}}}}
[2019-05-27 18:33:56,233] kopf.reactor.handlin [DEBUG   ] [myproject/my-claim] Invoking handler 'create_fn'.
[2019-05-27 18:33:56,233] kopf.reactor.handlin [ERROR   ] [myproject/my-claim] Handler 'create_fn' failed with an exception. Will retry.
Traceback (most recent call last):
  File "/Users/imiell/anaconda3/lib/python3.7/site-packages/kopf/reactor/handling.py", line 382, in _execute
    lifecycle=lifecycle,  # just a default for the sub-handlers, not used directly.
  File "/Users/imiell/anaconda3/lib/python3.7/site-packages/kopf/reactor/handling.py", line 478, in _call_handler
    **kwargs,
  File "/Users/imiell/anaconda3/lib/python3.7/site-packages/kopf/reactor/invocation.py", line 70, in invoke
    result = task.result()  # re-raises
  File "/Users/imiell/anaconda3/lib/python3.7/concurrent/futures/thread.py", line 57, in run
    result = self.fn(*self.args, **self.kwargs)
TypeError: create_fn() missing 1 required positional argument: 'namespace'

I was in a 'myproject' namespace as system:admin in a minishift 3.11 (sic) cluster - are there problems with running this against 1.11 k8s APIs?
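
A possible workaround, assuming the installed kopf version simply does not pass a namespace kwarg yet (an assumption based on the traceback): take the namespace from the object's metadata instead of a dedicated handler argument.

import kopf

@kopf.on.create('zalando.org', 'v1', 'ephemeralvolumeclaims')
def create_fn(meta, spec, **kwargs):
    namespace = meta.get('namespace')   # instead of a required `namespace` argument
    size = spec.get('size')
    print(f"Creating a volume claim of size {size} in {namespace}.")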

Implicitly guess the current object being processed

When operating on other objects like this:

@kopf.on.create('zalando.org', 'v1', 'kopfexamples')
def create_fn(body, **_):
    obj = {
        'apiVersion': 'v1',
        'kind': 'Pod',
        'spec': {...},
    }
    kopf.adopt(obj, owner=body)

The reference to the owner=body is not needed, as the framework has the information on the object being processed, so it can use the current object by default:

    kopf.adopt(obj)

Only when the owner is not the current object (e.g. a 2-level ownership hierarchy) does the owner need to be explicitly specified in kwargs. This is an extremely rare and unusual case.

Hints

Similar to the contextvars used in kopf.reactor.handling. Perhaps, cause_var.body is already sufficient.
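
A rough sketch of the idea, with a cause-like context variable holding the currently processed body; the Cause class and cause_var name here are illustrative, not kopf's actual internals:

import contextvars
import dataclasses

@dataclasses.dataclass
class Cause:
    body: dict

cause_var: contextvars.ContextVar = contextvars.ContextVar('cause_var')

def adopt(obj: dict, owner: dict = None) -> None:
    # Default to the currently handled object if no explicit owner is given.
    owner = owner if owner is not None else cause_var.get().body
    obj.setdefault('metadata', {}).setdefault('ownerReferences', []).append({
        'apiVersion': owner['apiVersion'],
        'kind': owner['kind'],
        'name': owner['metadata']['name'],
        'uid': owner['metadata']['uid'],
        'controller': True,
        'blockOwnerDeletion': True,
    })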

Checklist

  • Implementation:
    • kopf.adopt()
    • kopf.append_owner_reference()
    • kopf.remove_owner_reference()
  • Tests.
  • Docs:
    • Examples updated.

issue running the first example

I tried the first example and got:

sebair: 01-minimal (master)$ kopf run example.py --verbose
[2019-05-27 09:05:16,862] kopf.config          [DEBUG   ] configured via kubeconfig file
[2019-05-27 09:05:18,429] kopf.reactor.peering [WARNING ] Default peering object not found, falling back to the standalone mode.
[2019-05-27 09:05:38,900] kopf.reactor.handlin [DEBUG   ] [default/kopf-example-1] First appearance: {'apiVersion': 'zalando.org/v1', 'kind': 'KopfExample', 'metadata': {'annotations': {'kubectl.kubernetes.io/last-applied-configuration': '{"apiVersion":"zalando.org/v1","kind":"KopfExample","metadata":{"annotations":{},"labels":{"somelabel":"somevalue"},"name":"kopf-example-1","namespace":"default"},"spec":{"duration":"1m","field":"value","items":["item1","item2"]}}\n'}, 'creationTimestamp': '2019-05-27T07:05:35Z', 'generation': 1, 'labels': {'somelabel': 'somevalue'}, 'name': 'kopf-example-1', 'namespace': 'default', 'resourceVersion': '235142214', 'selfLink': '/apis/zalando.org/v1/namespaces/default/kopfexamples/kopf-example-1', 'uid': 'd359fea7-804d-11e9-b655-42010a80024c'}, 'spec': {'duration': '1m', 'field': 'value', 'items': ['item1', 'item2']}}
[2019-05-27 09:05:38,900] kopf.reactor.handlin [DEBUG   ] [default/kopf-example-1] Adding the finalizer, thus preventing the actual deletion.
[2019-05-27 09:05:38,900] kopf.reactor.handlin [DEBUG   ] [default/kopf-example-1] Patching with: {'metadata': {'finalizers': ['KopfFinalizerMarker']}}
[2019-05-27 09:05:39,528] kopf.reactor.handlin [DEBUG   ] [default/kopf-example-1] Creation event: {'apiVersion': 'zalando.org/v1', 'kind': 'KopfExample', 'metadata': {'annotations': {'kubectl.kubernetes.io/last-applied-configuration': '{"apiVersion":"zalando.org/v1","kind":"KopfExample","metadata":{"annotations":{},"labels":{"somelabel":"somevalue"},"name":"kopf-example-1","namespace":"default"},"spec":{"duration":"1m","field":"value","items":["item1","item2"]}}\n'}, 'creationTimestamp': '2019-05-27T07:05:35Z', 'finalizers': ['KopfFinalizerMarker'], 'generation': 1, 'labels': {'somelabel': 'somevalue'}, 'name': 'kopf-example-1', 'namespace': 'default', 'resourceVersion': '235142219', 'selfLink': '/apis/zalando.org/v1/namespaces/default/kopfexamples/kopf-example-1', 'uid': 'd359fea7-804d-11e9-b655-42010a80024c'}, 'spec': {'duration': '1m', 'field': 'value', 'items': ['item1', 'item2']}}
[2019-05-27 09:05:39,530] kopf.reactor.handlin [DEBUG   ] [default/kopf-example-1] Invoking handler 'create_fn'.
And here we are! Creating: {'duration': '1m', 'field': 'value', 'items': ['item1', 'item2']}
[2019-05-27 09:05:39,532] kopf.reactor.handlin [INFO    ] [default/kopf-example-1] Handler 'create_fn' succeeded.
[2019-05-27 09:05:40,025] kopf.reactor.queuein [ERROR   ] functools.partial(<function custom_object_handler at 0x11012bf28>, lifecycle=<function asap at 0x1121862f0>, registry=<kopf.reactor.registry.GlobalRegistry object at 0x112110668>, resource=Resource(group='zalando.org', version='v1', plural='kopfexamples'), freeze=<asyncio.locks.Event object at 0x111c49550 [unset]>) failed with an exception. Ignoring the event.
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/site-packages/kopf/reactor/queueing.py", line 126, in worker
    await handler(event=event)
  File "/usr/local/lib/python3.7/site-packages/kopf/reactor/handling.py", line 195, in custom_object_handler
    await execute(lifecycle=lifecycle, registry=registry, cause=cause)
  File "/usr/local/lib/python3.7/site-packages/kopf/reactor/handling.py", line 321, in execute
    cause=cause,
  File "/usr/local/lib/python3.7/site-packages/kopf/reactor/handling.py", line 421, in _execute
    events.info(cause.body, reason='Success', message=f"Handler {handler.id!r} succeeded.")
  File "/usr/local/lib/python3.7/site-packages/kopf/events.py", line 70, in info
    return event(obj, reason=reason, message=message, type='Normal')
  File "/usr/local/lib/python3.7/site-packages/kopf/events.py", line 63, in event
    body=body,
  File "/usr/local/lib/python3.7/site-packages/kubernetes/client/apis/events_v1beta1_api.py", line 60, in create_namespaced_event
    (data) = self.create_namespaced_event_with_http_info(namespace, body, **kwargs)
  File "/usr/local/lib/python3.7/site-packages/kubernetes/client/apis/events_v1beta1_api.py", line 151, in create_namespaced_event_with_http_info
    collection_formats=collection_formats)
  File "/usr/local/lib/python3.7/site-packages/kubernetes/client/api_client.py", line 321, in call_api
    _return_http_data_only, collection_formats, _preload_content, _request_timeout)
  File "/usr/local/lib/python3.7/site-packages/kubernetes/client/api_client.py", line 155, in __call_api
    _request_timeout=_request_timeout)
  File "/usr/local/lib/python3.7/site-packages/kubernetes/client/api_client.py", line 364, in request
    body=body)
  File "/usr/local/lib/python3.7/site-packages/kubernetes/client/rest.py", line 266, in POST
    body=body)
  File "/usr/local/lib/python3.7/site-packages/kubernetes/client/rest.py", line 222, in request
    raise ApiException(http_resp=r)
kubernetes.client.rest.ApiException: (404)
Reason: Not Found
HTTP response headers: HTTPHeaderDict({'Audit-Id': '867460c4-dad8-4005-a70c-49b05a682d8b', 'Content-Type': 'text/plain; charset=utf-8', 'X-Content-Type-Options': 'nosniff', 'Date': 'Mon, 27 May 2019 07:05:37 GMT', 'Content-Length': '19'})
HTTP response body: 404 page not found

Consider pykube-ng?

Originally by @hjacobs :

I see that you are currently using the "official" Kubernetes client (Swagger Codegen).
I forked the old pykube to https://github.com/hjacobs/pykube as I'm rather unhappy about the complexity and size of the official Kubernetes client.
See also hjacobs/pykube#12
Not sure if this would even work or whether you use something specific of the Kubernetes Python client.

any instructions on how to install on a mac?

Expected Behavior

kopf binary installs

Actual Behavior

sh-3.2# pip install kopf
DEPRECATION: Python 2.7 will reach the end of its life on January 1st, 2020. Please upgrade your Python as Python 2.7 won't be maintained after that date. A future version of pip will drop support for Python 2.7.
Requirement already satisfied: kopf in /Library/Python/2.7/site-packages (0.0)
sh-3.2# kopf
sh: kopf: command not found
sh-3.2#

sh-3.2# find . -name kopf
./Library/Python/2.7/site-packages/kopf

Steps to Reproduce the Problem

Specifications

  • Platform: Mac

  • Kubernetes version: (use kubectl version)
    h-3.2# kubectl version
    Client Version: version.Info{Major:"1", Minor:"11", GitVersion:"v1.11.2", GitCommit:"bb9ffb1654d4a729bb4cec18ff088eacc153c239", GitTreeState:"clean", BuildDate:"2018-08-08T16:31:10Z", GoVersion:"go1.10.3", Compiler:"gc", Platform:"darwin/amd64"}

  • Python version: Python 2.7.10

  • Python packages installed: (use pip freeze --all)

sh-3.2# pip freeze --all
DEPRECATION: Python 2.7 will reach the end of its life on January 1st, 2020. Please upgrade your Python as Python 2.7 won't be maintained after that date. A future version of pip will drop support for Python 2.7.
altgraph==0.10.2
ansible==2.4.2.0
asn1crypto==0.24.0
bcrypt==3.1.4
bdist-mpkg==0.5.0
bonjour-py==0.3
boto==2.48.0
boto3==1.5.2
botocore==1.8.16
cffi==1.11.2
cryptography==2.1.4
Django==1.11.20
docutils==0.14
enum34==1.1.6
futures==3.2.0
idna==2.6
ipaddress==1.0.19
Jinja2==2.10
jmespath==0.9.3
kopf==0.0
macholib==1.5.1
MarkupSafe==1.0
matplotlib==1.3.1
modulegraph==0.10.4
numpy==1.8.0rc1
paramiko==2.4.0
pip==19.1.1
py2app==0.7.3
pyasn1==0.4.2
pycparser==2.18
PyNaCl==1.2.1
pyobjc-core==2.5.1
pyobjc-framework-Accounts==2.5.1
pyobjc-framework-AddressBook==2.5.1
pyobjc-framework-AppleScriptKit==2.5.1
pyobjc-framework-AppleScriptObjC==2.5.1
pyobjc-framework-Automator==2.5.1
pyobjc-framework-CFNetwork==2.5.1
pyobjc-framework-Cocoa==2.5.1
pyobjc-framework-Collaboration==2.5.1
pyobjc-framework-CoreData==2.5.1
pyobjc-framework-CoreLocation==2.5.1
pyobjc-framework-CoreText==2.5.1
pyobjc-framework-DictionaryServices==2.5.1
pyobjc-framework-EventKit==2.5.1
pyobjc-framework-ExceptionHandling==2.5.1
pyobjc-framework-FSEvents==2.5.1
pyobjc-framework-InputMethodKit==2.5.1
pyobjc-framework-InstallerPlugins==2.5.1
pyobjc-framework-InstantMessage==2.5.1
pyobjc-framework-LatentSemanticMapping==2.5.1
pyobjc-framework-LaunchServices==2.5.1
pyobjc-framework-Message==2.5.1
pyobjc-framework-OpenDirectory==2.5.1
pyobjc-framework-PreferencePanes==2.5.1
pyobjc-framework-PubSub==2.5.1
pyobjc-framework-QTKit==2.5.1
pyobjc-framework-Quartz==2.5.1
pyobjc-framework-ScreenSaver==2.5.1
pyobjc-framework-ScriptingBridge==2.5.1
pyobjc-framework-SearchKit==2.5.1
pyobjc-framework-ServiceManagement==2.5.1
pyobjc-framework-Social==2.5.1
pyobjc-framework-SyncServices==2.5.1
pyobjc-framework-SystemConfiguration==2.5.1
pyobjc-framework-WebKit==2.5.1
pyOpenSSL==0.13.1
pyparsing==2.0.1
python-dateutil==2.6.1
pytz==2013.7
PyYAML==3.12
s3transfer==0.1.12
scipy==0.13.0b1
setuptools==18.5
six==1.11.0
vboxapi==1.0
xattr==0.6.4
zope.interface==4.1.1
