fluxcd / pkg Goto Github PK

View Code? Open in Web Editor NEW

45.0 9.0 84.0 4.27 MB

Toolkit common packages

Home Page: https://pkg.go.dev/github.com/fluxcd/pkg

License: Apache License 2.0

Makefile 0.33% Go 98.16% Dockerfile 0.06% Shell 0.32% Python 0.46% Ruby 0.05% HCL 0.61%

gitops-toolkit

pkg's Introduction

pkg

GitOps Toolkit common packages.

pkg's People

Contributors

Stargazers

Watchers

Forkers

chanwit idvoretskyi seaneagan devopstoday11 relu patricksplice mccodeman sylr clix-dev-llc somtochiama defreng firegrass chrisdevo nij4t glebiller alisondy makkes darkowlzz dmitrika jpellizzari allenporter adamkorcz oliverbaehler dieend tbondarchuk sbernheim pjbgf souleb aryan9600 nalum dmitri-lerko ilanpillemer nstogner isabella232 santosh1176 chengleqi manomanotech bnsfrt developer-guy nab-gha romitzz1 giantswarm mclarke47 laozc blurpy jack-evans weisdd cthtrifork jasonthedeveloper brexhq nce bburky eugenepaniot metalmatze errordeveloper chlunde genofire mihaiandreiratoiu rashedkvm vasylr haggishunk eliszbr huiwq1990 appscode-fluxcd knutgoetz danielwilsonkainos vishal-chdhry thebeginner86 gdasson funkymcb natefaerber matheuscscp gmolau gitchs dewey-typical kevinvalk garyhodgson bukington jamestupper dipti-pai reidmv skarlso arldka

pkg's Issues

Git Operations Test Server

The test coverage around git operations heavily relies on gitkit for self-contained tests. For E2E tests we also rely on directly accessing SaaS providers (e.g. GitHub) to ensure that our git implementation is consistently working across all out supported platforms.

Unfortunately, different SaaS providers support a different range of crypto algorithms, Git protocol versions, etc. Our upstream dependencies may introduce changes that may break specific features with said SaaS providers.

Instead of creating E2E tests for every single major SaaS provider, we could instead create or improving the existing git server to enable us to test features we want to support, and therefore quickly understand regression before releasing changes.

Below is a brain dump of specific features that are based on different issues that users have reported, and ideally a git test server would support:

HTTP specific settings
- Be able to switch ON/OFF support for HTTP/1.1 and HTTP/2.
- Reject Basic Auth when provided empty set of user/password.
- Algorithm specific settings
  - TLS Ciphers
Git protocol specific settings
SSH specific settings
- No simultaneous connections fluxcd/gitkit#1
- Short timeout connections fluxcd/gitkit#1
- Single operation connections
- Add random latency fluxcd/gitkit#6
- Algorithm specific settings fluxcd/gitkit#1
  - Key Exchange Algorithm
  - MACs
  - Ciphers
  - HostKey Algorithmn

These items should enable the test coverage to extend to support each one of the points above.

External event recorder should adhere to `EventRecorder` interface

The Kubernetes core has the following record.EventRecorder interface:

// EventRecorder knows how to record events on behalf of an EventSource.
type EventRecorder interface {
	// Event constructs an event from the given information and puts it in the queue for sending.
	// 'object' is the object this event is about. Event will make a reference-- or you may also
	// pass a reference to the object directly.
	// 'type' of this event, and can be one of Normal, Warning. New types could be added in future
	// 'reason' is the reason this event is generated. 'reason' should be short and unique; it
	// should be in UpperCamelCase format (starting with a capital letter). "reason" will be used
	// to automate handling of events, so imagine people writing switch statements to handle them.
	// You want to make that easy.
	// 'message' is intended to be human readable.
	//
	// The resulting event will be created in the same namespace as the reference object.
	Event(object runtime.Object, eventtype, reason, message string)

	// Eventf is just like Event, but with Sprintf for the message field.
	Eventf(object runtime.Object, eventtype, reason, messageFmt string, args ...interface{})

	// AnnotatedEventf is just like eventf, but with annotations attached
	AnnotatedEventf(object runtime.Object, annotations map[string]string, eventtype, reason, messageFmt string, args ...interface{})
}

As this interface includes everything we need, including adding "metadata" using AnnotatedEventf, it would likely be better if our own event.Recorder would adhere to the above interface.

As a second step, the controllers.Events helper could be refactored into a "DelegatingRecorder" which delegates events to all underlying recorder implementations.

Document runtime package and APIs, provide additional helpers

Goals:

Address long-standing technical debt
Add helper functions to solve common patterns in Flux controllers (introduces conditions and patch helper packages to runtime)
Document ALL runtime code and related API packages to a bare minimum in a hope to drive implementation efforts

Will be resolved by #101

#125

Move MergeMaps to pkg

As per https://github.com/fluxcd/flux2/blob/v0.8.0/internal/utils/utils.go#L403-L404, MergeMaps should be abstracted and moved to pkg.

What / where would be a good place for this function?

It will most likely be used by:

flux2
helm-controller
source-controller

MetricRecorder: Label Metrics with Resource Labels

Proposal:
I would like to group my Flux Metrics by labels or annotations (idc). The easiest way I could think of resolving that, was by labeling the metrics with the labels of the object the metric originates from.

So I guess we would have to simple label the metrics with the ref Objects labels here:

https://github.com/fluxcd/pkg/blob/main/runtime/metrics/recorder.go

This way I could create something like this

--- 
apiVersion: kustomize.toolkit.fluxcd.io/v1beta1
kind: Kustomization
metadata:
  name: backend
  namespace: tenant1-ns1
  labels:
    metric.flux.io/group: "system"
spec:
  interval: 5m
  path: "./webapp/backend/"
  prune: true
  serviceAccountName: "test"
  sourceRef:
    kind: GitRepository
    name: webapp
    namespace: tenant1-ns2

And then query it via the label over prometheus. What do you guys think? This would greatly improve the monitoring Experience for flux resources imho. I am happy to contribute the changes if you like the idea

Consolidate git implementations

The implementation of git operations are scattered around flux2, source-controller, image-automation-controller and this repository.
Resulting in code duplication in both core functionalities and their tests.

Consolidation steps:

Provide a Flux interface to abstract away the underlying git library being used. (Review/Extend on https://github.com/fluxcd/source-controller/blob/main/pkg/git/git.go)
Resolve the leak of implementation details
Consolidate git implementations in a shareable module.
Consolidate git test functions in a shareable module.
Replace existing implementations
- source-controller
- image-automation-controller

Adding Deploy Key for GitLab tries to use the wrong project

The list projects method actually does a search and naïvely grabs the first one it finds. e.g. if you have a repo called manifests it will execute a search like this https://gitlab.com/search?search=manifests and grab https://gitlab.com/evolution-x-sakura/local_manifests_sakura as the project to add the key to. This will fail for obvious reasons.

Support cross cloud federation

GCP supports authenticating from different trusted identities. One possible authentication story is a user wants to auth from AWS against a GCloud Container Registry. Currently the auth code only supports querying the internal metadata url via the const GCP_TOKEN_URL.

References:
https://cloud.google.com/iam/docs/configuring-workload-identity-federation
https://cloud.google.com/iam/docs/using-workload-identity-federation#aws_3

kustomize/filesys: Tests are failing on macOS

--- FAIL: Test_isSecurePath (0.00s)
    --- FAIL: Test_isSecurePath/relative_symlink (0.00s)
        fs_secure_test.go:592: 
            Expected success, but got an error:
                <*errors.errorString | 0xc0001f0ef0>: {
                    s: "path '/var/folders/fm/v67j6c7n1k5g46m38yb15yvm0000gn/T/securefs-Patwt/subdir/subsubdir/symlink' is not in or below '/var/folders/fm/v67j6c7n1k5g46m38yb15yvm0000gn/T/securefs-Patwt/subdir'",
                }
                path '/var/folders/fm/v67j6c7n1k5g46m38yb15yvm0000gn/T/securefs-Patwt/subdir/subsubdir/symlink' is not in or below '/var/folders/fm/v67j6c7n1k5g46m38yb15yvm0000gn/T/securefs-Patwt/subdir'
    --- FAIL: Test_isSecurePath/absolute_symlink (0.00s)
        fs_secure_test.go:592: 
            Expected success, but got an error:
                <*errors.errorString | 0xc0001f1120>: {
                    s: "path '/var/folders/fm/v67j6c7n1k5g46m38yb15yvm0000gn/T/securefs-Pilfs/subdir/subsubdir/symlink' is not in or below '/var/folders/fm/v67j6c7n1k5g46m38yb15yvm0000gn/T/securefs-Pilfs/subdir'",
                }
                path '/var/folders/fm/v67j6c7n1k5g46m38yb15yvm0000gn/T/securefs-Pilfs/subdir/subsubdir/symlink' is not in or below '/var/folders/fm/v67j6c7n1k5g46m38yb15yvm0000gn/T/securefs-Pilfs/subdir'

Add leader election config

Upgrade the runtime pkg to v0.10.1 and bind the leader election cmd flags:

runtime/conditions/check: Remove ObservedGeneration check

When using SerialPatcher the Reconciling condition will have ObservedGeneration=Generation so we need to remove the following check:

[Check-FAIL]: The root ObservedGeneration must be less than the Reconciling condition ObservedGeneration when Reconciling condition is True

Support using GitLab project tokens

When bootstrapping flux with GitLab, it needs to perform various actions on the GitLab instance:

Creating a repository (if it doesn't exist, yet)
Adding a deployment key

During these operations, it will try to find the correct GitLab project using the GitLab API. With the current implementation, the search is restricted to return only groups with an access level >= 10.

When using a project access token in gitlab (from a repository that has already been manually created), the gitlab API will not list project parent groups if search is performed with access level >= 10. This will cause the bootstrapping to fail:

ubuntu@myhost:~$ flux bootstrap gitlab \
>     --owner=mygroup/mysubgroup \
>     --repository=myrepo

✗ failed to list projects, error: failed to find group named 'mygroup/mysubgroup'

The relevant code sections are:

Solution
The requirement on the access level can simply be omitted. The GitLab API will only list projects/groups the provided token can see.

runtime: Use server-side throttling

Flux should disable client-side throttling, when server-side throttling is enabled. We could adapt/make use of kubernetes-sigs/cli-utils#584 in pkg/runtime/client.

Add version parameter to GitHub actions

When using the pkg GitHub actions, we should able to override the binary version e.g.:

      - name: Setup Kustomize
        uses: fluxcd/pkg/actions/kustomize@master
        with:
          version: 3.8.0

git: Add Support for Clone No-op for Commit pinned clones

Commit pinned clones do not support the use of LastRevision to shortcirtuit the clone operation.
This means, specially for libgit2, that full clones are executed, causing higher resource consumption.

This sparkled as part of a cncf thread in which a user reported source controller hitting 6000% CPU saturation and 3600% memory saturation with the default resources requests/limits. That resource consumption was observed with 57 sources (most are pinned to a specific git hash).

The implementation could follow similar logic as checkout for branches:
https://github.com/fluxcd/source-controller/blob/3013b5fc3df274d790d017ef98596f3e95073dd9/pkg/git/libgit2/checkout.go#L103-L120

Send Elastic Common Schema Compliant Logs

Our EKS cluster runs flux aside a lot of other services that are sending logs in ECS Compliant format.
We face an issue where in because flux sends errors in text format, we are not able to ingest our service logs into Elasticsearch if a flux error log has already been ingested.
This is because ECS Compliant Error logs expect the error field to be an object. Once the mapping for an index has been set by a flux error log to be text, and other type for that field will result in an error while ingestion.

It would be nice to get an option to use Elastic Common Schema format for the logs.

Thanks.

ssa: ResourceManager.ApplyAll fails to apply ClusterRoleBinding

We've got 2 controllers inside 1 operator (kubebuilder setup) that:

create some structs using client-go's https://github.com/kubernetes/client-go/tree/master/applyconfigurations. Among those resources is ClusterRole and ClusterRoleBinding
convert them to *unstructured.Unstructured using "k8s.io/apimachinery/pkg/runtime".DefaultUnstructuredConverter.ToUnstructured
uses ssa.SetNativeKindsDefaults on list of those resources
tries to apply them using ResourceManager.ApplyAll

There are some clusters for which this operation fails, like [email protected]+rke2r1 and kind cluster created by [email protected], default image for this version so [email protected]. They fail with error message

{
	  "level": "error",
	  "ts": 1668516799.8819587,
	  "msg": "Reconciler error",
	  ...
	  "error": "failed to apply resources to the cluster: ClusterRoleBinding/crbName namespace not specified, error: clusterroles.rbac.authorization.k8s.io \"crName\" not found"
}

A bit of debugging led me to believe that the error comes from this line

pkg/ssa/manager_apply.go

Line 139 in 7741c5f

if err := m.dryRunApply(ctx, dryRunObject); err != nil {

which then goes through m.validationError(dryRunObject, err) and then gets returned here

pkg/ssa/manager_diff.go

Lines 184 to 186 in 7741c5f

 if apierrors.IsNotFound(err) { 

 return fmt.Errorf("%s namespace not specified, error: %w", FmtUnstructured(object), err) 

 }

I'm not really sure what's happening, I'll try to add some code in next comment that reproduces this issue.

We've encountered this problem twice, independently in my workplace. For now we've just changed usage of ResourceManager.ApplyAll to for loop that goes through slice of those resources, which we additionally sort using ssa.SortableUnstructureds, and apply them using ResourceManager.Apply.

Resource manager is created like that:

fieldManager:="some-string"
...
ResourceManager: ssa.NewResourceManager(mgr.GetClient(),
			nil, // poller is needed only for WaitForSet, we're not using it here
			ssa.Owner{
				Field: fieldManager,
				Group: fieldManager + "-group",
			}),
...

This seems like some kind of data-race, where ClusterRole is not yet fully "commited" and ClusterRoleBinding tries to reference no-yet-existing ClusterRole, but if I understand k8s it shouldn't happen 🙈. Weird this is that this error is not going away, subsequent reconciles do not succeed.

My setup:

[email protected], k8s 1.25.3
go 1.19.3
os: macos, intel based

Use reconcile annotation helpers in controllers

For each project that uses the reconcileAt annotation:

bump its version of pkg/runtime to 0.1.2 (this picks up the new Changed predicate):

go get github.com/fluxcd/pkg/[email protected]

ensure api/go.mod also has pkg/apis/meta v0.1.0 (this means you can use the status struct embed), with a similar go get
remove the LastHandledReconcileAt field from status and embed meta.ReconcileRequestStatus inline, instead:

type ThingStatus struct {
    // ...
    meta.ReconcileRequestStatus `json:",inline"`
}

run make manifests to update the CRD; it should not alter the schema, only a doc string
run make api-docs (if there is one) to update docs. This does make a material change.
if there's anywhere that refers to something.GetAnnotation()[meta.ReconcileAtAnnotation], change it to use meta.ReconcileAnnotationValue(something.GetAnnotations())
change something.Status.LastHandledReconcileAt to something.Status.GetLastHandledReconcileRequest(), and something.Status.LastHandledReconcileAt = now to something.Status.SetLastHandledReconcileRequest(now)

Projects:

image-reflector-controller -- fluxcd/image-reflector-controller#34 uses the helpers
~~image-automation-controller~~ (does not use reconcileAt, yet)
kustomize-controller (fluxcd/kustomize-controller#152)
~~source-controller~~ does not support reconcileAt or suspend
helm-controller (fluxcd/helm-controller#124)

Finally:

Once all of those have been updated and a minor version released, think about changing the annotation used by the command-line tool and the notification-controller, which both write the annotation. (EDIT: I think updating the CLI should wait for a minor version release, since it's a minor breaking change in the sense that the new CLI won't work with older controllers.)

Create new flag to set default hash function

The controllers use hash functions to generate checksum of artifacts, which is later used for change and anti-tampering detection.
Users would benefit by being able to set what algorithms to be used across the board, which align with their performance, security and regulation requirements.

The scope of this flag is only around checksums provided by Flux itself. It won't be able to change algorithms used for things defined elsewhere, for example, the algorithm used for Hash HostNames cannot be changed, as only SHA1 is supported.

Outstanding tasks

Runtime module `README.md` needs updating

It seems to be not fully up-to-date with recent changes, and we need to ensure that e.g. all flag related packages are listed, and it does not refer to removed (or changed) elements.

xref: https://github.com/fluxcd/pkg/blob/main/runtime/README.md

Make CI smarter about which tests to run depending on changes

Since this is a monorepo, it contains a lot of different sub packages/modules. At the moment, we run tests in CI for all packages regardless of which package is being modified. This leads to extremely long wait times and decreases developer velocity.
We should make the CI figure out which packages are affected by a PR and run tests only for relevant packages. This means running tests not only the modified package but also for packages that depend on the said modified package.

Use go workspaces

Go 1.18 introduced multi-module worksapces (ref: https://go.dev/doc/tutorial/workspaces). We should investigate and adapt this repo to make use of this feature.

Make it easier to send events without metadata

fluxcd/source-controller@26ebbe5 mentions:

  Passing `nil` every time there isn't any metadata for an event quickly
  becomes cumbersome; we should look into an `EventWithMetadata` and/or
  `EventfWithMetadata`, or some other way to _optionally_ provide
  metadata without annoying the consumer.

git: Create auto gitImplementation

Flux supports two git implementations: libgit2 and go-git. They do not have feature parity, which the two main problems being:

Shallow clones is only supported by go-git. Which results in less data being transmitted through the network at every clone operation.
Git V2 Protocol is only supported by libgit2. Servers that only support this protocol (e.g. Azure DevOps) won't work with go-git.

We should build a third git implementation type (a.k.a auto), which removes the need from users to understand the difference between the implementations, and always try to use whichever is the most optimal.

On a high-level, the new implementation could take into account:

Server fingerprint: query the SSH banner and HTTP headers.
Protocol Awareness: find ways to identify what Git protocol version is being used, which may be easier over HTTP scheme.

Unable to Pull Helm Charts from ECR Repositories

This is in response to /pull/362. That have helm charts in the root of ECR repos, and flux is unable to pull them.

The error code we now get: invalid chart reference: failed to get chart version for remote reference: could not get tags for "SERVICENAME": could not fetch tags for "oci://ACCOUNTNUMBER.dkr.ecr.REGION.amazonaws.com/SERVICENAME": GET "https://ACCOUNTNUMBER.dkr.ecr.REGION.amazonaws.com/v2/SERVICENAME/tags/list":` credential required for basic auth`

source-controller version: ghcr.io/fluxcd/source-controller:v0.32.1

This is the URL of our HelmRepository : url:oci://ACCOUNTNUMBER.dkr.ecr.REGION.amazonaws.com``

Complete error: {"level":"error","ts":"2022-12-09T20:46:20.623Z","msg":"invalid chart reference: failed to get chart version for remote reference: could not get tags for \"SERVICENAME\": could not fetch tags for \"oci://ACCOUNTNUMBER.dkr.ecr.REGION.amazonaws.com/SERVICENAME\": GET \"https://ACCOUNTNUMBER.dkr.ecr.REGION.amazonaws.com/v2/SERVICENAME/tags/list\": credential required for basic auth","name":"default-SERVICENAME","namespace":"flux-system","reconciler kind":"HelmChart","annotations":null,"error":"InvalidChartReference","stacktrace":"github.com/fluxcd/pkg/runtime/events.(*Recorder).AnnotatedEventf\n\tgithub.com/fluxcd/pkg/[email protected]/events/recorder.go:137\ngithub.com/fluxcd/pkg/runtime/events.(*Recorder).Eventf\n\tgithub.com/fluxcd/pkg/[email protected]/events/recorder.go:114\ngithub.com/fluxcd/source-controller/internal/reconcile/summarize.RecordContextualError\n\tgithub.com/fluxcd/source-controller/internal/reconcile/summarize/processor.go:56\ngithub.com/fluxcd/source-controller/internal/reconcile/summarize.(*Helper).SummarizeAndPatch\n\tgithub.com/fluxcd/source-controller/internal/reconcile/summarize/summary.go:193\ngithub.com/fluxcd/source-controller/controllers.(*HelmChartReconciler).Reconcile.func1\n\tgithub.com/fluxcd/source-controller/controllers/helmchart_controller.go:228\ngithub.com/fluxcd/source-controller/controllers.(*HelmChartReconciler).Reconcile\n\tgithub.com/fluxcd/source-controller/controllers/helmchart_controller.go:263\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile\n\tsigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:121\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\tsigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:320\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\tsigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:273\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\tsigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:234"}

We have been running flux successfully using a custom source controller built from this (1a4b57a) commit version.

Add test registry server

Source-controller and pkg/oci both set up a registry server for tests.
It would be great to have a package for this so it doesn't need to be set up in each repository.
The new pkg should also discard verbose logging as this can be distracting in tests as noted in @darkowlzz comment
#328 (comment)

The image-automation controller also uses a different test registry. It should be updated so the pkg is used everywhere

Non-blocking event recorder

The event recorder currently runs in a blocking fashion due to which it can affect the reconcilers when they emit an event and the event webhook server takes time to respond. The http client waits and retries on failure. This results in the reconciler to wait until the event is posted. Sometimes reconcilers time out waiting for the event emitting to complete, resulting in failed reconciliation. A non-blocking event recorder would help prevent the reconcilers to be affected by failure in event recording.

Following are some ideas to address this issue:

non-blocking request to the event webhook server

A quick and simple change to solve the immediate issue would be to make the http request to the event webhook server non-blocking by running it in a goroutine. This would unblock the reconciler but doesn't guarantee the ordering of the requests. If the webhook server is offline and the reconciliation is failing, the reconcilers retrying may create multiple of these goroutines which may keep retrying to post the event with back-off. Due to the variation in the back-off duration, when the webhook server becomes available, the events would be posted out of order. Since there's no de-duplication at the event source level, the webhook server will have to serve all the accumulated event requests and do de-duplication on its own. This may result in the creation of too many goroutines that are trying to do the same thing. But the goroutines can be configured to fail after certain number of attempts to ensure they get cleaned up.

per controller event processor

Another approach would be to introduce event processor per controller. The events package can provide some API to run an event processor, typically in the main.go file before setting up the reconcilers. The event recorder in the reconcilers would send event to the event processor through a buffered channel. The event processor would collect the events to be posted to the event webhook server, categorize them and process them based on certain strategies. Since all the events go through a central events processor, it can be used to add more functionalities in the event source. Order of the events can be maintained. Event de-duplication can be done at the source and spamming the event server can be avoided. If the event server isn't ready, the event processor can perform one health check and hold all the event processing. If the event buffer gets filled, it can drop certain events based on certain strategies. More interesting things can be done at the source of events centrally at the controller level.
This may be similar to the events notification broadcaster in kubernetes apimachiner https://github.com/kubernetes/apimachinery/blob/fd8a60496be5e4ce19d586738adb48ac6fa66ef9/pkg/watch/mux.go#L43 .

Some other variation of event processor could be to run event processor per reconciler or even per object and that'll create opportunities to handle the events in different ways. Like a tenant configuring the events related to their objects to be sent to their own event server which they manage.

Update kubebuilder to 3.1.0

We should update our GH action to kubebuilder 3.1.0 and use the new EnvTest binaries setup tool.

Add support for secure hash algorithms for known_hosts

The use of SHA1 is considered insecure. From a compliance perspective, this may also be an issue for flux users seeking to be "NIST-compliant".

pkg/ssh/knownhosts/knownhosts.go

Lines 415 to 419 in 0867f3f

 func hashHost(hostname string, salt []byte) []byte { 

 mac := hmac.New(sha1.New, salt) 

 mac.Write([]byte(hostname)) 

 return mac.Sum(nil) 

 }

Recommendation here would be to add support to secure algorithms such as SHA256 or SHA3.

runtime/client: Add tests for account impersonation

As a followup to #370, we need to add tests for runtime/client/impersonator.go by adapting the tests from kustomization_impersonation_test.

Review GPG touch-points with `GitRepository`

Review the current state ahead of Bootstrap GA, including:

SSH Key signing is not yet supported. As part of the review above we should consider whether the current implementation for signing and verification are extensible enough to enable the support of SSH Key signing. Implementing SSH Key signing is not part of the scope within the milestone of this issue.

Unhandled errors from deferred file close operations

From Ada Logics

Target
./source-controller/controllers/storage.go
./source-controller/pkg/sourceignore/sourceignore.go
./kustomize-controller/controllers/kustomization_impersonation.go
./kustomize-controller/internal/sops/pgp/keysource.go

Throughout the codebase there are places where file close operations are deferred within a function where a file is being written to, e.g:

localPath := s.LocalPath(*artifact)
f, err := os.Open(localPath)
if err != nil {
  return err
}
defer f.Close()

// untar the artifact
untarPath := filepath.Join(tmp, "unpack")
if _, err = untar.Untar(f, untarPath); err != nil {
  return err
}

This can lead to undefined behaviour since any errors returned by the f.Close() operation are ignored. This can have consequences in the event where a close operation failed and the data was not yet flushed to the file, which the rest of the code will assume it to be. For a detailed discussion on this, please see here.

Recommendation
Ensure that errors from f.Close() are handled.

request support for the non-blocking apply method

We use fluxcd to implement gitops, one application corresponds to one yaml. In use, we found that kustomize-controller will perform dry-run verification on each yaml when executing manifest apply. Once there is a problem with yaml, it will return an error directly. As a result, other normal yaml cannot be executed. See: github.com/fluxcd/pkg/ssa/manager_apply.go:125

We expect that an application yaml has a problem without affecting other application yaml to be applied。
my implementation：

runtime/conditions: take Condition generation into account during merge

At present the merge operation does not take the generation of conditions into account. To ensure the most relevant (and recent) condition is merged, it should be possible to take the generation recorded in the condition into account.

It could be possible to make this an opt-out (or opt-in) option using mergeOptions, see:

pkg/runtime/conditions/merge_strategies.go

Line 33 in 11abd07

type mergeOptions struct {

tar: Add TarGzip func

Following up from refactoring the untar pkg, we could add a .TarGzip func to tar-gzip directories.

Can we add a TarGzip function that takes a directory and generates a tgz file? We have that in the tests and I think it would be nice to offer both functions in this new package.
source: #221 (comment)

Depends on #221 being merged.

Move https://github.com/pjbgf/git2go to fluxcd/git2go

Create an official fork for git2go.

Tasks:

Copy changes from https://github.com/pjbgf/git2go
Point all references to fluxcd/git2go.

Improve OpenSSF Scorecard Score

"The Open Source Security Foundation is a cross-industry collaboration to improve the security of open source software (OSS). The Scorecard provides security health metrics for open source projects."

As of 3rd January, fluxcd/pkg scores 6.2/10. For latest score check deps.dev or manually execute scorecard.

Areas to focus on:

Token-Permissions
Pinned-Dependencies
CII-Best-Practices
Fuzzing
- #212
- #150

Create workflow to build project on MacOS

Similar to the workflow created for source-controller, a new GitHub workflow could ensure that MacOS platform is always working for development purposes:

  # Runs 'make test' on macos-10.15 to assure development environment for 
  # contributors using MacOS.
  darwin-amd64:
    runs-on: macos-10.15
    steps:
      - name: Checkout
        uses: actions/checkout@v3
      - name: Setup Go
        uses: actions/setup-go@v3
        with:
          go-version: 1.17.x
      - name: Restore Go cache
        uses: actions/cache@v3
        with:
          path: /home/runner/work/_temp/_github_home/go/pkg/mod
          key: ${{ runner.os }}-go-${{ hashFiles('**/go.sum') }}
          restore-keys: |
            ${{ runner.os }}-go-
      - name: Run tests
        run: make test

source: https://github.com/fluxcd/source-controller/blob/3dcb1420760d8aa0dfb5d496910fd3f9576aeb8a/.github/workflows/e2e.yaml#L107-L126

Depends on: #275

Inconsistent code-styles and potential nil-dereferences

From Ada Logics

Flux is composed of projects across different repositories and there is often similar logic happening across the controllers but performed in quite different ways. This leads to a more complex overall codebase and can make it difficult to reason about properties of the code.

Event recording and checking status of similar elements in the controllers is performed differently. This came up as an issue through fuzzing. Each of the controllers rely on an EventRecoder, and the way these EventRecorder variables are used differs between the controllers. Some controllers check for nil-status and others do not. The HelmRelease reconciler and the Kustomization reconciler assume that the EventRecorder is not nil in their respective event() implementations, whereas the other controls do not:

Helm Release Reconciler

https://github.com/fluxcd/helm-controller/blob/e9d31e9f1f8df5149b10ce7719b2d272f617a44f/controllers/helmrelease_controller.go#L739-L745

Kustomize Reconciler

https://github.com/fluxcd/kustomize-controller/blob/72bc54477aa89aadc40d1444d0f30b1e9963806f/controllers/kustomization_controller.go#L788-L795

Image Update Automation Reconciler

https://github.com/fluxcd/image-automation-controller/blob/bc3d7b21121851ffe75ebb9c9dcd530c38db3d4e/controllers/imageupdateautomation_controller.go#L717-L720

Git Repository Reconciler

https://github.com/fluxcd/source-controller/blob/c4d7e46b90dc48aac7d5c74def2a82e7b7ea9333/controllers/gitrepository_controller.go#L427-L432

Recommendation
The same code pattern should be used across the controllers. Through our analysis we determined the EventRecorder cannot be nil using the current main.go files and thus the nil check should be removed.

Document abstract outline of package in `fluxcd/pkg/runtime` directory

Follow up to #117

This is a top level README in fluxcd/pkg/runtime.

This should be a markdown version in the package as a GitHub landing page, which outlines and glues together the in-depth directory contents, but doesn’t give any advice on how to use them (that will be added separately to the website). To show users code examples, link to inline code examples in the go documentation. The idea is for this to be a lighter version of what we will put on the website, in the context of runtime package only.

Contains:

Description (in go doc directory)
A more human description of what’s written in the package descriptions
Intro
List of things that are in there
Generic list of subjects (standards, or other things to take into account if you make use of the runtime package or contributing to Flux)
When we mention our standards and API conventions, we can link to the packages that help with those standards

/assign @hiddeco

git: Requirements to move back to `go-git/go-git`

The go-git/go-git dependency has recently been replaced with fluxcd/go-git`. In order to be able to move back to the upstream project, we need:

Upstream performance improvements

Upstream behaviour changes

go-git/go-git: Surface a way for users to change unsupported capabilities.
- go-git/go-git#613
skeema/knownhosts and/or go-git/go-git: Add support for auto-populate HostKeyAlgorithms for non file-based knownhosts.
- go-git/go-git#548 broke backwards compatibility. fluxcd/go-git reverts some of its changes so that the latest versions of go-git/go-git continue to work with Flux.
- The auto-population of HostKeyAlgorithms is a great feature to have and will decrease the likelihood of issues such as:
  - fluxcd/flux2#1595
Add support for env proxying when using HTTPS with custom CAs
- fluxcd/go-git#9

Support SSH cloning.

When using gitlab enterprise onprem, we don't allow ssh cloning and thus the bootstrapping fails to clone the repo.
If we provide --ssh-hostname the cloning still attempts over https.

could we enhance the cloning to force ssh url in the case that ssh is provided?

Mark events to prevent alerting spam

Taken from fluxcd/source-controller#411

To improve observability through events, without spamming the alerting system, I propose we introduce a new eventSeverity type called trace. The trace events are emitted as Kubernetes events under Normal, these events are dropped when forwarding to notification controller. Every source reconciliation should issue a single info event that contains the source URL and revision (later on we should consider adding the commit message too for GitRepositories).

[kustomize] Add a dry-run option to SubstituteVariables

We would like to add a dry-run parameter to the following func in pkg/kustomize:

func SubstituteVariables(
	ctx context.Context,
	kubeClient client.Client,
	kustomization unstructured.Unstructured,
	res *resource.Resource) (*resource.Resource, error){...}

It would become

func SubstituteVariables(
	ctx context.Context,
	kubeClient client.Client,
	kustomization unstructured.Unstructured,
	res *resource.Resource,
         dry-run bool) (*resource.Resource, error){...}

When the dry-run parameter resolve to true, we would replace the kubeAPI querys to retrieve substitutable objects by mocks.

We could then use it to enabled completely disconnected consumers (e.g flux cli in constrained environments) to use this function.

git: Test Coverage

Test coverage aimed across both git implementations pre-GA:

Standardise logging

From Ada Logics:

Proper logging is important to ensure audit trails in case of breaches. However, throughout the code we found inconsistency in the way logging is handled, and often when errors occur there would be no error logging.

The ImageAutomationReconciler declares the following failWithError function:

https://github.com/fluxcd/image-automation-controller/blob/674f833d982628831c101a919c0154c64c455db3/controllers/imageupdateautomation_controller.go#L149-L157

This function is consistently used to return from the Reconcile function in an appropriate manner with proper logging, e.g:

https://github.com/fluxcd/image-automation-controller/blob/674f833d982628831c101a919c0154c64c455db3/controllers/imageupdateautomation_controller.go#L224-L227

This is a good way of abstracting of common logic and also ensures logging However, this theme is never used in any of the other controllers and these controllers implement the logic quite differently, e.g.

https://github.com/fluxcd/source-controller/blob/d7afc3596bdfc3818ed8987db029bea8a461c62c/controllers/gitrepository_controller.go#L175-L200

https://github.com/fluxcd/image-reflector-controller/blob/327a6ea9fd78783e4daaa2da3cc1f40dbcca0cab/controllers/imagepolicy_controller.go#L118-L131

https://github.com/fluxcd/notification-controller/blob/8ff5f75a255d7dd0677590510ab1abf1d33b4f85/controllers/alert_controller.go#L72-L79

Recommendation
● Standardise how logging should occur.
● Use helper methods for common error handling.
● Log whenever errors occur.
● Log differently depending on how each controller exits the Reconcile functions.

Dedicated module per package

The current setup solves the issue for now where we were copying over internal packages from one project to another to gain the same functionalities.

The new problem we face however is the fact that if you for example only want to make use of the ssh package, you are now automagically pulling in Git related dependencies because of the git package.

We should look into creating dedicated modules per package so that they can all have their own set of dependencies and release cycle.

author new artifact types for OCI operations

We (w/@Dentrax) thought that one of the main goals of the OCI Artifacts¹ is that it allows you to author new artifact types that you want to support within your client tooling.² The first step of writing a new artifact type is to define a unique type for it.³ Defining a unique artifact allows various tools to know how to work with the type uniquely. When writing this, Flux's pull and push operations rely on the first layer of an image.⁴ As we can use client tools for registry and image operations such as crane, skopeo, etc., we have to know that we should put things to the first layer of an image, so, in today's world, we can store different things within the image layers in addition to the tarball. So, if we define Flux's artifact types for OCI, we can search them within image layers. With that, we don't need to limit ourselves to only storing things at the first layer of an image.

For example, OPA Rego policies can be stored on the OCI registry with their own types⁵; also, we have planned to do the same in Kyverno CLI⁶.

Remove deprecated `ReconcilateAtChangedPredicate` from `runtime/predicates`

As suggested in #101 (comment)

Add trace log level

In fluxcd/image-automation-controller#190 I added some tracing output to the image automation controller. Since that controller uses pkg/runtime/logging, and that doesn't know about --log-level=trace, I had to put the tracing in the debug log level. It would be good to recognise the trace level, though.

parse --log-level=trace
add a const for the log level so it can be referred to in log.V(CONST).Info(...)

	if apierrors.IsNotFound(err) {
	return fmt.Errorf("%s namespace not specified, error: %w", FmtUnstructured(object), err)
	}

	func hashHost(hostname string, salt []byte) []byte {
	mac := hmac.New(sha1.New, salt)
	mac.Write([]byte(hostname))
	return mac.Sum(nil)
	}

fluxcd / pkg Goto Github PK

pkg's Introduction

pkg

pkg's People

Contributors

Stargazers

Watchers

Forkers

pkg's Issues

Outstanding tasks

non-blocking request to the event webhook server

per controller event processor

Footnotes

Recommend Projects

Recommend Topics

Recommend Org

Jobs