
kubeflow / kfctl

178 stars · 19 watchers · 139 forks · 18.4 MB

kfctl is a CLI for deploying and managing Kubeflow

License: Apache License 2.0

Languages: Go 81.43%, Python 16.36%, Makefile 1.51%, Dockerfile 0.48%, Shell 0.23%

kfctl's Introduction

kfctl

kfctl is the control plane for deploying and managing Kubeflow. The primary mode of use is as a CLI driven by KFDef configurations for different Kubernetes flavours. Please also see the docs on the Kubeflow website for deployment options for different cloud providers.

Additionally, we have introduced the Kubeflow Operator (in incubation), which, beyond deploying Kubeflow, also monitors the deployment for consistency, among other things.

kfctl's People

Contributors

adrian555, alexanderekdahl, amalts, andrebriggs, animeshsingh, bobgy, devgrok, discordianfish, evan-hataishi, gabrielwen, jeffwan, jinchihe, jlewi, jtherin, k8s-ci-robot, kkasravi, krishnadurai, kunmingg, moficodes, mrxinwang, naveensrinivasan, pinkavaj, richardsliu, shawnzhu, swiftdiaries, tabrizian, tomcli, vpavlin, xauthulei, zhenghuiwang


kfctl's Issues

[GCP] Auto set GPU type for node auto provisioner based on zone

Cluster creation on GKE will fail if node auto-provisioning is enabled and the GPUs listed in the node auto-provisioner config aren't available in the zone.

We should fix this so that kfctl picks good defaults.

The behavior should be as follows:

  1. If autoprovisioning-config.enabled is set in the config file, kfctl should respect it.

  2. If autoprovisioning-config.enabled is not set, then kfctl should infer sensible defaults:

    • kfctl should check the zone to see if accelerators are available
    • If accelerators are available, it should include them in the auto-provisioned pool
    • If not, accelerators should be removed from the auto-provisioning settings (see the sketch below)
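A minimal sketch of the zone check, using the GCP Compute API client (google.golang.org/api/compute/v1). The project/zone values and the wiring into kfctl's GCP plugin are placeholders, not the actual kfctl code:

```go
package main

import (
	"context"
	"fmt"

	compute "google.golang.org/api/compute/v1"
)

// zoneAccelerators lists the accelerator types available in a zone, so kfctl
// could decide whether to include GPUs in the auto-provisioned node pool.
func zoneAccelerators(ctx context.Context, project, zone string) ([]string, error) {
	svc, err := compute.NewService(ctx)
	if err != nil {
		return nil, err
	}
	resp, err := svc.AcceleratorTypes.List(project, zone).Do()
	if err != nil {
		return nil, err
	}
	var types []string
	for _, a := range resp.Items {
		types = append(types, a.Name)
	}
	return types, nil
}

func main() {
	// Project and zone are placeholders.
	types, err := zoneAccelerators(context.Background(), "my-project", "us-east1-c")
	if err != nil {
		fmt.Println("could not list accelerator types:", err)
		return
	}
	// An empty list would mean dropping accelerators from the
	// auto-provisioning settings; a non-empty list means they can be included.
	fmt.Println("available accelerators:", types)
}
```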

junit files in presubmits not copied to the right bucket

Our E2E tests for presubmits aren't copying the junit files to the right GCS location, which is why results aren't showing up in Testgrid and Spyglass.

Here's the command executed by copy-artifacts.

gsutil -m rsync -r /mnt/test-data-volume/kubeflow-presubmit-kfctl-go-iap-istio-4313-b6589cf-4149-fe75/output gs://kubeflow-ci_temp/pr-logs/pull/kubeflow_kubeflow/4313/kubeflow-presubmit/None

This is the wrong subdirectory; it should not be None.

The correct directory for this PR should be

https://gcsweb.k8s.io/gcs/kubernetes-jenkins/pr-logs/pull/kubeflow_kubeflow/4313/kubeflow-presubmit/1184279996345094149/artifacts/

Related to kubeflow/testing#257

kfctl_test_delete is flaky and failures block upload of junit results

Opening this issue because we are seeing flakes related to kfctl_delete_test.py failing

We are seeing timeouts
kubeflow/manifests#493 (comment)

And failures due to concurrent operations
kubeflow/kubeflow#4287

Even worse, because of kubeflow/testing#257, if kfctl_delete_test fails we don't copy junit artifacts to GCS, so we lose signal about passing tests.

As a stop gap I am going to mark kfctl_test_delete as expected to fail. At least this way we won't block uploading junit artifacts.

KfConfig should be private to coordinator.go

I think KFConfig should be considered an internal private data structure only accessible to the packages in
https://github.com/kubeflow/kubeflow/tree/master/bootstrap/pkg/kfapp

Code outside pkg/kfapp should not use KFConfig. Instead, code outside of pkg/kfapp should use one of the versioned KFDef structures, which are the public-facing API.

The purpose of KFConfig is to provide an internal data structure for the code in pkg/kfapp, so that code can be written once but work with multiple versions of KFDef.

This is a common pattern when implementing an API; i.e. convert an external versioned data structure into an internal data structure that can evolve independently of the external API.

If code outside of kfapp uses KFConfig, then it is no longer an internal data structure; it is a public API that other libraries could end up depending on.
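A rough sketch of that pattern; all type and field names here are hypothetical stand-ins, not the real kfctl structs:

```go
package main

import "fmt"

// KfDefV1Beta1 stands in for a versioned, public-facing KFDef type.
type KfDefV1Beta1 struct {
	Name         string
	Applications []string
}

// KfConfig stands in for the internal structure used only inside pkg/kfapp;
// internal-only fields can be added without breaking the public API.
type KfConfig struct {
	Name          string
	Applications  []string
	SourceVersion string
}

// toKfConfig converts the public structure into the internal one. Code in
// pkg/kfapp works only against KfConfig, so it is written once and reused for
// every KFDef version that gains a converter like this.
func toKfConfig(def KfDefV1Beta1) KfConfig {
	return KfConfig{
		Name:          def.Name,
		Applications:  def.Applications,
		SourceVersion: "v1beta1",
	}
}

func main() {
	def := KfDefV1Beta1{Name: "kf-test", Applications: []string{"istio", "jupyter"}}
	fmt.Printf("%+v\n", toKfConfig(def))
}
```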

Provide support for different cpu types and hardware accelerators

kfctl provides a fixed configuration for GPUs when using GCP (under $KFAPP/gcp_config), but doesn't allow for variations in machineType, cpuType (Broadwell, Skylake), imageType (cos or ubuntu), or alternate accelerators, as discussed here

There should be a generic way to add these resources to a cluster based on the cloud provider, perhaps in manifests

[GCP] Labels in KFDef not applied to GCP infrastructure

Labels in KFDef.metadata should be applied to the GCP infrastructure; i.e. they should be used as labels on the GCP deployments.

This was added here.
https://github.com/kubeflow/kubeflow/blob/0ba828a18ee97094a5aa24da2fe09ca75d05d15d/bootstrap/pkg/kfapp/gcp/gcp.go#L419

I think the problem is the converter
https://github.com/kubeflow/kubeflow/blob/0ba828a18ee97094a5aa24da2fe09ca75d05d15d/bootstrap/pkg/apis/apps/configconverters/v1beta1.go#L188

When we write to a file we aren't preserving the labels.

There's a related issue #53 to try to rely more on automation to ensure fields are preserved.

Why I think it's not working:

I suspect this is broken because our latest auto-deployed clusters from master don't have any labels.

Here's the input KFDef printed out by the deploy master job

INFO|2019-10-24T03:42:00|/src/kubeflow/testing/py/kubeflow/testing/create_unique_kf_instance.py|93| KFDefSpec:
apiVersion: kfdef.apps.kubeflow.org/v1beta1
kind: KfDef
metadata:
  labels: {GIT_LABEL: 0ba828a, PURPOSE: kf-test-cluster}
  namespace: kubeflow
spec:

kfctl apply -f doesn't work with relative path; can't rerun on app.yaml

Using commit: 6739cd076d8dd9911ce40b44137a98b2a68efcca

mkdir ${KFAPP}
cd ${KFAPP}
kfctl apply -f ./kfctl_gcp_iap.yaml 
Error: Cannot determine the object kind:  (kubeflow.error): Code 400 with message: could not fetch specified config ./kfctl_gcp_iap.yaml: relative paths require a module with a pwd

My expectation would be that it uses the directory of kfctl_gcp_iap.yaml

We have lots of bugs about the kfctl semantics; can we refactor the code to make it easy to write a unittest that verifies all the different use cases?

/cc @swiftdiaries

Create pytest to verify that KSA and GSA are set correctly for workload identity manager

In 0.7 we are enabling workload identity on GKE.

As part of this, when we create profiles we want to correctly bind GCP service accounts (GSAs) to Kubernetes service accounts (KSAs).

kubeflow/kubeflow#3917 will update the profile controller to support this.

We also have code in kfctl to do this for the default namespace.

We should create a pytest that checks that the KSA and GSA for the newly created namespaces are correctly configured for workload identity.

We should then include this in a suitable E2E test for workload identity
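Whatever the test is ultimately written in, the core check is reading the KSA's workload-identity annotation; here is a sketch in Go with client-go, assuming kubeconfig-based access and placeholder namespace/service-account names:

```go
package main

import (
	"context"
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	// Load the local kubeconfig; a test harness would supply this instead.
	cfg, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}
	client, err := kubernetes.NewForConfig(cfg)
	if err != nil {
		panic(err)
	}
	// Namespace and service-account name are placeholders for the profile
	// namespace and the KSA created in it.
	sa, err := client.CoreV1().ServiceAccounts("my-profile").Get(
		context.Background(), "default-editor", metav1.GetOptions{})
	if err != nil {
		panic(err)
	}
	// Workload identity binds a KSA to a GSA through this annotation; the test
	// would assert it names the expected GCP service account.
	fmt.Println("bound GSA:", sa.Annotations["iam.gke.io/gcp-service-account"])
}
```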

kfctl is overly reliant on expensive E2E tests; slowing down development

kfctl is overly reliant on expensive E2E tests.

I'm seeing presubmits taking 50 minutes to run. Furthermore as these E2E tests become more comprehensive they inherently become more flaky.

I think we need to rethink our test strategy for kfctl to ensure velocity remains high.

  • For example, how can we develop a component test for kfctl that doesn't end up being a complete E2E test for kubeflow?

    • e.g. Does it really make sense when testing changes to kfctl to test that all the Kubeflow applications are deployed and healthy?

kustomize.go should modify the kustomization.yaml file

Right now kfctl overwrites the kustomization.yaml file
https://github.com/kubeflow/kubeflow/blob/6f6e3e93cbea1f268a8be9767ee24c2ef505bd2b/bootstrap/pkg/kfapp/kustomize/kustomize.go#L1102

Should we change this to instead support modifying an existing kustomization.yaml file?

Right now our KFDef files are duplicating a bunch of information; e.g. every application in every KFDef file needs to list common overlays like application.

Could we instead encode this common information in the kustomization.yaml file written directly in the kubeflow/manifests repo?
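A minimal sketch of a read-modify-write on kustomization.yaml (using sigs.k8s.io/yaml and a trimmed-down struct; not the current kustomize.go code, which would also need to preserve fields not declared here):

```go
package main

import (
	"os"

	"sigs.k8s.io/yaml"
)

// kustomization is a trimmed-down view of kustomization.yaml; only the fields
// touched here are declared, so a real implementation would need to preserve
// the rest of the file as well.
type kustomization struct {
	Resources []string `json:"resources,omitempty"`
}

// addResource appends a resource entry to an existing kustomization.yaml
// instead of overwriting the whole file.
func addResource(path, resource string) error {
	data, err := os.ReadFile(path)
	if err != nil {
		return err
	}
	var k kustomization
	if err := yaml.Unmarshal(data, &k); err != nil {
		return err
	}
	k.Resources = append(k.Resources, resource)
	out, err := yaml.Marshal(&k)
	if err != nil {
		return err
	}
	return os.WriteFile(path, out, 0o644)
}

func main() {
	// Paths are placeholders.
	if err := addResource("kustomization.yaml", "../application/overlay"); err != nil {
		panic(err)
	}
}
```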

kfctl should use the file path passed to -f not app.yaml

Right now kfctl is hardcoding the path it saves the KFDef to as "app.yaml"

This leads to a confusing user experience; especially if a user needs to rerun kfctl

e.g. they would initially run

kfctl apply -f kfctl_gcp_iap.yaml

But none of the changes made by kfctl are preserved in kfctl_gcp_iap.yaml; they are saved to app.yaml

So if a user needs to rerun they need to do

kfctl apply -f app.yaml

We should fix that.

Once kfctl is migrated to KFConfig (kubeflow/kubeflow#4250), I'm hoping this is a straightforward fix. I think we can just add a field to KFConfig to store the app file that gets set when the file is loaded.

The write methods can then get the path from that field.
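A sketch of that idea; the AppFile field and the function names are hypothetical:

```go
package main

import (
	"fmt"
	"os"
)

// kfConfig stands in for the real KFConfig struct; the hypothetical AppFile
// field records where the config was loaded from.
type kfConfig struct {
	AppFile string
	Raw     []byte
}

// loadFromFile remembers the path it was given so later writes can reuse it.
func loadFromFile(path string) (*kfConfig, error) {
	data, err := os.ReadFile(path)
	if err != nil {
		return nil, err
	}
	return &kfConfig{AppFile: path, Raw: data}, nil
}

// save writes back to the original file instead of a hardcoded app.yaml.
func (c *kfConfig) save() error {
	return os.WriteFile(c.AppFile, c.Raw, 0o644)
}

func main() {
	cfg, err := loadFromFile("kfctl_gcp_iap.yaml")
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println(cfg.save())
}
```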

Split kfctl_go_test.py into separate python functions for building and deploying kubeflow

Right now we have a single python file
https://github.com/kubeflow/kubeflow/blob/master/testing/kfctl/kfctl_go_test.py

That does two things

  1. Builds kfctl
  2. Deploys Kubeflow

We will probably want to split those into two separate python functions to make it more composable.

One use case for composability is to create an E2E test for upgradability (#35). In that E2E test we will want to build kfctl once but invoke kfctl twice; once to deploy and once to do the upgrade.

Another use case is to create E2E tests for other platform configurations. For example, we want an E2E test for installing Kubeflow on an existing cluster (kubeflow/kubeflow#3496).

In that case we need to provision a Kubernetes cluster before deploying Kubeflow; e.g. using kops.

Related to: #35 E2E test for kfctl upgrade

/cc @yanniszark @Jeffwan

small tweak to kfctl init usage msg: clarify local path for --config

I found that when passing a local file to the --config arg, I needed to use the full path (otherwise, got this error: relative paths require a module with a pwd). The usage info doesn't indicate that clearly enough. Instead of:

Usage:
  kfctl init <[path/]name> [flags]

Flags:
      --config string            Static config file to use. Can be either a local path or a URL.
                                 For example:
                                 --config=https://raw.githubusercontent.com/kubeflow/kubeflow/master/bootstrap/config/kfctl_platform_existing.yaml
                                 --config=kfctl_platform_gcp.yaml

we might want to show something like --config=/your/local/path/to/kfctl_platform_gcp.yaml
The repo README should be updated as well if so.

Rationalize all the constructor/public functions for coordinator.go

All of the public functions in coordinator.go for creating/loading KFApps are a bit of a mess

This is the result of a lot of accumulated tech debt.

This has led to problems because the logic is convoluted and different callers are skipping different parts of the logic.

Here are some initial ideas; some of these might not be right, but hopefully this helps us avoid accumulating more debt (a rough sketch of the proposed shape follows the list).

  • NewLoadKfAppFromURI - This should be the preferred public entrypoint for creating a KfApp
    • Go style prefers methods named New
  • CreateKfAppCfgFile - This should be a private method
  • BuildKfAppFromURI - This method should be removed
    • Callers should be migrated to NewLoadKfAppFromURI
    • This code is creating a KFApp and then calling init/generate
    • This doesn't really make sense as a public routine
      • Callers should just call NewLoadKfAppFromURI and then invoke generate on the returned
        app
  • LoadKfAppCfgFile
    • After various refactorings NewLoadKfAppFromURI is just a wrapper around LoadKfAppCfgFile
    • We should get rid of either LoadKFAppCfgFile or NewLoadKfAppFromURI and migrate
      callers to the other function
    • I vote for keeping NewLoadKfAppFromURI because I think it's more common in Go to call
      a function named New*
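A minimal sketch of the proposed shape (function names are taken from the list above; the signatures and bodies are assumptions, not the current coordinator.go code):

```go
package coordinator

// KfApp stands in for the interface the coordinator returns.
type KfApp interface {
	Generate() error
	Apply() error
}

// NewLoadKfAppFromURI is the single public entry point: it resolves the URI,
// writes the app config file, and loads it. Callers then call Generate/Apply
// on the returned app themselves.
func NewLoadKfAppFromURI(configURI string) (KfApp, error) {
	path, err := createKfAppCfgFile(configURI)
	if err != nil {
		return nil, err
	}
	return loadKfAppCfgFile(path)
}

// createKfAppCfgFile and loadKfAppCfgFile are placeholders for the now-private
// CreateKfAppCfgFile / LoadKfAppCfgFile logic.
func createKfAppCfgFile(uri string) (string, error) { return uri, nil }
func loadKfAppCfgFile(path string) (KfApp, error)   { return nil, nil }
```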

/cc @richardsliu @yanniszark @swiftdiaries @gabrielwen @lluunn @kkasravi @kunmingg

kfctl fails because directory not empty if run on kfctl apply

There is a bug in kfctl and currently if you do the following

mkdir ${KFAPP}
cd ${KFAPP} 
curl -L -o kfctl_gcp_iap.yaml https://raw.githubusercontent.com/kubeflow/manifests/master/kfdef/kfctl_gcp_iap.yaml
yq w -i kfctl_gcp_iap.yaml spec.plugins[0].spec.project ${PROJECT}
yq w -i kfctl_gcp_iap.yaml spec.plugins[0].spec.zone ${ZONE}
yq w -i kfctl_gcp_iap.yaml metadata.name ${NAME}
kfctl apply all -V -f kfctl_gcp_iap.yaml

It will fail because the directory ${KFAPP} is non-empty. Similarly, if you try to rerun kfctl apply from ${KFAPP} it will fail because the directory isn't empty.

The problem is in the implementation of how kfctl gets the appDir. The desired semantics of

kfctl apply -V -f ${KFDEF}

are

  • If the value of ${KFDEF} is a local file path; then we should use the directory of that file as the KFAPP directory regardless of whether it is empty

    • If it's not empty, we assume it's the output of a previous run of kfctl apply
  • If the value of ${KFDEF} is a remote URI then we use the current working directory and check that it is empty.

This should be fixed by kubeflow/kubeflow#4115
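A sketch of the desired appDir selection described above, assuming a simple URL-scheme check to distinguish remote URIs from local paths (not the actual fix in kubeflow/kubeflow#4115):

```go
package main

import (
	"fmt"
	"net/url"
	"os"
	"path/filepath"
)

// appDirFor returns the KFAPP directory for a -f argument: the config file's
// directory for a local path (empty or not), or the current working directory
// (which must be empty) for a remote URI.
func appDirFor(kfdef string) (string, error) {
	if u, err := url.Parse(kfdef); err == nil && (u.Scheme == "http" || u.Scheme == "https") {
		cwd, err := os.Getwd()
		if err != nil {
			return "", err
		}
		entries, err := os.ReadDir(cwd)
		if err != nil {
			return "", err
		}
		if len(entries) > 0 {
			return "", fmt.Errorf("current directory %s is not empty", cwd)
		}
		return cwd, nil
	}
	abs, err := filepath.Abs(kfdef)
	if err != nil {
		return "", err
	}
	// Local file: use its directory regardless of whether it is empty, since a
	// non-empty directory is assumed to be the output of a previous run.
	return filepath.Dir(abs), nil
}

func main() {
	fmt.Println(appDirFor("./kfctl_gcp_iap.yaml"))
}
```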

[GCP] Workload identity not working; gke metadata server getting killed for failing liveness probe

I deployed 0.7.0 which uses GCP workload identity.

Per the instructions I tried to verify workload identity was working by starting a container

kubectl run -it   --generator=run-pod/v1   --image google/cloud-sdk   --serviceaccount kf-admin   --namespace kubeflow   workload-identity-test-admin2

I then run

gcloud auth list

Which should print out my service account but doesn't most of the time. It succeeds intermittently; otherwise it prints out

gcloud auth list

No credentialed accounts.

To login, run:
  $ gcloud auth login `ACCOUNT`

The GKE metadata server is crash looping

kubectl -n kube-system get pods
NAME                                                             READY   STATUS             RESTARTS   AGE
gke-metadata-server-4pq8w                                        0/1     CrashLoopBackOff   516        26h
gke-metadata-server-4vmjh                                        0/1     CrashLoopBackOff   519        26h

Because they are failing their liveness probe

Events:
  Type     Reason     Age                     From                                                          Message
  ----     ------     ----                    ----                                                          -------
  Warning  Unhealthy  57m (x1500 over 26h)    kubelet, gke-jlewi-v07-001-jlewi-v07-001-cpu-p-dedaa1a1-480h  Liveness probe failed: Get http://10.142.15.216:54898/healthz: dial tcp 10.142.15.216:54898: connect: connection refused
  Normal   Started    12m (x515 over 26h)     kubelet, gke-jlewi-v07-001-jlewi-v07-001-cpu-p-dedaa1a1-480h  Started container gke-metadata-server
  Warning  BackOff    2m33s (x6096 over 26h)  kubelet, gke-jlewi-v07-001-jlewi-v07-001-cpu-p-dedaa1a1-480h  Back-off restarting failed container

Attaching more complete logs from stackdriver

k8s_gke_logs.txt

kfctl should set name if not set based on the directory

If metadata.Name isn't set in the KFDef spec, then kfctl should try to infer it from the directory of the config file.

This matches the semantics of 0.6 which used the directory name to infer name.

Furthermore this allows us to do something like this

mkdir ${KFAPP}
cd ${KFAPP}
kfctl apply -f https://.....

And get a deployment named ${KFAPP}

After we update kfctl we will need to update the KFDef files to remove Name so that we trigger this behavior.
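A sketch of the inference; the helper is hypothetical, but filepath.Base on the app directory is all that should be needed:

```go
package main

import (
	"fmt"
	"path/filepath"
)

// inferName falls back to the app directory's base name when metadata.name is
// unset, matching the 0.6 behaviour described above.
func inferName(currentName, appDir string) string {
	if currentName != "" {
		return currentName
	}
	return filepath.Base(appDir)
}

func main() {
	// With KFAPP=/home/user/kf-test and no name in the KFDef,
	// the deployment would be named "kf-test".
	fmt.Println(inferName("", "/home/user/kf-test"))
}
```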

Cannot build from master with Dockerfile

Steps:
clone kfctl
cd kfctl
docker build .

I get this error:

build github.com/kubeflow/kfctl/v3/cmd/kfctl: cannot load github.com/kubeflow/kubeflow/components/profile-controller/v2/pkg/apis/kubeflow/v1alpha1: cannot find module providing package github.com/kubeflow/kubeflow/components/profile-controller/v2/pkg/apis/kubeflow/v1alpha1
make: *** [Makefile:62: fmt] Error 1

I'm on commit 51703d5a77b371e4bc9f262f86caf549ae32a03b

I suspect it might be because it's downloading the v0.6.2 release of kubeflow/kubeflow?

go: finding github.com/kubeflow/kubeflow v0.6.2
go: downloading github.com/kubeflow/kubeflow v0.6.2
go: extracting github.com/kubeflow/kubeflow v0.6.2

KFCTL 1.0 tracking

Requirements

| Description | Category | Status | Issue |
| --- | --- | --- | --- |
| Upgradability | Required | WIP | kubeflow/kubeflow#3727 |
| New semantics for KFCTL (build/apply) | Required | DONE | #19 |
| Backward compatibility for KfDef | Required | DONE | kubeflow/kubeflow#4002 |
| Reconcile for KFCTL | Recommended | NOT_STARTED | kubeflow/kubeflow#4056 |
| Plugin injections | Recommended | NOT_STARTED | kubeflow/kubeflow#3708 |
| CI/CD | Required | WIP | #24 |
| Logging | Recommended | NOT_STARTED | kubeflow/kubeflow#3671 |

New Semantics

We'd like to introduce new semantics for kfctl build/apply: link

| Task | Category | Status | Issue |
| --- | --- | --- | --- |
| Add new flag -f | Required | DONE | #20 |
| -f should take both local and remote files | Required | DONE | #21 |
| Implement build | Required | DONE | #22 |
| Implement apply | Required | DONE | #23 |

Backward Compatibility for KfDef

| Task | Category | Status | Issue |
| --- | --- | --- | --- |
| Backward compatibility | Required | DONE | kubeflow/kubeflow#4002 |

Reconcile for KFCTL

| Task | Category | Status | Issue |
| --- | --- | --- | --- |
| Implement Reconcile semantics | Recommended | NOT_STARTED | kubeflow/kubeflow#4056 |

Plugin injections

kubeflow/kubeflow#3708

CI/CD

| Task | Category | Status | Issue |
| --- | --- | --- | --- |
| E2E test | Required | DONE | #25 |
| Unit test coverage | Required | DONE | #26 |
| Nightly builds | Required | NOT_STARTED | #105 |
| Testing on config files | Recommended | NOT_STARTED | kubeflow/kubeflow#3688 |
| Release process | Required | NOT_STARTED | #27 |

Logging

kubeflow/kubeflow#3671

Docs

kfctl Design Doc

Presubmits are failing: could not update deployment manager entries

Sample failure
https://prow.k8s.io/view/gcs/kubernetes-jenkins/pr-logs/pull/kubeflow_manifests/597/kubeflow-manifests-presubmit/1190107292830273536/

build_deploy fails with

util.py                     71 INFO     time="2019-11-01T03:52:34Z" level=info msg="Creating kfctl-52ee status: RUNNING (op = operation-1572579435113-59640ae9c6b93-a575e434-b995b654)" filename="gcp/gcp.go:388"
util.py                     71 INFO     Error: failed to apply:  (kubeflow.error): Code 500 with message: coordinator Apply failed for gcp:  (kubeflow.error): Code 400 with message: gcp apply could not update deployment manager Error could not update deployment manager entries; Creating kfctl-52ee did not succeed; status: RUNNING (op = operation-1572579435113-59640ae9c6b93-a575e434-b995b654)

Stop creating kf-user GCP secret in 0.8.0 and 1.0

Now that workload identity is enabled by default we should no longer have to install GCP secrets as K8s secrets in the cluster.

For 0.7 we continued to create the kf-user secret (see #46) for backwards compatibility.

We want to remove that in the next release since downloading secrets is unsafe.

We need to remove the secret creation in kfctl and profile controller.

checkout failing in postsubmits

Here's a run
https://prow.k8s.io/view/gcs/kubernetes-jenkins/logs/kubeflow_kubeflow/kubeflow-postsubmit/1190328119332966400

+ git clone --depth=2 https://github.com/kubeflow/kubeflow.git /mnt/test-data-volume/kubeflow-postsubmit-kfctl-go-iap-8dbde9d-6400-ab1a/src/kubeflow/kubeflow
Cloning into '/mnt/test-data-volume/kubeflow-postsubmit-kfctl-go-iap-8dbde9d-6400-ab1a/src/kubeflow/kubeflow'...
Checking out files:   7% (106/1359)
Checking out files:  99% (1346/1359)
+ cd /mnt/test-data-volume/kubeflow-postsubmit-kfctl-go-iap-8dbde9d-6400-ab1a/src/kubeflow/kubeflow
+ '[' '!' -z ']'
+ git checkout 8dbde9d837ab4d4c9e0d3442adbee6d3a4055276
fatal: reference is not a tree: 8dbde9d837ab4d4c9e0d3442adbee6d3a4055276

It looks like this is not a valid commit.
It's using
https://github.com/kubeflow/kubeflow/blob/cfc6d8b1fe5901e699fbfd81d19d1967f71c3713/py/kubeflow/kubeflow/ci/kfctl_e2e_workflow.py#L458

The prow environment variables are

      - name: PULL_REFS
        value: v0.7-branch:8dbde9d837ab4d4c9e0d3442adbee6d3a4055276

So the issue is that the commit is on a branch. Since we are only cloning with depth=2, I'm guessing we are missing that branch.

kfctl implements reconcile semantics

/kind feature

Why you need this feature:
See design doc for kfctl v1: kubeflow/kubeflow#3709

One of the proposal is for kfctl to have reconcile like implementation to deal with complex ordering of operations.

Filing this issue to track implementation.
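Very roughly, reconcile-style semantics mean re-applying until the apply converges instead of requiring one correctly ordered pass; a toy sketch only, not the design in kubeflow/kubeflow#3709:

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// applyOnce stands in for one pass over the resources; it may fail while
// dependencies (CRDs, webhooks, platform pieces) are still coming up.
func applyOnce(pass int) error {
	if pass < 2 {
		return errors.New("dependency not ready yet")
	}
	return nil
}

// reconcile keeps re-applying until a pass succeeds or the budget runs out,
// rather than requiring a single, perfectly ordered pass.
func reconcile(maxAttempts int, backoff time.Duration) error {
	var lastErr error
	for i := 0; i < maxAttempts; i++ {
		if lastErr = applyOnce(i); lastErr == nil {
			return nil
		}
		time.Sleep(backoff)
	}
	return fmt.Errorf("did not converge after %d attempts: %w", maxAttempts, lastErr)
}

func main() {
	fmt.Println(reconcile(5, 10*time.Millisecond))
}
```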

Implements flag -f

We need to add a new flag -f and possibly deprecate the flags relating to app.yaml.
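Assuming the CLI stays on cobra, the flag wiring would look roughly like the following; the command body and flag description are illustrative:

```go
package main

import (
	"fmt"
	"os"

	"github.com/spf13/cobra"
)

// configFile receives the value of -f / --file.
var configFile string

var applyCmd = &cobra.Command{
	Use:   "apply",
	Short: "Deploy Kubeflow from a KFDef config file",
	RunE: func(cmd *cobra.Command, args []string) error {
		if configFile == "" {
			return fmt.Errorf("a config file must be provided with -f")
		}
		// A real implementation would load the KFDef from configFile here;
		// the value may be a local path or a remote URL.
		fmt.Println("would apply", configFile)
		return nil
	},
}

func main() {
	applyCmd.Flags().StringVarP(&configFile, "file", "f", "", "Path or URL of a KFDef config file")
	if err := applyCmd.Execute(); err != nil {
		os.Exit(1)
	}
}
```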

Default profile with underscore in username is not DNS compliant

If a user's email address contains an underscore, it is not escaped, so the default profile name created during deployment is not DNS compliant.

INFO[0334] creating Profile/kubeflow-john_smith         filename="kustomize/kustomize.go:447"
Error: couldn't apply KfApp:  (kubeflow.error): Code 500 with message: kfApp Apply failed for kustomize:  (kubeflow.error): Code 500 with message: couldn't create default profile from &{{Profile kubeflow.org/v1alpha1} {kubeflow-john_smith      0 0001-01-01 00:00:00 +0000 UTC <nil> <nil> map[] map[] [] nil [] } {{User  [email protected] }} { }} Error: Profile.kubeflow.org "kubeflow-john_smith" is invalid: metadata.name: Invalid value: "kubeflow-john_smith": a DNS-1123 subdomain must consist of lower case alphanumeric characters, '-' or '.', and must start and end with an alphanumeric character (e.g. 'example.com', regex used for validation is '[a-z0-9]([-a-z0-9]*[a-z0-9])?(\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*')
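One possible sanitization, shown only for illustration (not necessarily how kfctl should resolve this): lower-case the local part of the email and replace characters that aren't valid in a DNS-1123 name. The email address is a made-up example.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

var invalidDNSChars = regexp.MustCompile(`[^a-z0-9-]`)

// profileName derives a DNS-1123-safe profile name from an email address,
// e.g. "John_Smith@example.com" -> "kubeflow-john-smith".
func profileName(email string) string {
	local := strings.ToLower(strings.SplitN(email, "@", 2)[0])
	safe := invalidDNSChars.ReplaceAllString(local, "-")
	return "kubeflow-" + strings.Trim(safe, "-")
}

func main() {
	fmt.Println(profileName("John_Smith@example.com"))
}
```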

Create new repo for kfctl

see references

Description

  • Move kfctl from under kubeflow/kubeflow/bootstrap/kfctl to a new repo.
  • Update references in the source code and go.mod

Related issues should be moved to the repo

Create kf-user secret for backwards compatibility even if workload identity is enabled

In 0.7.0 we are enabling workload identity by default. Nonetheless, I think we should continue to create the kf-user secret in the Kubeflow namespace and the profile namespaces.

e.g. in kfctl
https://github.com/kubeflow/kubeflow/blob/6739cd076d8dd9911ce40b44137a98b2a68efcca/bootstrap/pkg/kfapp/gcp/gcp.go#L1591

and in the profile controller

This is mainly to provide backwards compatibility with code that hasn't been updated yet to use workload identity.

I think we only need to create the kf-user secret and not the kf-admin secrets.

User code should only be using kf-user not kf-admin.

[Cleanup] Can KfDef converters just rely on YAML serialization?

The code for converting KFConfig to a KFDef relies on custom conversion code
https://github.com/kubeflow/kubeflow/blob/27da9c8b1d7c44f7bd47ef7d516214909db36e3f/bootstrap/pkg/apis/apps/configconverters/v1beta1.go#L166

Could we instead just marshal one struct to YAML and unmarshal it into the other?

I think Unmarshal will keep going but return a type error indicating any fields in the byte stream that aren't in the go struct. We should simply be able to ignore those.

This should simplify the conversion process and ensure the process is based on mapping fields of the same name and type.
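A sketch of the YAML round trip with sigs.k8s.io/yaml and stand-in structs; note that plain Unmarshal in this library silently drops fields missing from the target struct rather than reporting them, so the exact error behaviour described above would need checking:

```go
package main

import (
	"fmt"

	"sigs.k8s.io/yaml"
)

// Stand-in structs: the real KFConfig and KfDef types have many more fields.
type kfConfig struct {
	Name         string   `json:"name,omitempty"`
	Applications []string `json:"applications,omitempty"`
	InternalOnly string   `json:"internalOnly,omitempty"`
}

type kfDefV1Beta1 struct {
	Name         string   `json:"name,omitempty"`
	Applications []string `json:"applications,omitempty"`
}

// convert maps fields by name and type via a YAML round trip instead of
// hand-written per-field conversion code. Fields absent from the target struct
// (internalOnly here) are simply dropped by Unmarshal.
func convert(in kfConfig) (kfDefV1Beta1, error) {
	var out kfDefV1Beta1
	data, err := yaml.Marshal(in)
	if err != nil {
		return out, err
	}
	err = yaml.Unmarshal(data, &out)
	return out, err
}

func main() {
	def, err := convert(kfConfig{Name: "kf-test", Applications: []string{"istio"}, InternalOnly: "x"})
	fmt.Println(def, err)
}
```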

/cc @gabrielwen @lluunn

basic auth test failing

The basic auth tests are failing.

22-5326/kfctl-2cc7/.cache/manifests exists; not resyncing " filename="kfconfig/types.go:460"
util.py                     71 INFO     time="2019-10-29T15:50:17Z" level=warning msg="Backfilling auth; this is deprecated; Auth should be explicitly set in Gcp plugin" filename="gcp/gcp.go:1931"
util.py                     71 INFO     time="2019-10-29T15:50:17Z" level=error msg="Could not configure basic auth; environment variable KUBEFLOW_USERNAME not set" filename="gcp/gcp.go:1938"

So it looks like an issue with the test.

Kustomize Ordering: which ordering of resources should kfctl follow for kustomize?

Kustomize, by default, uses legacy ordering to decide the order in which the resources in a manifest are applied.

https://github.com/kubernetes-sigs/kustomize/blob/a84f8d65db7be1209acef3433f9c6ab63044d8d3/api/resid/gvk.go#L82-L106

Likewise, kfctl adopted the same legacy ordering for resources:

https://github.com/kubeflow/kubeflow/blob/8a2c452d8576449e8459648ca24c4e2780d30f52/bootstrap/pkg/kfapp/kustomize/kustomize.go#L944-L947
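For reference, the legacy behaviour amounts to sorting resources by a fixed kind-priority list before applying them; a simplified sketch of that (and of what a KindOrderTransform-style option would let manifests parameterize):

```go
package main

import (
	"fmt"
	"sort"
)

// legacyOrder is a trimmed-down version of the fixed kind ordering; the real
// list in kustomize covers many more kinds.
var legacyOrder = []string{
	"Namespace",
	"CustomResourceDefinition",
	"ServiceAccount",
	"ConfigMap",
	"Service",
	"Deployment",
	"ValidatingWebhookConfiguration",
}

// kindPriority returns a kind's position in the fixed order; unknown kinds
// sort after the known ones, keeping their relative order.
func kindPriority(kind string) int {
	for i, k := range legacyOrder {
		if k == kind {
			return i
		}
	}
	return len(legacyOrder)
}

func main() {
	kinds := []string{"Deployment", "MyCustomResource", "Namespace", "CustomResourceDefinition"}
	sort.SliceStable(kinds, func(i, j int) bool {
		return kindPriority(kinds[i]) < kindPriority(kinds[j])
	})
	// A KindOrderTransform-style option would let manifest authors supply this
	// priority list (or an explicit order) instead of hardcoding it in kfctl.
	fmt.Println(kinds)
}
```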

This legacy ordering introduces a few issues:

  1. When the dependency in order is inverted, i.e. when a resource depends on another resource to be already present to succeed in the application to Kubernetes, the resource application fails.
  2. When components like CRDs, ValidatingWebhookConfigurations and MutatingWebhookConfigurations validate, accept or modify resources, they need to be ready to process the incoming resource. Currently, we have fixed this behaviour through retries in the kfctl apply function of manifests.

Suggested alternatives are:

  1. Order the resources 'as-is' as mentioned in the kustomization files. This allows for the manifest definitions to be explicit about the ordering.
  2. Define and utilize the KindOrderTransform or KindFilterTransform to specify the order in which kfctl should apply resources. Refer to PR kubeflow/kubeflow#4347.

This is an uber issue to track the ordering in kustomization manifest application for kfctl.

/cc @jbrette @kkasravi @yanniszark @jlewi
