kubeflow / manifests

A repository for Kustomize manifests

License: Apache License 2.0

Shell 0.89% Makefile 0.08% Python 0.74% Smarty 0.01% YAML 98.28% JSON 0.01%

manifests's Introduction


Kubeflow, the cloud-native platform for machine learning operations: pipelines, training, and deployment.


Documentation

Please refer to the official docs at kubeflow.org.

Working Groups

The Kubeflow community is organized into working groups (WGs), each with associated repositories, that focus on specific pieces of the ML platform.


Get Involved

Please refer to the Community page.

manifests's People

Contributors

adrian555, andreyvelich, apo-ger, ashahba, bobgy, davidspek, dnplas, gabrielwen, hougangliu, jeffwan, jlewi, johnugeorge, juliusvonkohout, kimwnasptd, krishnadurai, kubeflow-bot, kunmingg, lampajr, lluunn, nickloukas, pvaneck, richardsliu, swiftdiaries, terrytangyuan, thesuperzapper, tomcli, ukclivecox, yanniszark, yuzisun, zhenghuiwang


manifests's Issues

Port spark operator manifest to kustomize

We have a ksonnet manifest for spark here:
https://github.com/kubeflow/kubeflow/tree/master/kubeflow/spark

@holdenk @rawkintrevo what are your thoughts on whether we should port this to kustomize or not?

If a user wants to install and use the spark-operator with Kubeflow, it seems perfectly reasonable to point them at the instructions (https://github.com/GoogleCloudPlatform/spark-on-k8s-operator) for how to install it on a K8s cluster.

I think we only need a kustomize manifest if we want to be able to install the spark operator by default as part of one or more opinionated deployments of Kubeflow.

For example, if/when TFX works on Spark, we might have a whole bunch of applications that we want to install in order to create a Kubeflow+Spark+TFX deployment. Then we would need to figure out a better story for how to compose applications. But right now I think we are still using Spark as an optional add-on.

Thoughts?

Thinking around kustomize layout

This is to explain the thought process around directory organization in the repo.

The kustomization.yaml at the root of the project contains the list of components with the variants that are to be deployed in the cluster.

A new user workflow would look like this:

git clone https://github.com/kubeflow/manifests
kustomize build | kubectl apply -f -

For an advanced user, the goal is to allow for maximum flexibility.

The overlays inside each component allow an advanced user to define variants based on different environments (gke, onprem, docker-for-desktop, ...) and different jobs (pytorch-job, tf-job).
Although the initial commits only provide environment variants on top of the bases, the same can be done for different use cases as well.
Each customization should then be listed in the kustomization.yaml at the root of the project. The user can then change the bases to point at their overlays and build the customized manifests (a minimal overlay sketch follows the tree below).

Tree for reference:

.
├── LICENSE
├── OWNERS
├── README.md
├── ambassador
│   ├── base
│   │   ├── ambassador-admin-service.yaml
│   │   ├── ambassador-clusterrole.yaml
│   │   ├── ambassador-clusterrolebinding.yaml
│   │   ├── ambassador-deployment.yaml
│   │   ├── ambassador-service.yaml
│   │   ├── ambassador-serviceaccount.yaml
│   │   └── kustomization.yaml
│   └── overlays
│       ├── docker-for-desktop
│       │   └── kustomization.yaml
│       ├── gke
│       │   └── kustomization.yaml
│       ├── minikube
│       │   └── kustomization.yaml
│       └── onprem
│           └── kustomization.yaml
├── argo
│   ├── base
│   │   ├── argo-clusterrole-binding.yaml
│   │   ├── argo-clusterrole.yaml
│   │   ├── argo-crd.yaml
│   │   ├── argo-sa.yaml
│   │   ├── argo-ui-clusterrole-binding.yaml
│   │   ├── argo-ui-clusterrole.yaml
│   │   ├── argo-ui-deployment.yaml
│   │   ├── argo-ui-sa.yaml
│   │   ├── argo-ui-service.yaml
│   │   ├── argo-wfc-configmap.yaml
│   │   ├── argo-wfc-deployment.yaml
│   │   └── kustomization.yaml
│   └── overlays
│       ├── docker-for-desktop
│       │   └── kustomization.yaml
│       ├── gke
│       │   └── kustomization.yaml
│       ├── minikube
│       │   └── kustomization.yaml
│       └── onprem
│           └── kustomization.yaml
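For illustration, a minimal overlay under this layout might look like the following (a sketch; the patch file name is hypothetical):

# ambassador/overlays/minikube/kustomization.yaml
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
bases:
- ../../base
patchesStrategicMerge:
- ambassador-deployment-patch.yaml   # hypothetical env-specific patch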

/cc @kkasravi @jlewi

define naming and best practices for kustomize manifests

We should add a document, linked from README.md, that defines (a minimal example follows the list):

  • what the kustomization file should look like
  • what each section does including
    • commonLabels
    • resources
  • when overlays are used
  • how multiple overlays are used and their constraints
  • naming of resources, groupings of resources if preferred
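As a strawman for that document, a minimal kustomization.yaml exercising the commonLabels and resources sections above might look like this (the label value and file names are illustrative):

apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
commonLabels:
  app.kubernetes.io/name: mycomponent   # hypothetical
resources:
- deployment.yaml
- service.yaml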

Istio-related deployments are deployed to the wrong namespace

Should be in istio-system, but it's deployed to kubeflow instead.

example:

Name:             iap-ingress-envoy-ingress
Namespace:        kubeflow
Address:
Default backend:  default-http-backend:80 (10.16.0.10:8080)
Rules:
  Host                                                        Path  Backends
  ----                                                        ----  --------
  kustomize-testing.endpoints.gabrielwen-learning.cloud.goog
                                                              /*   istio-ingressgateway:80 (<none>)
Annotations:
  certmanager.k8s.io/issuer:                    letsencrypt-prod
  ingress.kubernetes.io/ssl-redirect:           true
  kubernetes.io/ingress.global-static-ip-name:  kustomize-testing-ip
  kubernetes.io/tls-acme:                       true
Events:
  Type     Reason     Age                 From                     Message
  ----     ------     ----                ----                     -------
  Normal   ADD        16m                 loadbalancer-controller  kubeflow/iap-ingress-envoy-ingress
  Warning  Translate  59s (x20 over 16m)  loadbalancer-controller  error while evaluating the ingress spec: could not find service "kubeflow/istio-ingressgateway"
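One possible fix, sketched here under the assumption that the istio-related resources can simply be pinned, is to set the namespace in the iap-ingress kustomization instead of inheriting kubeflow from the root:

apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
namespace: istio-system   # instead of the inherited kubeflow namespace
resources:
- envoy-ingress.yaml   # hypothetical file holding the Ingress above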

document missing kustomize targets that need to be ported from ksonnet

the list of ksonnet prototypes is

./application/prototypes/application.jsonnet
./argo/prototypes/argo.jsonnet
./automation/prototypes/release.jsonnet
./aws/prototypes/aws-alb-ingress-controller.jsonnet
./aws/prototypes/aws-efs-csi-driver.jsonnet
./aws/prototypes/aws-efs-pv.jsonnet
./aws/prototypes/aws-fsx-csi-driver.jsonnet
./aws/prototypes/aws-fsx-pv-dynamic.jsonnet
./aws/prototypes/aws-fsx-pv-static.jsonnet
./aws/prototypes/istio-ingress.jsonnet
./chainer-job/prototypes/chainer-job-simple.jsonnet
./chainer-job/prototypes/chainer-job.jsonnet
./chainer-job/prototypes/chainer-operator.jsonnet
./common/prototypes/ambassador.jsonnet
./common/prototypes/basic-auth.jsonnet
./common/prototypes/centraldashboard.jsonnet
./common/prototypes/echo-server.jsonnet
./common/prototypes/spartakus.jsonnet
./credentials-pod-preset/prototypes/gcp-credentials-pod-preset.jsonnet
./examples/prototypes/katib-studyjob-test-v1alpha1.jsonnet
./examples/prototypes/tensorboard.jsonnet
./examples/prototypes/tf-job-simple-v1beta1.jsonnet
./examples/prototypes/tf-job-simple-v1beta2.jsonnet
./examples/prototypes/tf-job-simple.jsonnet
./examples/prototypes/tf-serving-simple.jsonnet
./examples/prototypes/tf-serving-with-istio.jsonnet
./gcp/prototypes/basic-auth-ingress.jsonnet
./gcp/prototypes/cert-manager.jsonnet
./gcp/prototypes/cloud-endpoints.jsonnet
./gcp/prototypes/google-cloud-filestore-pv.jsonnet
./gcp/prototypes/gpu-driver.jsonnet
./gcp/prototypes/iap-ingress.jsonnet
./gcp/prototypes/metric-collector.jsonnet
./gcp/prototypes/prometheus.jsonnet
./gcp/prototypes/webhook.jsonnet
./jupyter/prototypes/jupyter-web-app.jsonnet
./jupyter/prototypes/jupyter.jsonnet
./jupyter/prototypes/notebook_controller.jsonnet
./jupyter/prototypes/notebooks.jsonnet
./jupyter/sync-notebook.jsonnet
./katib/prototypes/all.jsonnet
./knative-build/prototypes/knative-build.jsonnet
./kubebench/prototypes/kubebench-dashboard.jsonnet
./kubebench/prototypes/kubebench-job.jsonnet
./kubebench/prototypes/kubebench-operator.jsonnet
./metacontroller/prototypes/metacontroller.jsonnet
./modeldb/prototypes/modeldb.jsonnet
./mpi-job/prototypes/mpi-job-custom.jsonnet
./mpi-job/prototypes/mpi-job-simple.jsonnet
./mpi-job/prototypes/mpi-operator.jsonnet
./mxnet-job/prototypes/mxnet-job.jsonnet
./mxnet-job/prototypes/mxnet-operator.jsonnet
./new-package-stub/prototypes/newpackage.jsonnet
./nvidia-inference-server/prototypes/inference-server-all-features.jsonnet
./openvino/prototypes/openvino.jsonnet
./pachyderm/prototypes/pachyderm.jsonnet
./paddle-job/prototypes/paddle-job.jsonnet
./paddle-job/prototypes/paddle-operator.jsonnet
./pipeline/prototypes/pipeline.jsonnet
./profiles/prototypes/profiles.jsonnet
./profiles/sync-permission.jsonnet
./profiles/sync-profile.jsonnet
./pytorch-job/prototypes/pytorch-job.jsonnet
./pytorch-job/prototypes/pytorch-operator.jsonnet
./seldon/prototypes/abtest-v1alpha1.jsonnet
./seldon/prototypes/abtest-v1alpha2.jsonnet
./seldon/prototypes/core.jsonnet
./seldon/prototypes/mab-v1alpha1.jsonnet
./seldon/prototypes/mab-v1alpha2.jsonnet
./seldon/prototypes/outlier-detector-v1alpha1.jsonnet
./seldon/prototypes/outlier-detector-v1alpha2.jsonnet
./seldon/prototypes/serve-simple-v1alpha1.jsonnet
./seldon/prototypes/serve-simple-v1alpha2.jsonnet
./spark/prototypes/spark-job.jsonnet
./spark/prototypes/spark-operator.jsonnet
./tensorboard/prototypes/tensorboard-aws.jsonnet
./tensorboard/prototypes/tensorboard-gcp.jsonnet
./tf-batch-predict/prototypes/tf-batch-predict.jsonnet
./tf-serving/prototypes/tf-serving-all-features.jsonnet
./tf-serving/prototypes/tf-serving-aws.jsonnet
./tf-serving/prototypes/tf-serving-gcp.jsonnet
./tf-serving/prototypes/tf-serving-service.jsonnet
./tf-serving/prototypes/tf-serving-with-request-log.jsonnet
./tf-training/prototypes/tf-job-operator.jsonnet
./weaveflux/prototypes/weaveflux.jsonnet

the list of manifest targets (base|overlay) is

application/base
argo/base
common/basic-auth/base
common/ambassador/base
common/spartakus/base
common/centraldashboard/base
gcp/iap-ingress/overlays/gcp
gcp/gpu-driver/overlays/gcp
gcp/cert-manager/overlays/gcp
gcp/gcp-credentials-admission-webhook/overlays/gcp
gcp/cloud-endpoints/overlays/gcp
gcp/basic-auth-ingress/overlays/gcp
jupyter/jupyter-web-app/base
jupyter/notebook-controller/base
jupyter/jupyter/overlays/minikube
jupyter/jupyter/base
katib/base
kubebench/base
metacontroller/base
metadata/base
modeldb/base
mutating-webhook/overlays/add-label
mutating-webhook/base
pipeline/pipelines-runner/base
pipeline/api-service/base
pipeline/scheduledworkflow/base
pipeline/minio/base
pipeline/mysql/base
pipeline/pipelines-ui/base
pipeline/pipelines-viewer/base
pipeline/persistent-agent/base
profiles/overlays/devices
profiles/overlays/debug
profiles/base
pytorch-job/pytorch-operator/base
tensorboard/base
tf-training/tf-job-operator/base

From these two lists it should be possible to determine what's missing; a sketch of one way to do that follows.
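For example, assuming the two lists above are saved to prototypes.txt and targets.txt, a rough comparison by top-level component name could look like this (a sketch, not a checked-in script; component names don't always align one-to-one):

sed 's|^\./\([^/]*\)/.*|\1|' prototypes.txt | sort -u > proto-components.txt
sed 's|^\([^/]*\)/.*|\1|' targets.txt | sort -u > target-components.txt
comm -23 proto-components.txt target-components.txt   # components with prototypes but no kustomize target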

iap-ingress is missing parameters used in ksonnet

/manifests/gcp/iap-ingress/overlays/gcp is missing the parameters from ksonnet

ipName=
secretName=envoy-ingress-tls
hostname=
issuer=letsencrypt-prod
oauthSecretName=kubeflow-oauth
istioNamespace=istio-system
ingressName=envoy-ingress
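One hedged way to carry these over, if we keep them as key=value pairs, is a configMapGenerator in the gcp overlay (the generator name is illustrative):

apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
configMapGenerator:
- name: iap-ingress-parameters   # hypothetical name
  literals:
  - secretName=envoy-ingress-tls
  - issuer=letsencrypt-prod
  - oauthSecretName=kubeflow-oauth
  - istioNamespace=istio-system
  - ingressName=envoy-ingress
  # ipName and hostname are deployment-specific and left for the user to fill in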

test_harness: checkin the generated golang tests

The test_harness will create golang tests for all manifest targets.
Resources are embedded within the golang code as part of the generation.
This needs to be versioned so that test/scripts/run-tests.sh can just run
make test, comparing changes in resources (actual) with expected.

ml-pipeline api server crashes - missing flag MYSQL_SERVICE_HOST

I0521 20:31:07.780074       8 client_manager.go:119] Initializing client manager
F0521 20:31:07.780842       8 config.go:47] Please specify flag MYSQL_SERVICE_HOST
goroutine 1 [running]:
github.com/golang/glog.stacks(0xc0002b6a00, 0xc000588c80, 0x53, 0x9b)
        external/com_github_golang_glog/glog.go:769 +0xd4
github.com/golang/glog.(*loggingT).output(0x2957260, 0xc000000003, 0xc00001cfd0, 0x2837d77, 0x9, 0x2f, 0x0)
        external/com_github_golang_glog/glog.go:720 +0x329
github.com/golang/glog.(*loggingT).printf(0x2957260, 0xc000000003, 0x1a91814, 0x16, 0xc0005419e0, 0x1, 0x1)
        external/com_github_golang_glog/glog.go:655 +0x14b
github.com/golang/glog.Fatalf(0x1a91814, 0x16, 0xc0005419e0, 0x1, 0x1)
        external/com_github_golang_glog/glog.go:1148 +0x67
main.getStringConfig(0x1a8c161, 0x12, 0x1a8d4d9, 0x13)
        backend/src/apiserver/config.go:47 +0xf8
main.initMysql(0xc00032943a, 0x5, 0x53d1ac1000, 0x0, 0x0)
        backend/src/apiserver/client_manager.go:222 +0x53
main.initDBClient(0x53d1ac1000, 0x15)
        backend/src/apiserver/client_manager.go:185 +0x5c0
main.(*ClientManager).init(0xc000541cd8)
        backend/src/apiserver/client_manager.go:121 +0x80
main.newClientManager(0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, ...)
        backend/src/apiserver/client_manager.go:292 +0x7b
main.main()
        backend/src/apiserver/main.go:56 +0x5e
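The crash suggests the api-server Deployment is missing its MySQL connection environment variables. A sketch of a strategic-merge patch that would supply them (the container name and mysql Service name are assumptions based on the pipeline/mysql target):

apiVersion: apps/v1
kind: Deployment
metadata:
  name: ml-pipeline
spec:
  template:
    spec:
      containers:
      - name: ml-pipeline-api-server   # assumed container name
        env:
        - name: MYSQL_SERVICE_HOST
          value: mysql   # assumed Service from pipeline/mysql/base
        - name: MYSQL_SERVICE_PORT
          value: "3306"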

run unit tests as a test overlay

Kustomize plugins are explained in several places (plugins.md, doc.go).

Kustomize plugins will be used to generate unit tests for the targets under manifests.

Unit tests should emulate kfctl, which generates a kustomization.yaml at the root of the kustomize target and overrides the namespace and application label.

Each manifest target should have an overlays/test directory which holds an app_test.yaml and a kustomization.yaml. The kustomization.yaml will look like:

apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
generators:
- app_test.yaml

This will call a kustomize plugin, hack/kustomize/plugin/apps.kubeflow.org/KfDef, with <path to app_test.yaml>.
The plugin will generate a kustomization.yaml using values in the app_test.yaml, similar to what kfctl does. This kustomization is what generates the expected resources that are embedded in the unit tests.
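A hypothetical app_test.yaml might look like the following; with kustomize plugins the config's group and kind have to line up with the plugin path (apps.kubeflow.org / KfDef), but the version and fields shown here are assumptions, not a confirmed schema:

apiVersion: apps.kubeflow.org/v1alpha1   # version is an assumption
kind: KfDef
metadata:
  name: tensorboard-test
  namespace: kubeflow   # the namespace the generated kustomization would override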

Example: manifests/tensorboard has a base and overlays/istio. Kfctl will copy parts of overlays/istio/kustomization.yaml into tensorboard/kustomization.yaml to 'mix in' the overlay with the base kustomization.yaml. This kustomization.yaml under manifests/tensorboard is what is used to generate tensorboard.yaml.

We want to generate the kustomization.yaml in the test overlay using the same logic, call it to generate the output yaml, and compare it with what was there before. This means:

  1. Convert hack/gen-test-target.sh to golang so we can have the generated unit test. The change is that we don't do this for base and overlays/istio; we only generate the test for the kustomization.yaml under tensorboard.
  2. We need to call the same routines that kfctl calls from within the go plugin in order to do 1.
  3. We still need a generator and a test phase; the generator is one kustomization.yaml directly under the component.
  4. To create the generator we should turn gen-test-target.sh into golang.

Change the Role verbs of a Profile user to prevent the user from getting a Profile

The Role should be

apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  creationTimestamp: "2019-05-05T15:25:10Z"
  labels:
    app.kubernetes.io/name: experiments
    kustomize.component: profiles
  name: profiles-role
  namespace: experiments
  resourceVersion: "3645969"
  selfLink: /apis/rbac.authorization.k8s.io/v1/namespaces/experiments/roles/profiles-role
  uid: f8a2dd93-6f49-11e9-9bef-42010a8a01ef
rules:
- apiGroups:
  - kubeflow.org
  resources:
  - profiles
  verbs:
  - create
  - watch
  - list

how to customize a manifest by params

Take the example here: whether --namespace and --enable-gang-scheduling are appended to args is based on two params.

For the kustomize manifest counterpart, four overlays are needed to cover all the cases of its ksonnet version.

As we add more params, the number of kustomize overlays may increase exponentially if we try to match the ksonnet version of Kubeflow.

I think we should discuss how to handle it; a sketch follows.
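For concreteness, each boolean param ends up as its own strategic-merge patch overlay; here is a sketch of the gang-scheduling case (the Deployment and container names are assumed):

apiVersion: apps/v1
kind: Deployment
metadata:
  name: tf-job-operator   # assumed name
spec:
  template:
    spec:
      containers:
      - name: tf-job-operator
        args:
        - --enable-gang-scheduling=true

With a second boolean param this multiplies to four overlays (true/true, true/false, false/true, false/false), which is the combinatorial growth described above.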

@jlewi @kkasravi @swiftdiaries

jupyter_test.go and profiles_test.go are replacing overlays and causing errors in run-tests

Below is the output of run-tests under /manifests/tests/scripts. Note that jupyter_test.go is generated twice, once from overlays/minikube and once from base, so the second generation overwrites the first:

generating jupyter_test.go from /Users/kdkasrav/go/src/github.com/kubeflow/manifests/jupyter/jupyter/overlays/minikube
generating jupyter_test.go from /Users/kdkasrav/go/src/github.com/kubeflow/manifests/jupyter/jupyter/base
generating katib_test.go from /Users/kdkasrav/go/src/github.com/kubeflow/manifests/katib/base
generating kubebench_test.go from /Users/kdkasrav/go/src/github.com/kubeflow/manifests/kubebench/base

move gcp components from overlays/gcp to base

currently under manifests the gcp targets are organized as overlays

gcp
├── cert-manager
│   └── overlays
│       └── gcp
├── cloud-endpoints
│   └── overlays
│       └── gcp
├── gcp-credentials-admission-webhook
│   └── overlays
│       └── gcp
├── gpu-driver
│   └── overlays
│       └── gcp
└── iap-ingress
    └── overlays
        └── gcp

This is no longer necessary since we define the gcp overlay in bootstrap/config/overlays/gcp

the new directory layout should look like

.
├── cert-manager
│   ├── base
│   └── overlays
│       └── application
└── gcp
    ├── cloud-endpoints
    │   └── base
    ├── gcp-credentials-admission-webhook
    │   └── base
    ├── gpu-driver
    │   └── base
    └── iap-ingress
        └── base

extend the test harness to incorporate parameters and negative testing

Currently the test harness is a pending PR (#50) that generates golang test cases which compare the concatenation of all resources with the output of kustomize build. This has a few limitations:

  • the parameters most targets take should be passed into the target
  • the other parts of the kustomization file (vars, configMapGenerator, patchesStrategicMerge, patchesJson6902) shouldn't be ignored; they should be the basis of additional tests (see the sketch below)
  • negative testing should be done
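A target whose kustomization uses those fields, sketched below with illustrative names, is exactly the kind of target the current harness would under-test:

apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
- deployment.yaml
configMapGenerator:
- name: parameters   # hypothetical
  literals:
  - clusterDomain=cluster.local
vars:
- name: clusterDomain
  objref:
    kind: ConfigMap
    name: parameters
    apiVersion: v1
  fieldref:
    fieldpath: data.clusterDomain
patchesStrategicMerge:
- deployment-patch.yaml   # hypothetical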

Port weaveflux to kustomize?

We have a weaveflux ksonnet package here
https://github.com/kubeflow/kubeflow/tree/master/kubeflow/weaveflux

If a user wants to install and use weaveflux with Kubeflow, it seems perfectly reasonable to point them at the instructions (https://github.com/weaveworks/flux) for how to install it on a K8s cluster.

I think we only need a kustomize manifest if we want to be able to install it by default as part of one or more opinionated deployments of Kubeflow.

Thoughts?
@TimZaman

Seldon Core conversion

We'd like to convert the Seldon ksonnet components. Happy to do a PR for this. Any advice/docs on what you did for the existing components to convert them?

Move Application CR definition out of metacontroller package

Follow on to #13. Application controller was added to the metacontroller package.

Historically we were using the metacontroller as the application controller. With 0.6 I think the plan is to use the application controller provided by sig-apps.

So it probably makes sense to move the Application CR out of the metacontroller package.

basic-auth-ingress is missing from gcp

from ksonnet gcp/prototypes/basic-auth-ingress.jsonnet

parameters are

namespace=
ipName=
hostname=
secretName=envoy-ingress-tls
ingressSetupImage=gcr.io/kubeflow-images-public/ingress-setup:latest
privateGKECluster=false
ingressName=envoy-ingress
issuer=letsencrypt-prod

unit tests should test for custom resources

An example is the Notebook CRD. The example in the README.md had incorrectly specified the securityContext and ttlSecondsAfterFinished. This should be caught by the unit tests: custom resources should be included in the unit tests as examples so their schemas can be checked.
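For instance, a schema check could run over a small Notebook example like this one (the apiVersion and field placement are assumptions for illustration, not the confirmed schema):

apiVersion: kubeflow.org/v1alpha1   # assumed version
kind: Notebook
metadata:
  name: test-notebook
  namespace: kubeflow
spec:
  template:
    spec:
      securityContext:   # pod-level securityContext; one of the fields the README example got wrong
        runAsUser: 1000
      containers:
      - name: notebook
        image: tensorflow/tensorflow:1.13.1   # illustrative image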

Include unit tests in CI

Unit tests are part of PR #34 but need to be added to prow for CI and should also include

  • testing of different parameters
  • negative testing

Rework kubebench manifests for v1alpha2

The manifests need to be reworked for kubebench v1alpha2.
Things to do:

  • cleanup unneeded manifests for old kubebench-job
  • add manifests for installing kubebench operator and corresponding rbac, etc.

Demo script just install TFJob - don't use kfctl

I think one of the overarching goals in Kubeflow is to make Kubeflow less monolithic and easier for people to install the particular applications they want, e.g. TFJob.

I think a good way to evaluate our use of kustomize would be to see how easy it is to install just a particular CR.

I think a good exercise would be to create a doc with the instructions for installing just TFJob.

It might be nice to evaluate what this looks like using kfctl and without kfctl.
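Without kfctl, the doc could plausibly boil down to something like this (the target path is taken from the manifest targets list above):

git clone https://github.com/kubeflow/manifests
kustomize build manifests/tf-training/tf-job-operator/base | kubectl apply -f -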

@kkasravi @hougangliu @swiftdiaries WDYT?

Should argo overlay just let users override the complete configmap rather than exposing all the individual parameters?

Follow on to #13
See comment

Here is the ConfigMap for Argo specified in #13
https://github.com/kubeflow/manifests/blob/a66ff97af0a505af086965f498917b1f5a3e2db4/argo/base/config-map.yaml

Every field in the config map is exposed as a parameter. These parameters are then set in the kustomization.yaml.

Instead of parameterizing the entire configmap, should we just have the user directly edit the config map?
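Letting users own the ConfigMap could be as simple as an overlay that strategic-merge-patches (or wholesale replaces) it, rather than threading every field through parameters; a sketch, with a hypothetical file name for the user-supplied ConfigMap:

apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
bases:
- ../../base
patchesStrategicMerge:
- workflow-controller-configmap.yaml   # the user's complete ConfigMap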
