kf5i / k3ai-plugins

The k3ai plugins repo is where we host all the optional capabilities of k3ai. The main goal of the repo is to keep k3ai simple and lightweight while adding capabilities in the form of manifests or Helm charts.

Home Page: https://docs.k3ai.in/

License: Apache License 2.0

Shell 100.00%
machine-learning datascience plugin-architecture cloudnative kubeflow inference pipelines k3ai-plugins mantainer-k3ai helm-charts

k3ai-plugins's People

Contributors

alefesta, alfsuse, gsantomaggio, harsimranmaan, k3aibot, kwwii

k3ai-plugins's Issues

[feat:] demo issue

Is your feature request related to a problem? Please describe.
A clear and concise description of what the problem is. Ex. I'm always frustrated when [...]

Describe the solution you'd like
A clear and concise description of what you want to happen.

Additional context
Add any other context or screenshots about the feature request here.

[feat:] test actions

Is your feature request related to a problem? Please describe.
A clear and concise description of what the problem is. Ex. I'm always frustrated when [...]

Describe the solution you'd like
A clear and concise description of what you want to happen.

Additional context
Add any other context or screenshots about the feature request here.

KFP SDK support

Is your feature request related to a problem? Please describe.
Based on the feedback in kubeflow/website#2267, it would be great to have a simple approach to kfp that helps users get started.

Describe the solution you'd like
A two-part approach could be a micro-container with:

  • Jupyter (as a headless server)
  • KFP SDK
    so that users can connect to it remotely from their favorite IDE (as long as it supports remote connections to the container) and use:
  • Jupyter
  • Install a virtualenv for the user with kfp
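
As a rough illustration of the virtualenv option, the sketch below builds the two commands needed to create a virtual environment and install the `kfp` package into it. The function names and the idea of returning the commands before running them are assumptions made for this example; nothing here is part of k3ai.

```python
import subprocess
import sys
from pathlib import Path


def kfp_venv_commands(venv_dir):
    """Return the commands that create a virtualenv and install kfp in it.

    Hypothetical helper: the name and layout are illustrative only.
    """
    venv_dir = Path(venv_dir)
    # Virtualenvs keep their interpreter in "Scripts" on Windows, "bin" elsewhere.
    bin_dir = "Scripts" if sys.platform == "win32" else "bin"
    venv_python = venv_dir / bin_dir / "python"
    return [
        [sys.executable, "-m", "venv", str(venv_dir)],
        [str(venv_python), "-m", "pip", "install", "kfp"],
    ]


def create_kfp_venv(venv_dir):
    """Run the commands in order, stopping at the first failure."""
    for command in kfp_venv_commands(venv_dir):
        subprocess.run(command, check=True)
```

Calling `create_kfp_venv("kfp-env")` needs network access to download `kfp`. Keeping the command list separate makes it possible to inspect what would run first.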

Describe alternatives you've considered
Users would have to install and configure KFP themselves, and we want simplicity, not complexity.

Additional context
https://www.kubeflow.org/docs/pipelines/sdk/install-sdk/

[feat:] Have Core vs Community plugin hierarchy

Is your feature request related to a problem? Please describe.
We should aim to have two (2) separate lists of plugins:

  • Core: plugins graduated by the k3ai contribution community team that respect certain standards
  • Community: any bundle/group or single plugin added by the community but not yet graduated

Describe the solution you'd like
A core plugin is something that is written in a way that satisfies at least three basic requirements:

  1. Supports the current list of cluster options of K3ai
  2. Support a single application or a specific bundle (matched to public known use cases)
  3. Is not a simple combination of existing plugins (see point 2)

A community plugin is everything that users describe as a bundle or single application and that represents a form of a proposal for the project itself to be later maintained and supported by the K3ai project.

Additional context
A process, detailed in the contributing guidelines, should take various factors into consideration before graduating a community plugin to core, such as:

  • Is the plugin manifest using the best approach (i.e. helm vs. kustomize vs. shell)?
  • Does the plugin have wide usage or demand?
  • Does the plugin solve a specific use case? Is it widely accepted as such?

[bug:] we have a bug, this is a test issue

Describe the bug
We have a bug!

To Reproduce
Steps to reproduce the behavior:

  1. Go to '...'
  2. Click on '....'
  3. Scroll down to '....'
  4. See error

Expected behavior
A clear and concise description of what you expected to happen.

Screenshots
If applicable, add screenshots to help explain your problem.

Additional context
Add any other context about the problem here.

[feat:] test

Is your feature request related to a problem? Please describe.
A clear and concise description of what the problem is. Ex. I'm always frustrated when [...]

Describe the solution you'd like
A clear and concise description of what you want to happen.

Additional context
Add any other context or screenshots about the feature request here.

[feat:] labelling triage

Is your feature request related to a problem? Please describe.
A clear and concise description of what the problem is. Ex. I'm always frustrated when [...]

Describe the solution you'd like
A clear and concise description of what you want to happen.

Additional context
Add any other context or screenshots about the feature request here.

[feat:] final test

Is your feature request related to a problem? Please describe.
A clear and concise description of what the problem is. Ex. I'm always frustrated when [...]

Describe the solution you'd like
A clear and concise description of what you want to happen.

Additional context
Add any other context or screenshots about the feature request here.

Add support to Jupyter Notebooks

Is your feature request related to a problem? Please describe.
Original request from: https://github.com/kf5i/k3ai/issues/23 by @tinuxnet
We should offer Jupyter Notebooks and JupyterLab as part of the deployment.

Describe the solution you'd like
We have 3 main use cases:

  1. Single notebooks/lab as a headless server so people may connect to them from their own IDEs
  2. Single notebooks/lab as part of a single deployment so folks may use them directly if they don't want to use an external IDE
  3. Single notebooks/lab as part of a specific deployment to leverage some specific features (e.g. notebook + kfp sdk)

Additional context
https://jupyter-notebook.readthedocs.io/en/stable/public_server.html

[feat:] test label

Is your feature request related to a problem? Please describe.
A clear and concise description of what the problem is. Ex. I'm always frustrated when [...]

Describe the solution you'd like
A clear and concise description of what you want to happen.

Additional context
Add any other context or screenshots about the feature request here.

[feat:] demo issues on k3ai

Is your feature request related to a problem? Please describe.
A clear and concise description of what the problem is. Ex. I'm always frustrated when [...]

Describe the solution you'd like
A clear and concise description of what you want to happen.

Additional context
Add any other context or screenshots about the feature request here.

[feat:] demo k3ai triage

Is your feature request related to a problem? Please describe.
A clear and concise description of what the problem is. Ex. I'm always frustrated when [...]

Describe the solution you'd like
A clear and concise description of what you want to happen.

Additional context
Add any other context or screenshots about the feature request here.

[bug:] Katib currently not working

Describe the bug
Due to changes in the logic of the upstream Katib project, the plugin is currently unable to execute any trial job.

Expected behavior
Experiments and Trials should appear immediately after.

Additional context
I'm working on V3 of k3ai so this will be fixed within that release.

[bug:] unsuccessful deployment of kubeflow-pipelines-traefik on current K3s

Describe the bug
Issues with kubeflow-pipelines-traefik on current K3s

To Reproduce
Steps to reproduce the behavior:
On a running K3s node (v1.20.2+k3s1), where the simple jupyter-minimal-traefik works, run:

k3ai apply -g kubeflow-pipelines-traefik
Output (note the 3 warnings about v1beta1 APIs, which conflict with the 1.20 version):

Plugin YAML group-name: kubeflow-pipelines-traefik
Plugin YAML content: [{github.com/kubeflow/pipelines/manifests/kustomize/cluster-scoped-resources?ref=1.3.0 kustomize} {github.com/kubeflow/pipelines/manifests/kustomize/env/platform-agnostic-pns?ref=1.3.0 kustomize}], name: kubeflow-pipelines
serviceaccount "application" deleted
serviceaccount "argo" deleted
serviceaccount "kubeflow-pipelines-cache" deleted
serviceaccount "kubeflow-pipelines-container-builder" deleted
serviceaccount "kubeflow-pipelines-metadata-writer" deleted
serviceaccount "kubeflow-pipelines-viewer" deleted
serviceaccount "ml-pipeline-persistenceagent" deleted
serviceaccount "ml-pipeline-scheduledworkflow" deleted
serviceaccount "ml-pipeline-ui" deleted
serviceaccount "ml-pipeline-viewer-crd-service-account" deleted
serviceaccount "ml-pipeline-visualizationserver" deleted
serviceaccount "ml-pipeline" deleted
serviceaccount "pipeline-runner" deleted
role.rbac.authorization.k8s.io "application-manager-role" deleted
role.rbac.authorization.k8s.io "argo-role" deleted
role.rbac.authorization.k8s.io "kubeflow-pipelines-cache-deployer-role" deleted
role.rbac.authorization.k8s.io "kubeflow-pipelines-cache-role" deleted
role.rbac.authorization.k8s.io "kubeflow-pipelines-metadata-writer-role" deleted
role.rbac.authorization.k8s.io "ml-pipeline-persistenceagent-role" deleted
role.rbac.authorization.k8s.io "ml-pipeline-scheduledworkflow-role" deleted
role.rbac.authorization.k8s.io "ml-pipeline-ui" deleted
role.rbac.authorization.k8s.io "ml-pipeline-viewer-controller-role" deleted
role.rbac.authorization.k8s.io "ml-pipeline" deleted
role.rbac.authorization.k8s.io "pipeline-runner" deleted
rolebinding.rbac.authorization.k8s.io "application-manager-rolebinding" deleted
rolebinding.rbac.authorization.k8s.io "argo-binding" deleted
rolebinding.rbac.authorization.k8s.io "kubeflow-pipelines-cache-binding" deleted
rolebinding.rbac.authorization.k8s.io "kubeflow-pipelines-cache-deployer-rolebinding" deleted
rolebinding.rbac.authorization.k8s.io "kubeflow-pipelines-metadata-writer-binding" deleted
rolebinding.rbac.authorization.k8s.io "ml-pipeline-persistenceagent-binding" deleted
rolebinding.rbac.authorization.k8s.io "ml-pipeline-scheduledworkflow-binding" deleted
rolebinding.rbac.authorization.k8s.io "ml-pipeline-ui" deleted
rolebinding.rbac.authorization.k8s.io "ml-pipeline-viewer-crd-binding" deleted
rolebinding.rbac.authorization.k8s.io "ml-pipeline" deleted
rolebinding.rbac.authorization.k8s.io "pipeline-runner-binding" deleted
configmap "metadata-grpc-configmap" deleted
configmap "ml-pipeline-ui-configmap" deleted
configmap "pipeline-install-config-m2k6bmc5m7" deleted
configmap "workflow-controller-configmap" deleted
secret "mlpipeline-minio-artifact" deleted
secret "mysql-secret-fd5gktm75t" deleted
service "cache-server" deleted
service "controller-manager-service" deleted
service "metadata-envoy-service" deleted
service "metadata-grpc-service" deleted
service "minio-service" deleted
service "ml-pipeline-ui" deleted
service "ml-pipeline-visualizationserver" deleted
service "ml-pipeline" deleted
service "mysql" deleted
deployment.apps "cache-deployer-deployment" deleted
deployment.apps "cache-server" deleted
deployment.apps "controller-manager" deleted
deployment.apps "metadata-envoy-deployment" deleted
deployment.apps "metadata-grpc-deployment" deleted
deployment.apps "metadata-writer" deleted
deployment.apps "minio" deleted
deployment.apps "ml-pipeline-persistenceagent" deleted
deployment.apps "ml-pipeline-scheduledworkflow" deleted
deployment.apps "ml-pipeline-ui" deleted
deployment.apps "ml-pipeline-viewer-crd" deleted
deployment.apps "ml-pipeline-visualizationserver" deleted
deployment.apps "ml-pipeline" deleted
deployment.apps "mysql" deleted
deployment.apps "workflow-controller" deleted
application.app.k8s.io "pipeline" deleted
Error from server (NotFound): error when deleting "github.com/kubeflow/pipelines/manifests/kustomize/env/platform-agnostic-pns?ref=1.3.0": persistentvolumeclaims "minio-pvc" not found
Error from server (NotFound): error when deleting "github.com/kubeflow/pipelines/manifests/kustomize/env/platform-agnostic-pns?ref=1.3.0": persistentvolumeclaims "mysql-pv-claim" not found
2021/02/22 21:24:08 Error during delete: exit status 1
warning: deleting cluster-scoped resources, not scoped to the provided namespace
namespace "kubeflow" deleted
Warning: apiextensions.k8s.io/v1beta1 CustomResourceDefinition is deprecated in v1.16+, unavailable in v1.22+; use apiextensions.k8s.io/v1 CustomResourceDefinition
customresourcedefinition.apiextensions.k8s.io "applications.app.k8s.io" deleted
customresourcedefinition.apiextensions.k8s.io "clusterworkflowtemplates.argoproj.io" deleted
customresourcedefinition.apiextensions.k8s.io "cronworkflows.argoproj.io" deleted
customresourcedefinition.apiextensions.k8s.io "scheduledworkflows.kubeflow.org" deleted
customresourcedefinition.apiextensions.k8s.io "viewers.kubeflow.org" deleted
customresourcedefinition.apiextensions.k8s.io "workflows.argoproj.io" deleted
customresourcedefinition.apiextensions.k8s.io "workflowtemplates.argoproj.io" deleted
serviceaccount "kubeflow-pipelines-cache-deployer-sa" deleted
clusterrole.rbac.authorization.k8s.io "kubeflow-pipelines-cache-deployer-clusterrole" deleted
clusterrolebinding.rbac.authorization.k8s.io "kubeflow-pipelines-cache-deployer-clusterrolebinding" deleted
Warning: networking.k8s.io/v1beta1 IngressClass is deprecated in v1.19+, unavailable in v1.22+; use networking.k8s.io/v1 IngressClassList
ingressclass.networking.k8s.io "traefik-lb" deleted
Warning: networking.k8s.io/v1beta1 Ingress is deprecated in v1.19+, unavailable in v1.22+; use networking.k8s.io/v1 Ingress
Error from server (NotFound): error when deleting "https://raw.githubusercontent.com/kf5i/k3ai-plugins/main/common/traefik/kubeflow-pipeline-bind.yaml": ingresses.networking.k8s.io "pipeline-ingress" not found
2021/02/22 21:25:04 Error during delete: exit status 1

After that, these containers (and sometimes more) never start successfully and stay in CrashLoopBackOff:
metadata-grpc-deployment
ml-pipeline
ml-pipeline-persistenceagen
metadata-writer
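
To locate which manifests trigger the three deprecation warnings in the output above, a small scan of the manifest text can help. The sketch below is only an illustration: the function name is hypothetical, the apiVersion/kind pairs are copied from the warnings, and it uses a plain regular-expression match on top-level `apiVersion:` and `kind:` lines rather than a real YAML parser, so quoted values or unusual formatting would be missed.

```python
import re

# apiVersion/kind pairs copied from the three warnings in the output above.
DEPRECATED = {
    ("apiextensions.k8s.io/v1beta1", "CustomResourceDefinition"),
    ("networking.k8s.io/v1beta1", "IngressClass"),
    ("networking.k8s.io/v1beta1", "Ingress"),
}


def find_deprecated(manifest_text):
    """Return (apiVersion, kind) pairs in the manifest that match DEPRECATED."""
    hits = []
    # Multi-document YAML separates documents with a line containing only '---'.
    for doc in re.split(r"^---\s*$", manifest_text, flags=re.MULTILINE):
        # Anchoring at line start ignores nested (indented) fields.
        api = re.search(r"^apiVersion:\s*(\S+)", doc, flags=re.MULTILINE)
        kind = re.search(r"^kind:\s*(\S+)", doc, flags=re.MULTILINE)
        if api and kind and (api.group(1), kind.group(1)) in DEPRECATED:
            hits.append((api.group(1), kind.group(1)))
    return hits
```

Running this over a locally downloaded copy of each manifest gives a list of resources that would need to move to the `v1` APIs named in the warnings.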

Expected behavior
successful deployment

Screenshots
n/a

Additional context
Also, regarding the "k3ai --help" output: I figured I could clone the "k3ai-plugins" repo and tweak the manifests that might contribute to the warnings, but:

  • k3ai list --repo http://localhost/k3ai-plugins/contents/core (seems like a poor example syntax, since "contents" is not in place)
  • k3ai list --repo http://localhost/k3ai-plugins/core (even with tweaked URI that seems to match repo artifacts)
    Can't load cache:error fetching plugins content: cannot load plugins: invalid character '<' looking for beginning of value will use remote
    cannot load plugins: invalid character '<' looking for beginning of value
  • k3ai list --repo https://github.com/kf5i/k3ai-plugins/core (same is true for trying to use this repo directly from CLI)
    Can't load cache:error fetching plugins content: cannot load plugins: invalid character 'N' looking for beginning of value will use remote
    cannot load plugins: invalid character '<' looking for beginning of value
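
The message "invalid character '<' looking for beginning of value" has the shape of the error Go's JSON decoder reports when its input starts with `<`, which suggests that the `--repo` URL returned an HTML page where JSON was expected. A plain-text body such as "Not Found" would be consistent with the 'N' variant. The sketch below is a hypothetical diagnostic, not part of the k3ai CLI, that classifies a response body before any attempt to parse it.

```python
import json


def classify_body(body):
    """Classify a response body as 'html', 'json' or 'other'.

    Hypothetical helper for diagnosing what a repo URL actually returned.
    """
    stripped = body.lstrip()
    if stripped.startswith("<"):
        # HTML pages (and XML) begin with '<', which is never valid JSON.
        return "html"
    try:
        json.loads(stripped)
    except json.JSONDecodeError:
        return "other"
    return "json"
```

Fetching each of the URLs above with any HTTP client and passing the body to `classify_body` would show whether the server answered with a web page instead of a plugin listing.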

PyTorch error when deploying

Describe the bug
After the deployment of PyTorch operator we observed the following error in the logs:
kubectl logs pytorch-operator-db5d78f97-b4tmx
{"filename":"app/server.go:73","level":"info","msg":"EnvKubeflowNamespace not set, use default namespace","time":"2019-08-15T13:16:38Z"}
{"filename":"app/server.go:78","level":"info","msg":"[API Version: v1 Version: v0.1.0-alpha Git SHA: Not provided. Go Version: go1.12 Go OS/Arch: linux/amd64]","time":"2019-08-15T13:16:38Z"}
W0815 13:16:38.983656 1 client_config.go:552] Neither --kubeconfig nor --master was specified. Using the inClusterConfig. This might not work.
{"filename":"pytorch-operator.v1/main.go:33","level":"info","msg":"Setting up client for monitoring on port: 8443","time":"2019-08-15T13:16:38Z"}
{"filename":"app/server.go:200","level":"error","msg":"the server could not find the requested resource (get pytorchjobs.kubeflow.org)","time":"2019-08-15T13:16:38Z"}
{"filename":"app/server.go:102","level":"info","msg":"CRD doesn't exist. Exiting","time":"2019-08-15T13:16:38Z"}
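
The log mixes JSON entries with a plain-text warning, so the one error-level entry is easy to overlook. The sketch below pulls out only the error messages, assuming the operator writes one JSON object per line with `level` and `msg` fields as the entries above suggest. The function name is hypothetical.

```python
import json


def error_messages(log_text):
    """Return the 'msg' of every JSON log entry whose level is 'error'."""
    messages = []
    for line in log_text.splitlines():
        line = line.strip()
        if not line.startswith("{"):
            continue  # skip non-JSON lines such as the client_config.go warning
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip lines that look like JSON but are not
        if entry.get("level") == "error":
            messages.append(entry.get("msg", ""))
    return messages
```

Applied to the log above, this isolates the message about `pytorchjobs.kubeflow.org` not being found, which matches the final "CRD doesn't exist. Exiting" entry.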

To Reproduce
Steps to reproduce the behavior:

  1. Deploy the PyTorch operator as per the documentation
  2. Check the logs of the pod once it is running

Expected behavior
We should observe no errors

Inference Servers (Epic)

We should let users choose whether to install only the pipelines or also an inference server and, if so, whether to install it alongside the pipelines or standalone to create an edge inference server.

We should use a tag like -- (e.g. --triton for NVIDIA Triton Inference Server)

As per https://github.com/kf5i/k3ai/issues/3#issuecomment-706486571 we should also evaluate different approaches to how we manage the flags. For now we will probably have to stick to a simpler way of managing plugins.

The list should be IMO:

  • Tensorflow Serving - ResNet is merged in #3
  • Tensorflow Serving - Mnist is in the works
  • NVIDIA Triton CPU and GPU is in the works too
  • Seldon
  • BentoML
  • KFServing
  • TFX

Feel free to add others to the list.

KFServing support

Is your feature request related to a problem? Please describe.
Support deploying KFServing together with the pipelines. Also support deploying KFServing on its own (with a new Istio installation or using the existing one).

Describe the solution you'd like
AI scientists may train their models on another platform; to spare them the work of setting up a serving environment, it would be great if it could be deployed automatically.

v2 plugin spec should have a constant name

Is your feature request related to a problem? Please describe.
The v2 spec uses plugin definitions of the form v2/argo/argo.yaml. The plugin spec file name should be independent of the plugin name. This would make tooling more convention driven and handling of different naming formats of plugin names would be easier.

I am proposing something like v2/argo/k3ai_plugin.yaml but something as simple as v2/argo/plugin.yaml should suffice.
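
To show how a constant file name makes tooling convention driven, the sketch below discovers plugins by looking for the same spec file in every directory. It assumes the simpler `plugin.yaml` name from the proposal. `SPEC_FILE_NAME` and `discover_plugins` are hypothetical names for this example and do not describe the k3ai loader.

```python
from pathlib import Path
from typing import Dict

# Hypothetical constant file name, taken from the simpler proposal above.
SPEC_FILE_NAME = "plugin.yaml"


def discover_plugins(root: Path) -> Dict[str, Path]:
    """Map each plugin directory name under `root` to its spec file.

    Directories without a spec file, and plain files, are ignored.
    """
    plugins = {}
    for entry in sorted(root.iterdir()):
        spec = entry / SPEC_FILE_NAME
        if entry.is_dir() and spec.is_file():
            plugins[entry.name] = spec
    return plugins
```

With this convention the plugin name comes only from the directory, so names containing dashes, dots or other characters never have to be mapped to a file name.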

KubeFlow Full Support

We have to see if there's an easy way to support the full Kubeflow as a plugin to offer to the community.
