Comments (14)
Issue-Label Bot is automatically applying the labels:
| Label | Probability |
|---|---|
| kind/feature | 0.59 |
from kubeflow-distribution.
The 1.1 branch of manifests has been cut.
We still need to cut a 1.1 branch of blueprints and point it at the 1.1 branch of manifests.
However, the tests for the blueprints are still failing:
https://k8s-testgrid.appspot.com/sig-big-data#kubeflow-gcp-blueprints-master-periodic
We should probably fix that before cutting a 1.1 branch and pinning things.
from kubeflow-distribution.
Update:
- Doc updates have started
  - kubeflow/website#2023 - initial docs for blueprints; marked some other pages as out of date
- Still working on getting a passing green https://k8s-testgrid.appspot.com/sig-big-data#kubeflow-gcp-blueprints-master-periodic
  - A number of PRs are out to fix things
- Still need to set up a 1.1 branch and tests as well.
from kubeflow-distribution.
Issue-Label Bot is automatically applying the labels:
| Label | Probability |
|---|---|
| area/docs | 0.99 |
from kubeflow-distribution.
Branch is cut.
from kubeflow-distribution.
Status qualification of master branch (testing against kubeflow/manifests @ master)
- test grid https://k8s-testgrid.appspot.com/sig-big-data#kubeflow-gcp-blueprints-periodic-master
- mnist is passing
- There are PRs out to fix the tests
- We should add xgboost #78
- master branch should be running on GKE regular (kubernetes 1.16)
Status qualification of 1.1 branch
Auto deployment on the 1.1 branch is unhealthy; it looks like it is returning 500s.
- It's only been 27 minutes since it deployed, so we might need to wait longer.
from kubeflow-distribution.
Auto deployment on the 1.1 branch is failing GCLB health checks because backend-updater couldn't update the health checks:
++ kubectl -n istio-system get service istio-ingressgateway -o 'jsonpath={.spec.ports[?(@.name=="status-port")].nodePort}'
+ STATUS_NODE_PORT=32339
+ gcloud --project=kubeflow-ci-deployment compute health-checks update http https://www.googleapis.com/compute/v1/projects/kubeflow-ci-deployment/global/healthChecks/k8s-be-30622--1a5607511b8984c7 --port=32339
ERROR: (gcloud.compute.health-checks.update.http) Could not fetch resource:
- Required 'compute.healthChecks.update' permission for 'projects/kubeflow-ci-deployment/global/healthChecks/k8s-be-30622--1a5607511b8984c7'
+ echo 'Backend updated successfully. Waiting 1 hour before updating again.'
+ sleep 3600
Backend updated successfully. Waiting 1 hour before updating again.
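In the log above, the success message is echoed right after the `gcloud` error, which suggests the updater does not check the command's exit status before sleeping. A minimal sketch of a pattern that would avoid the misleading message, assuming a hypothetical `run_update` helper (the real backend-updater is a shell script whose source is not shown in this thread):

```python
import subprocess
import sys


def run_update(cmd):
    """Run cmd and report success only when it exits with status 0.

    `run_update` is a hypothetical helper for illustration; it is not
    part of backend-updater.
    """
    result = subprocess.run(cmd, capture_output=True, text=True)
    if result.returncode != 0:
        # Surface the failure instead of claiming the backend was updated.
        print(f"Update failed (exit {result.returncode}): {result.stderr.strip()}")
        return False
    print("Backend updated successfully.")
    return True


# Stand-ins for the gcloud call: one command that fails, one that succeeds.
failing = [sys.executable, "-c", "import sys; sys.exit('permission denied')"]
passing = [sys.executable, "-c", "pass"]
```

Calling `run_update(failing)` prints the captured error and returns `False`, so a caller could retry sooner rather than sleep for an hour on a failed update.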
from kubeflow-distribution.
backend-updater is using the kf-admin SA, and that service account has the GCP IAM annotation for the admin GSA.
https://k8s-testgrid.appspot.com/sig-big-data#kubeflow-gcp-blueprints-periodic-1-1 shows the workload identity test is failing.
Looks like a bug in the test:
> raise HttpError(resp, content, uri=self.uri)
E googleapiclient.errors.HttpError: <HttpError 404 when requesting https://iam.googleapis.com/v1/projects/kubeflow-ci-deployment/serviceAccounts/[email protected]:getIamPolicy?alt=json returned "Service account projects/kubeflow-ci-deployment/serviceAccounts/[email protected] does not exist.">
It looks like the IAMPolicyMember bindings might be missing:
kubectl --context=kubeflow-ci-management get gcp -l kf-name=kf-v1-1-0710-084
NAME AGE
computeaddress.compute.cnrm.cloud.google.com/kf-v1-1-0710-084-ip 50m
NAME AGE
computedisk.compute.cnrm.cloud.google.com/kf-v1-1-0710-084-storage-artifact-store 50m
computedisk.compute.cnrm.cloud.google.com/kf-v1-1-0710-084-storage-metadata-store 50m
NAME AGE
containercluster.container.cnrm.cloud.google.com/kf-v1-1-0710-084 50m
NAME AGE
iamserviceaccount.iam.cnrm.cloud.google.com/kf-v1-1-0710-084-admin 50m
iamserviceaccount.iam.cnrm.cloud.google.com/kf-v1-1-0710-084-user 50m
iamserviceaccount.iam.cnrm.cloud.google.com/kf-v1-1-0710-084-vm 50m
from kubeflow-distribution.
No, that's wrong; it looks like IAMPolicyMember might not show up in that listing, either because it's not in the group or because it's missing labels.
Here are the bindings:
kf-v1-1-0710-084-admin-bigquery 52m
kf-v1-1-0710-084-admin-cloudbuild 52m
kf-v1-1-0710-084-admin-cloudsql 52m
kf-v1-1-0710-084-admin-dataflow 52m
kf-v1-1-0710-084-admin-dataproc 52m
kf-v1-1-0710-084-admin-istio-wi 52m
kf-v1-1-0710-084-admin-logging 52m
kf-v1-1-0710-084-admin-manages-user 52m
kf-v1-1-0710-084-admin-metricwriter 52m
kf-v1-1-0710-084-admin-ml 52m
kf-v1-1-0710-084-admin-monitoringviewer 52m
kf-v1-1-0710-084-admin-network 52m
kf-v1-1-0710-084-admin-servicemanagement 52m
kf-v1-1-0710-084-admin-source 52m
kf-v1-1-0710-084-admin-storage 52m
kf-v1-1-0710-084-admin-viewer 52m
kf-v1-1-0710-084-admin-wi 52m
kf-v1-1-0710-084-admin-workload-identity-user 52m
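Why a label-filtered listing could omit these bindings can be sketched with a toy equality selector. The resource dictionaries below are hypothetical; whether the real IAMPolicyMember objects actually lack the `kf-name` label is not confirmed here.

```python
def matches(labels, selector):
    """Equality-based selector: every selector key must be present with the same value."""
    return all(labels.get(key) == value for key, value in selector.items())


# Hypothetical resources: the second one carries no kf-name label.
resources = [
    {"name": "kf-v1-1-0710-084-admin", "labels": {"kf-name": "kf-v1-1-0710-084"}},
    {"name": "kf-v1-1-0710-084-admin-network", "labels": {}},
]

selector = {"kf-name": "kf-v1-1-0710-084"}
selected = [r["name"] for r in resources if matches(r["labels"], selector)]
```

Under these assumptions only the labeled resource ends up in `selected`, even though both exist.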
from kubeflow-distribution.
IAMPolicy has an error because the SA doesn't exist:
message: 'error setting policy member: error applying changes: Batch "iam-project-projects/kubeflow-ci-deployment
modifyIamPolicy" for request "Create IAM Members roles/compute.networkAdmin
serviceAccount:kf-v1-1-0710-084-admin@kubeflow-ci-deployment.iam.gserviceaccount.com
for \"project \\\"projects/kubeflow-ci-deployment\\\"\"" returned error: Error
applying IAM policy for project "projects/kubeflow-ci-deployment": Error setting
IAM policy for project "projects/kubeflow-ci-deployment": googleapi: Error 400:
Service account kf-v1-1-0710-1c5-admin@kubeflow-ci-deployment.iam.gserviceaccount.com
does not exist., badRequest'
The error is complaining about unrelated missing service accounts: the request is for kf-v1-1-0710-084-admin, but the failure names kf-v1-1-0710-1c5-admin. I suspect they are from other deployments. Since an IAM policy needs to be updated in its entirety, CNRM probably gathers all the policy member bindings, so other broken bindings are blocking the successful apply.
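The suspected failure mode can be sketched as follows. This is a toy model, not CNRM or the IAM API: `set_policy`, `try_apply`, `EXISTING_ACCOUNTS`, and the `roles/viewer` binding are hypothetical, and the only assumption is that the whole policy is validated as a unit, so one binding naming a deleted service account rejects the entire write.

```python
# Hypothetical state: the 084 account exists, the 1c5 account has been deleted.
EXISTING_ACCOUNTS = {"kf-v1-1-0710-084-admin"}


def set_policy(bindings):
    """Accept the policy only if every member in every binding exists."""
    for role, member in bindings:
        if member not in EXISTING_ACCOUNTS:
            raise ValueError(f"Service account {member} does not exist.")
    return list(bindings)


def try_apply(bindings):
    """Return a short status string instead of raising."""
    try:
        set_policy(bindings)
        return "applied"
    except ValueError as err:
        return f"rejected: {err}"


stale = ("roles/viewer", "kf-v1-1-0710-1c5-admin")  # left over from another deployment
wanted = ("roles/compute.networkAdmin", "kf-v1-1-0710-084-admin")
```

In this model `try_apply([stale, wanted])` is rejected because of the stale member, while `try_apply([wanted])` is applied, which matches the behavior suspected above.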
from kubeflow-distribution.
It looks like the policy just got applied, and iampolicymembers/kf-v1-1-0710-084-admin-network is now up to date (about 55 minutes after creation).
from kubeflow-distribution.
I had to kick the backend-updater pod to make it update the health check.
from kubeflow-distribution.
v1-1 cluster came up
All but 2 tests are passing
https://k8s-testgrid.appspot.com/sig-big-data#kubeflow-gcp-blueprints-periodic-1-1
- WI test
- knative test
from kubeflow-distribution.
1.1 has already been released.
from kubeflow-distribution.