
Comments (14)

issue-label-bot commented on June 26, 2024

Issue-Label Bot is automatically applying the labels:

Label Probability
kind/feature 0.59


from kubeflow-distribution.

jlewi commented on June 26, 2024

The 1.1 branch of manifests has been cut.

We still need to cut a 1.1 branch of blueprints and point it at the 1.1 branch of manifests.

However, the tests for the blueprints are still failing:
https://k8s-testgrid.appspot.com/sig-big-data#kubeflow-gcp-blueprints-master-periodic

We should probably fix that before cutting a 1.1 branch and pinning things.


jlewi commented on June 26, 2024

Update:


issue-label-bot commented on June 26, 2024

Issue-Label Bot is automatically applying the labels:

Label Probability
area/docs 0.99



jlewi commented on June 26, 2024

Branch is cut.


jlewi commented on June 26, 2024

Status qualification of master branch (testing against kubeflow/manifests @ master)

Status qualification of 1.1 branch
Auto deployment on the 1.1 branch is unhealthy; it looks like it is returning 500s.

  • It's only been 27 minutes since it deployed, so we might need to wait longer.


jlewi commented on June 26, 2024

Auto deployment on 1.1 branch is failing GCLB health checks because backend-updater couldn't update healthchecks

++ kubectl -n istio-system get service istio-ingressgateway -o 'jsonpath={.spec.ports[?(@.name=="status-port")].nodePort}'
+ STATUS_NODE_PORT=32339
+ gcloud --project=kubeflow-ci-deployment compute health-checks update http https://www.googleapis.com/compute/v1/projects/kubeflow-ci-deployment/global/healthChecks/k8s-be-30622--1a5607511b8984c7 --port=32339
ERROR: (gcloud.compute.health-checks.update.http) Could not fetch resource:
 - Required 'compute.healthChecks.update' permission for 'projects/kubeflow-ci-deployment/global/healthChecks/k8s-be-30622--1a5607511b8984c7'

+ echo 'Backend updated successfully. Waiting 1 hour before updating again.'
+ sleep 3600
Backend updated successfully. Waiting 1 hour before updating again.
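
In the trace above, the success message is printed right after the gcloud permission error, which suggests the updater script does not check the command's exit status before logging. A minimal sketch of gating the message on the exit code follows. The function name `run_update` is hypothetical and this is not the actual backend-updater code; it only illustrates the check.

```python
import subprocess
import sys
from typing import Sequence


def run_update(cmd: Sequence[str]) -> bool:
    """Run an update command and report success only if it exited with 0.

    `cmd` is whatever command performs the update (in the trace above, a
    `gcloud compute health-checks update http ...` invocation).
    """
    result = subprocess.run(list(cmd), capture_output=True, text=True)
    if result.returncode != 0:
        # Surface the failure instead of printing a misleading success line.
        print(
            f"Update failed (exit {result.returncode}): {result.stderr.strip()}",
            file=sys.stderr,
        )
        return False
    print("Backend updated successfully.")
    return True
```

With a check like this, a caller could retry sooner on failure instead of sleeping for the full hour, which might have avoided the need to restart the pod by hand.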


jlewi commented on June 26, 2024

backend-updater is using the kf-admin SA, and that service account has the GCP IAM annotation for the admin GSA.

https://k8s-testgrid.appspot.com/sig-big-data#kubeflow-gcp-blueprints-periodic-1-1 shows the workload identity test is failing.
It looks like a bug in the test:

>           raise HttpError(resp, content, uri=self.uri)
E           googleapiclient.errors.HttpError: <HttpError 404 when requesting https://iam.googleapis.com/v1/projects/kubeflow-ci-deployment/serviceAccounts/[email protected]:getIamPolicy?alt=json returned "Service account projects/kubeflow-ci-deployment/serviceAccounts/[email protected] does not exist.">

It looks like the IAMPolicyMember bindings might be missing

kubectl --context=kubeflow-ci-management get gcp -l kf-name=kf-v1-1-0710-084
NAME                                                               AGE
computeaddress.compute.cnrm.cloud.google.com/kf-v1-1-0710-084-ip   50m

NAME                                                                                AGE
computedisk.compute.cnrm.cloud.google.com/kf-v1-1-0710-084-storage-artifact-store   50m
computedisk.compute.cnrm.cloud.google.com/kf-v1-1-0710-084-storage-metadata-store   50m

NAME                                                                AGE
containercluster.container.cnrm.cloud.google.com/kf-v1-1-0710-084   50m

NAME                                                                 AGE
iamserviceaccount.iam.cnrm.cloud.google.com/kf-v1-1-0710-084-admin   50m
iamserviceaccount.iam.cnrm.cloud.google.com/kf-v1-1-0710-084-user    50m
iamserviceaccount.iam.cnrm.cloud.google.com/kf-v1-1-0710-084-vm      50m


jlewi commented on June 26, 2024

No, that's wrong; it looks like IAMPolicyMember might not show up in that listing, either because it's not in the group or because it's missing labels.

Here are the bindings:

kf-v1-1-0710-084-admin-bigquery                                                52m
kf-v1-1-0710-084-admin-cloudbuild                                              52m
kf-v1-1-0710-084-admin-cloudsql                                                52m
kf-v1-1-0710-084-admin-dataflow                                                52m
kf-v1-1-0710-084-admin-dataproc                                                52m
kf-v1-1-0710-084-admin-istio-wi                                                52m
kf-v1-1-0710-084-admin-logging                                                 52m
kf-v1-1-0710-084-admin-manages-user                                            52m
kf-v1-1-0710-084-admin-metricwriter                                            52m
kf-v1-1-0710-084-admin-ml                                                      52m
kf-v1-1-0710-084-admin-monitoringviewer                                        52m
kf-v1-1-0710-084-admin-network                                                 52m
kf-v1-1-0710-084-admin-servicemanagement                                       52m
kf-v1-1-0710-084-admin-source                                                  52m
kf-v1-1-0710-084-admin-storage                                                 52m
kf-v1-1-0710-084-admin-viewer                                                  52m
kf-v1-1-0710-084-admin-wi                                                      52m
kf-v1-1-0710-084-admin-workload-identity-user                                  52m


jlewi commented on June 26, 2024

IAMPolicy has an error because the SA doesn't exist:

 message: 'error setting policy member: error applying changes: Batch "iam-project-projects/kubeflow-ci-deployment
      modifyIamPolicy" for request "Create IAM Members roles/compute.networkAdmin
      serviceAccount:kf-v1-1-0710-084-admin@kubeflow-ci-deployment.iam.gserviceaccount.com
      for \"project \\\"projects/kubeflow-ci-deployment\\\"\"" returned error: Error
      applying IAM policy for project "projects/kubeflow-ci-deployment": Error setting
      IAM policy for project "projects/kubeflow-ci-deployment": googleapi: Error 400:
      Service account kf-v1-1-0710-1c5-admin@kubeflow-ci-deployment.iam.gserviceaccount.com
      does not exist., badRequest'

The error is complaining about unrelated missing service accounts. I suspect they are from other deployments. Since an IAM policy needs to be updated in its entirety, CNRM probably gathers all the policy member bindings, so other broken bindings are blocking the successful apply.
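
To make the suspected failure mode concrete, here is a toy model of a whole-policy write in which a single stale member causes the entire request to be rejected. It is not CNRM's or the IAM API's actual implementation; `set_iam_policy`, `drop_stale_members`, and `PolicyRejected` are hypothetical names used only for illustration, and the account names are taken from the error message above.

```python
from typing import Dict, List, Set


class PolicyRejected(Exception):
    """Raised when the whole policy is rejected because of one bad member."""


def set_iam_policy(
    bindings: Dict[str, List[str]], existing_accounts: Set[str]
) -> Dict[str, List[str]]:
    """Toy whole-policy write: any unknown member fails the entire request."""
    for role, members in bindings.items():
        for member in members:
            if member not in existing_accounts:
                raise PolicyRejected(f"Service account {member} does not exist.")
    return bindings


def drop_stale_members(
    bindings: Dict[str, List[str]], existing_accounts: Set[str]
) -> Dict[str, List[str]]:
    """Return a copy of the bindings without members that no longer exist."""
    return {
        role: [m for m in members if m in existing_accounts]
        for role, members in bindings.items()
    }


GOOD = "kf-v1-1-0710-084-admin@kubeflow-ci-deployment.iam.gserviceaccount.com"
STALE = "kf-v1-1-0710-1c5-admin@kubeflow-ci-deployment.iam.gserviceaccount.com"

# The valid binding for GOOD is batched with a stale binding from another deployment.
policy = {"roles/compute.networkAdmin": [GOOD, STALE]}
existing = {GOOD}
```

In this model, `set_iam_policy(policy, existing)` raises even though the binding for the 084 deployment is valid, and the write only succeeds once the stale member is filtered out or cleaned up. That would be consistent with the binding eventually applying after other broken bindings disappear, but it remains a guess about the cause.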


jlewi commented on June 26, 2024

It looks like the policy just got applied, and iampolicymembers/kf-v1-1-0710-084-admin-network is now up to date (about 55 minutes after creation).


jlewi commented on June 26, 2024

I had to kick the backend-updater pod to make it update the health check.


jlewi commented on June 26, 2024

The v1-1 cluster came up.
All but 2 tests are passing:
https://k8s-testgrid.appspot.com/sig-big-data#kubeflow-gcp-blueprints-periodic-1-1

  • WI test
  • knative test


Bobgy commented on June 26, 2024

1.1 is already released.

