reporting's Introduction

Kubeflow Usage Reporting

This repository is devoted to collection and analysis of Kubeflow usage.

Deploying Spartakus

We use spartakus to report and collect metrics.

The collector is a web service that runs on GKE backed by BigQuery.

Below are instructions describing the process we followed to setup Kubeflow's production instance of the collector.

Set relevant environment variables

. prod_env.sh

Create a cluster

gcloud container clusters create --project=kubeflow-usage reporting --zone=us-central1-c --machine-type=n1-standard-4

Create a static IP address to serve on

gcloud gcloud compute --project=${PROJECT} addresses create ${HOSTNAME} --global

Configure an A record for the kubeflow.org domain to map $FQDN to the static IP address obtained in the previous step.

Create the namespace for the collector.

kubectl create namespace kube-lego
kubectl create namespace ${NAMESPACE}

Create the BigQuery dataset and table

bq mk --project_id=${PROJECT} ${DATASET}
bq mk --project_id=${PROJECT} -t ${DATASET}.${TABLE} ~/git_spartakus/pkg/database/bigquery.schema.json

In GCP create a service account and download the private key

gcloud --project=${PROJECT} iam service-accounts create collector  --display-name="Spartakus collector."
gcloud projects add-iam-policy-binding ${PROJECT} --member=serviceAccount:${SERVICE_ACCOUNT} --role=roles/bigquery.dataEditor
gcloud --project=${PROJECT} iam service-accounts keys create \
    ${HOME}/secrets/key.${SERVICE_ACCOUNT}.json \
    --iam-account ${SERVICE_ACCOUNT}

kubectl -n ${NAMESPACE} create secret generic ${BIGQUERY_SECRET}  --from-file=bigquery.json=${HOME}/secrets/key.${SERVICE_ACCOUNT}.json

Deploy the collector

ks param set collector fqdn ${FQDN}
ks param set collector ipName ${HOSTNAME}
ks param set collector namespace ${NAMESPACE}
ks param set collector project ${PROJECT}
ks param set collector dataset ${DATASET}
ks param set collector table ${TABLE}

ks apply prod -c collector

Recommend Projects