uc-cdis / gen3-helm

Helm charts for Gen3 Deployments

License: Apache License 2.0

gen3-helm's Introduction

gen3-helm

Helm charts for deploying Gen3 on any kubernetes cluster.

Deploying gen3 with helm

TL;DR

helm repo add gen3 https://helm.gen3.org
helm repo update
helm upgrade --install gen3 gen3/gen3 -f ./values.yaml 

Assuming you already have the prerequisites installed and configured, you can deploy Gen3 with the helm command.

Warning: The default Helm chart configuration is not intended for production. The default chart creates a proof-of-concept (PoC) implementation where all Gen3 services are deployed in the cluster, including Postgres and Elasticsearch. For production deployments, you must follow the Production/Cloud Native/Hybrid architecture.

For a production deployment, you should have a strong working knowledge of Kubernetes. This method of deployment involves different management and observability practices, and different concepts, than traditional deployments.

In a production deployment:

  • The stateful components, like PostgreSQL or Elasticsearch, must run outside the cluster on PaaS or compute instances. This configuration is required to scale and reliably service the variety of workloads found in production Gen3 environments.

  • You should use Cloud PaaS for PostgreSQL, Elasticsearch, and object storage.

Configuration

For the full set of configuration options, see CONFIGURATION.md, which has more in-depth instructions on how to configure each service.

There's also an auto-generated table of basic configuration options in the README.md for the gen3 chart (auto-generated documentation).

For documentation on setting up gen3 developer environments, see gen3_developer_environments.md.

Use the following as a template for your values.yaml file for a minimum deployment of gen3 using these helm charts.

global:
  hostname: example-commons.com

fence: 
  FENCE_CONFIG:
    # Any fence-config overrides here. 
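As a slightly fuller illustration (the BASE_URL key below is one example of a fence-config override; treat it as an assumption and check the fence documentation for valid keys):

```yaml
global:
  hostname: example-commons.com

fence:
  FENCE_CONFIG:
    # Hypothetical override; consult the fence-config documentation for real keys
    BASE_URL: https://example-commons.com/user
```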

Selective deployments

All gen3 services are sub-charts of the gen3 chart (which acts as an umbrella chart).

For your specific installation of gen3, you may not require all our services.

To enable or disable a service, use this pattern in your values.yaml:

fence:
  enabled: true

wts:
  enabled: false
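For example, a hypothetical values.yaml for a slimmed-down commons might toggle several sub-charts at once (the service names below are illustrative; check the umbrella chart's dependency list for the actual sub-chart names):

```yaml
fence:
  enabled: true
portal:
  enabled: true
peregrine:
  enabled: false
wts:
  enabled: false
```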

Gen3 Login Options

Gen3 does not ship with an identity provider (IdP) of its own, but it can integrate with many. We will cover Google login here; refer to the fence documentation for additional options.

TL;DR: At a minimum, to get Google login working you need to set the following in your values.yaml file:

fence: 
  FENCE_CONFIG:
    OPENID_CONNECT:
      google:
        client_id: "insert.google.client_id.here"
        client_secret: "insert.google.client_secret.here"

Google login generation

You need to set up a Google credential for Google login, as that is the default enabled option in fence.

The following steps explain how to create credentials for your gen3 commons:

  1. Go to the Credentials page.
  2. Click Create credentials > OAuth client ID.
  3. Select the Web application application type. Name your OAuth 2.0 client and click Create.
  4. For "Authorized JavaScript origins" add https://<hostname>
  5. For "Authorized redirect URIs" add https://<hostname>/user/login/google/login/
  6. After configuration is complete, take note of the client ID that was created. You will need the client ID and client secret to complete the next steps.

Production deployments

Please read this for more details on production deployments.

NOTE: Gen3 Helm charts are not currently used in production by CTDS, but we are aiming to do so soon and will publish additional documentation at that time.

Local Development

For local development you must be connected to a Kubernetes cluster. As referenced above in the Kubernetes cluster section, we recommend using Rancher Desktop as Kubernetes on your local machine, especially on M1 Macs. You also get ingress and other benefits out of the box.

For MacOS users, Minikube equipped with the ingress addon serves as a viable alternative to Rancher Desktop. On Linux, we've observed that using Kind with an NGINX ingress installed often provides a more seamless experience compared to both Rancher Desktop and Minikube. Essentially, Helm requires access to a Kubernetes cluster with ingress capabilities, facilitating the loading of the portal in your browser for an optimal development workflow.

To install the NGINX ingress:

  helm repo add nginx-stable https://helm.nginx.com/stable
  helm repo update
  kubectl create ns nginx-ingress
  helm install nginx-ingress nginx-stable/nginx-ingress --namespace nginx-ingress 

Warning: If you are using Rancher Desktop, you need to increase vm.max_map_count as outlined here. If you are using Minikube, you will need to enable the ingress addon as outlined here.
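On a Linux host (or inside the Rancher Desktop VM), the map-count change can be sketched as follows; 262144 is the value Elasticsearch commonly requires, so verify it against the linked instructions:

```shell
# Inspect the current limit
sysctl vm.max_map_count
# Raise it for the running kernel (requires root); persist via /etc/sysctl.conf
sudo sysctl -w vm.max_map_count=262144
```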

  1. Clone the repository
  2. Navigate to the gen3-helm/helm/gen3 directory and run helm dependency update
  3. Navigate back to the gen3-helm directory and create your values.yaml file. See the TL;DR section for a minimal example.
  4. Run helm upgrade --install gen3 ./helm/gen3 -f ./values.yaml
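The four steps above can be sketched as a single shell session (repository URL assumed from the project's GitHub location):

```shell
git clone https://github.com/uc-cdis/gen3-helm.git
cd gen3-helm/helm/gen3
helm dependency update        # pull in all sub-chart dependencies
cd ../..
# values.yaml is assumed to exist here; see the TL;DR section for a minimal one
helm upgrade --install gen3 ./helm/gen3 -f ./values.yaml
```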

Using Skaffold

Skaffold is a tool for local development that can be used to automatically rebuild and redeploy your application when changes are detected. A minimal skaffold.yaml configuration file has been provided in the gen3-helm directory. Update the values of this file to match your needs.

Follow the steps above, but instead of doing the helm upgrade --install step, use skaffold dev to start the development process. Skaffold will automatically build and deploy your application to your kubernetes cluster.
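The repository ships its own skaffold.yaml; purely as an illustration of the shape such a file takes (this is not the shipped file, and the image name is a placeholder), a minimal Helm-based config might look like:

```yaml
apiVersion: skaffold/v2beta29
kind: Config
build:
  artifacts:
    # Hypothetical image; point this at the service you are iterating on
    - image: quay.io/cdis/fence
deploy:
  helm:
    releases:
      - name: gen3
        chartPath: helm/gen3
        valuesFiles:
          - values.yaml
```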

Troubleshooting

Sanity checks

  • If deploying from the local repo, make sure you followed the steps for helm dependency update. If you make any changes, this must be repeated for those changes to propagate.

Debugging helm chart issues

  • Sometimes cryptic errors occur while using the helm chart, such as duplicate env vars or other items. Rendering the resources to a file in debug mode can help determine where the issue is taking place:

helm template --debug gen3 ./helm/gen3 -f ./values.yaml > test.yaml
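Once the chart is rendered to a file, a rough grep heuristic can surface duplicated env var names (a sketch only; it inspects every `- name:` line, so it can over-report across separate containers). The excerpt file below stands in for the rendered test.yaml:

```shell
# Write a tiny rendered-manifest excerpt to scan (stand-in for test.yaml):
cat > rendered-excerpt.yaml <<'EOF'
        env:
          - name: PYTHONPATH
            value: /var/www/fence
          - name: DB
            value: postgresql://example
          - name: PYTHONPATH
            value: /var/www/fence
EOF
# Print env var names that appear more than once:
grep -E '^ *- name: ' rendered-excerpt.yaml | awk '{print $3}' | sort | uniq -d
# prints: PYTHONPATH
```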

gen3-helm's People

Contributors

aidanhilt, ajoaugustine, avantol13, elisecastle23, emalinowski, iblislin, jawadqur, kennedydane, krumware, renepollard, spenceraxelrod, uwwint

gen3-helm's Issues

[Error] Unable to deploy after several revisions - Kubernetes secret can hold 1048576 bytes at most

Description

  • After several revisions have been deployed, I eventually hit a wall that prevents further deployments until I tear down the chart and reinstall it. I can potentially mitigate this by providing a --history-max value to the upgrade command.

Potential Causes/Targets

  • I think this could be affected by how large configurations are supplied to the deployments (i.e. fence.USER_YAML and portal.gitops.json). These large configurations are currently supplied directly to the values interface, whereas they could instead be handled as a separate file that is mounted into a deployment's container.

  • I also read that this error is oftentimes resolved by ensuring appropriate configurations of the chart's .helmignore file(s).

Error Msg

Error: UPGRADE FAILED: create: failed to create: Secret "sh.helm.release.v1.gen3-development.v35" is invalid: data: Too long: must have at most 1048576 bytes
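One hedged mitigation, as noted above, is to cap the release history so old (oversized) release Secrets are pruned; `--history-max` is a standard `helm upgrade` flag, though the value 3 here is just an example:

```shell
helm upgrade --install gen3 gen3/gen3 -f ./values.yaml --history-max 3
```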

Duplicate keys PYTHONPATH in fence/templates/useryaml-job.yaml

There are duplicate keys of PYTHONPATH in /fence/templates/useryaml-job.yaml

        env:
          {{- toYaml .Values.env | nindent 10 }}
          - name: PYTHONPATH
            value: /var/www/fence

This is due to the rendering of the array of values in values.yaml where

...
  - name: DB
    value: postgresql://$(PGUSER):$(PGPASSWORD)@$(PGHOST):5432/$(PGDB)
  - name: PYTHONPATH
    value: /var/www/fence
  - name: FENCE_PUBLIC_CONFIG
    valueFrom:
      configMapKeyRef:
        name: manifest-fence
...

causes a duplicate and renders to

apiVersion: batch/v1
kind: Job
metadata:
  name: useryaml
spec:
  backoffLimit: 10
  template:
    metadata:
      labels:
        app: gen3job
    spec:
      automountServiceAccountToken: false
      volumes:
        - name: fence-config
          secret:
            secretName: "fence-config"
        - name: useryaml
          configMap:
            name: useryaml
      containers:
      - name: fence
        image: "quay.io/cdis/fence:master"
        imagePullPolicy: Always
        env:
          - name: DD_ENABLED
            valueFrom:
              configMapKeyRef:
                key: dd_enabled
                name: manifest-global
                optional: true
          - name: DD_ENV
            valueFrom:
              fieldRef:
                fieldPath: metadata.labels['tags.datadoghq.com/env']
          - name: DD_SERVICE
            valueFrom:
              fieldRef:
                fieldPath: metadata.labels['tags.datadoghq.com/service']
          - name: DD_VERSION
            valueFrom:
              fieldRef:
                fieldPath: metadata.labels['tags.datadoghq.com/version']
          - name: DD_LOGS_INJECTION
            value: "true"
          - name: DD_PROFILING_ENABLED
            value: "true"
          - name: DD_TRACE_SAMPLE_RATE
            value: "1"
          - name: GEN3_UWSGI_TIMEOUT
            valueFrom:
              configMapKeyRef:
                key: uwsgi-timeout
                name: manifest-global
                optional: true
          - name: DD_AGENT_HOST
            valueFrom:
              fieldRef:
                fieldPath: status.hostIP
          - name: AWS_STS_REGIONAL_ENDPOINTS
            value: regional
          - name: PYTHONPATH
            value: /var/www/fence
          - name: GEN3_DEBUG
            value: "False"
          - name: FENCE_PUBLIC_CONFIG
            valueFrom:
              configMapKeyRef:
                key: fence-config-public.yaml
                name: manifest-fence
                optional: true
          - name: PGHOST
            valueFrom:
              secretKeyRef:
                key: host
                name: fence-dbcreds
                optional: false
          - name: PGUSER
            valueFrom:
              secretKeyRef:
                key: username
                name: fence-dbcreds
                optional: false
          - name: PGPASSWORD
            valueFrom:
              secretKeyRef:
                key: password
                name: fence-dbcreds
                optional: false
          - name: PGDB
            valueFrom:
              secretKeyRef:
                key: database
                name: fence-dbcreds
                optional: false
          - name: DB
            value: postgresql://$(PGUSER):$(PGPASSWORD)@$(PGHOST):5432/$(PGDB)
          - name: PYTHONPATH
            value: /var/www/fence
        volumeMounts:
          - name: "fence-config"
            readOnly: true
            mountPath: "/var/www/fence/fence-config.yaml"
            subPath: fence-config.yaml
          - name: "useryaml"
            mountPath: "/var/www/fence/user.yaml"
            subPath: useryaml
        command: ["/bin/bash" ]
        args:
          - "-c"
          # Script always succeeds if it runs (echo exits with 0)
          - |
            fence-create sync --arborist http://arborist-service --yaml /var/www/fence/user.yaml
      restartPolicy: OnFailure
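A sketch of one possible fix (untested): drop the hard-coded entry from the template and let the values-supplied env list provide PYTHONPATH exactly once:

```yaml
        env:
          {{- toYaml .Values.env | nindent 10 }}
          # PYTHONPATH now comes solely from .Values.env in values.yaml
```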

Duplicate env section in fence presigned-url-fence.yaml

There is a duplicate env section in presigned-url-fence.yaml

          env:
            {{- toYaml .Values.env | nindent 12 }}
          volumeMounts:
            {{- toYaml .Values.volumeMounts | nindent 12 }}
          env:
            {{ toYaml .Values.initEnv | nindent 12 }}

which renders to the following (pointing out the extra section after the volume mounts and the duplicate secret references)

containers:
        - name: presigned-url-fence
          image: "quay.io/cdis/fence:master"
          imagePullPolicy: Always
          ports:
            - name: http
              containerPort: 80
              protocol: TCP
            - name: https
              containerPort: 443
              protocol: TCP
            - name: container
              containerPort: 6567
              protocol: TCP
          livenessProbe:
            httpGet:
              path: /_status
              port: http
            initialDelaySeconds: 30
            periodSeconds: 60
            timeoutSeconds: 30
          readinessProbe:
            httpGet:
              path: /_status
              port: http
          resources:
            limits:
              cpu: 1
              memory: 2400Mi
            requests:
              cpu: 100m
              memory: 128Mi
          command: ["/bin/bash"]
          args:
            - "-c"
            - |
              echo "${FENCE_PUBLIC_CONFIG:-""}" > "/var/www/fence/fence-config-public.yaml"
              python /var/www/fence/yaml_merge.py /var/www/fence/fence-config-public.yaml /var/www/fence/fence-config-secret.yaml > /var/www/fence/fence-config.yaml
              if [[ -f /fence/keys/key/jwt_private_key.pem ]]; then
                openssl rsa -in /fence/keys/key/jwt_private_key.pem -pubout > /fence/keys/key/jwt_public_key.pem
              fi
              bash /fence/dockerrun.bash && if [[ -f /dockerrun.sh ]]; then bash /dockerrun.sh; fi
          env:
            - name: DD_ENABLED
              valueFrom:
                configMapKeyRef:
                  key: dd_enabled
                  name: manifest-global
                  optional: true
            - name: DD_ENV
              valueFrom:
                fieldRef:
                  fieldPath: metadata.labels['tags.datadoghq.com/env']
            - name: DD_SERVICE
              valueFrom:
                fieldRef:
                  fieldPath: metadata.labels['tags.datadoghq.com/service']
            - name: DD_VERSION
              valueFrom:
                fieldRef:
                  fieldPath: metadata.labels['tags.datadoghq.com/version']
            - name: DD_LOGS_INJECTION
              value: "true"
            - name: DD_PROFILING_ENABLED
              value: "true"
            - name: DD_TRACE_SAMPLE_RATE
              value: "1"
            - name: GEN3_UWSGI_TIMEOUT
              valueFrom:
                configMapKeyRef:
                  key: uwsgi-timeout
                  name: manifest-global
                  optional: true
            - name: DD_AGENT_HOST
              valueFrom:
                fieldRef:
                  fieldPath: status.hostIP
            - name: AWS_STS_REGIONAL_ENDPOINTS
              value: regional
            - name: PYTHONPATH
              value: /var/www/fence
            - name: GEN3_DEBUG
              value: "False"
            - name: FENCE_PUBLIC_CONFIG
              valueFrom:
                configMapKeyRef:
                  key: fence-config-public.yaml
                  name: manifest-fence
                  optional: true
            - name: PGHOST
              valueFrom:
                secretKeyRef:
                  key: host
                  name: fence-dbcreds
                  optional: false
            - name: PGUSER
              valueFrom:
                secretKeyRef:
                  key: username
                  name: fence-dbcreds
                  optional: false
            - name: PGPASSWORD
              valueFrom:
                secretKeyRef:
                  key: password
                  name: fence-dbcreds
                  optional: false
            - name: PGDB
              valueFrom:
                secretKeyRef:
                  key: database
                  name: fence-dbcreds
                  optional: false
            - name: DB
              value: postgresql://$(PGUSER):$(PGPASSWORD)@$(PGHOST):5432/$(PGDB)
          volumeMounts:
            - mountPath: /var/www/fence/local_settings.py
              name: old-config-volume
              readOnly: true
              subPath: local_settings.py
            - mountPath: /var/www/fence/fence_credentials.json
              name: json-secret-volume
              readOnly: true
              subPath: fence_credentials.json
            - mountPath: /var/www/fence/creds.json
              name: creds-volume
              readOnly: true
              subPath: creds.json
            - mountPath: /var/www/fence/config_helper.py
              name: config-helper
              readOnly: true
              subPath: config_helper.py
            - mountPath: /fence/fence/static/img/logo.svg
              name: logo-volume
              readOnly: true
              subPath: logo.svg
            - mountPath: /fence/fence/static/privacy_policy.md
              name: privacy-policy
              readOnly: true
              subPath: privacy_policy.md
            - mountPath: /var/www/fence/fence-config.yaml
              name: config-volume
              readOnly: true
              subPath: fence-config.yaml
            - mountPath: /var/www/fence/yaml_merge.py
              name: yaml-merge
              readOnly: true
              subPath: yaml_merge.py
            - mountPath: /var/www/fence/fence_google_app_creds_secret.json
              name: fence-google-app-creds-secret-volume
              readOnly: true
              subPath: fence_google_app_creds_secret.json
            - mountPath: /var/www/fence/fence_google_storage_creds_secret.json
              name: fence-google-storage-creds-secret-volume
              readOnly: true
              subPath: fence_google_storage_creds_secret.json
            - mountPath: /fence/keys/key/jwt_private_key.pem
              name: fence-jwt-keys
              readOnly: true
              subPath: jwt_private_key.pem
          env:
            
            - name: PGHOST
              valueFrom:
                secretKeyRef:
                  key: host
                  name: fence-dbcreds
                  optional: false
            - name: PGUSER
              valueFrom:
                secretKeyRef:
                  key: username
                  name: fence-dbcreds
                  optional: false
            - name: PGPASSWORD
              valueFrom:
                secretKeyRef:
                  key: password
                  name: fence-dbcreds
                  optional: false
            - name: PGDB
              valueFrom:
                secretKeyRef:
                  key: database
                  name: fence-dbcreds
                  optional: false
            - name: DB
              value: postgresql://$(PGUSER):$(PGPASSWORD)@$(PGHOST):5432/$(PGDB)
            - name: PYTHONPATH
              value: /var/www/fence
            - name: FENCE_PUBLIC_CONFIG
              valueFrom:
                configMapKeyRef:
                  key: fence-config-public.yaml
                  name: manifest-fence
                  optional: true
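A plausible fix (a sketch, not a tested patch) is to collapse the two env: keys into a single list so the initEnv entries are appended rather than shadowing the first block:

```yaml
          env:
            {{- toYaml .Values.env | nindent 12 }}
            {{- toYaml .Values.initEnv | nindent 12 }}
          volumeMounts:
            {{- toYaml .Values.volumeMounts | nindent 12 }}
```

Note that this naive concatenation can itself produce duplicates for keys present in both lists (e.g. the PGHOST/PGUSER entries above), so the two values lists may also need deduplicating.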

Possible misconfiguration in nginx.conf leading to null `user_id` value

Hello! Thanks for all the work put into this project, excited to use it in our deployments.

@bwalsh, @matthewpeterkort, @jawadqur

Issue Encountered

Encountering some 401/403 errors related to authentication.

Logs and Info

{"gen3log": "nginx", "date_access": "2023-03-03T00:32:29+00:00", "user_id": "-"

Possible Fix

  js_import helpers.js;
  js_set $userid userid; # <- Should be `helpers.userid` like the credentials line further down

  ...

  js_set $credentials_allowed helpers.isCredentialsAllowed;

After applying the above edit, the user_id field in the nginx logs contained the expected value and the 401/403 errors were reduced:

{"gen3log": "nginx", "date_access": "2023-03-03T00:32:29+00:00", "user_id": "[email protected]"

Reference

nginx unable to retrieve schema.json

Hi All! Thanks for all the effort put into this helm project! Using your instructions we were able to get a Gen3 local deployment via Helm up and running with custom TLS certs.

@bwalsh, @matthewpeterkort, @jawadqur

Issue Encountered

We are running into a small bug regarding schema.json, specifically with retrieving it at runtime from the data portal.

Logs and Info

Judging by the revproxy logs, nginx is unable to find schema.json because it is not in the expected directory within the portal deployment:

2023/03/02 22:08:42 [error] 18#18: *29 open() "/usr/share/nginx/html/data/schema.json" failed (2: No such file or directory), client: 10.42.0.125, server: , request:
"GET //data/schema.json HTTP/1.0", host: "aced-training.compbio.ohsu.edu", referrer: "https://aced-training.compbio.ohsu.edu/_root"
10.42.0.125 - - [02/Mar/2023:22:08:42 +0000] "GET //data/schema.json HTTP/1.0" 404 153 "https://aced-training.compbio.ohsu.edu/_root" "Mozilla/5.0 (Macintosh; Intel M
ac OS X 10.15; rv:109.0) Gecko/20100101 Firefox/110.0" "10.42.0.34"

Successful fetching of the data schema upon building the data portal:

INFO: Running schema and relay for non-workspace bundle

> [email protected] schema
> node ./data/getSchema

Fetching http://revproxy-service/api/v0/submission/getschema
Fetching http://revproxy-service/api/v0/submission/_dictionary/_all
All done!

schema.json is successfully written to the /data-portal/data directory:

root@portal-deployment-7dcd4c8446-8gk7d:/data-portal/data# ls
config           dictionaryHelper.js       getSchema.js  gqlHelper.js.njk  parameters.js   schema.json
dictionary.json  dictionaryHelper.test.js  getTexts.js   gqlSetup.js       schema.graphql  utils.js

root@portal-deployment-7dcd4c8446-8gk7d:/data-portal/data# ls -alt schema.json
-rw-r--r-- 1 root root 5624461 Mar  2 22:11 schema.json

schema.json is not written or copied to the nginx directory where it can then be served in the data portal:

ls -alt /usr/share/nginx/html/data/schema.json
ls: cannot access '/usr/share/nginx/html/data/schema.json': No such file or directory

Current Fix

Manually copying the file from /data/schema.json to the /usr/share/nginx/html/data/ directory solves the issue.
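That manual copy can be scripted against the running pod; the deployment name below is inferred from the pod name in the logs above, so verify it in your cluster before relying on this:

```shell
kubectl exec deploy/portal-deployment -- \
  cp /data-portal/data/schema.json /usr/share/nginx/html/data/schema.json
```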

Configure securityContext for containers

In order to improve the security of the containers, as described in the references below, we request that consideration be given to configuring the securityContext to specify a user other than root. This would help compartmentalize the impact if a container were compromised, by mitigating potential container-escape vulnerabilities.
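As an illustrative sketch (the values shown are common hardening defaults, not settings taken from these charts), a container securityContext might look like:

```yaml
securityContext:
  runAsNonRoot: true
  runAsUser: 1000
  runAsGroup: 1000
  allowPrivilegeEscalation: false
  capabilities:
    drop: ["ALL"]
```

Services that write to their filesystem at startup (e.g. fence's generated config) may need per-service exceptions, so each sub-chart would likely need its own values knob for this.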

Duplicate keys in sheepdog deployment yaml

There is a duplicate env var key in helm/sheepdog/templates/deployment.yaml which is causing template rendering issues

error:

W0104 17:44:04.431345    8988 warnings.go:70] spec.template.spec.containers[0].env[15].name: duplicate name "FENCE_URL"

snippet:

            - name: FENCE_URL
              valueFrom:
                configMapKeyRef:
                  name: manifest-global
                  key: fence_url
                  optional: true
            - name: INDEXD_PASS
              valueFrom:
                secretKeyRef:
                  name: indexd-service-creds
                  key: gdcapi
                  optional: false
            - name: GEN3_UWSGI_TIMEOUT
              value: "600"
            - name: DICTIONARY_URL
              value: {{ include "sheepdog.dictionaryUrl" .}}
            {{- with .Values.indexdUrl }}
            - name: INDEX_CLIENT_HOST
              value: {{ . }}
            {{- end }}
            {{- with .Values.fenceUrl }}
            - name: FENCE_URL
              value: {{ . }}
            {{- end }}
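A sketch of one fix (untested): guard the configmap-sourced entry so it only renders when .Values.fenceUrl is unset, leaving at most one FENCE_URL in the list:

```yaml
            {{- if not .Values.fenceUrl }}
            - name: FENCE_URL
              valueFrom:
                configMapKeyRef:
                  name: manifest-global
                  key: fence_url
                  optional: true
            {{- end }}
            {{- with .Values.fenceUrl }}
            - name: FENCE_URL
              value: {{ . }}
            {{- end }}
```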
