ibm / cloud-pak-deployer

Configuration-based installation of OpenShift and Cloud Pak for Data/Integration/Watson AIOps on various private and public cloud infrastructure providers. Deployment attempts to achieve the end-state defined in the configuration. If something fails along the way, you only need to restart the process to continue the deployment.

Home Page: https://ibm.github.io/cloud-pak-deployer/

License: Apache License 2.0

Python 13.58% Jinja 57.93% Shell 20.23% HTML 0.23% CSS 0.05% JavaScript 7.19% SCSS 0.46% Dockerfile 0.22% Makefile 0.09%

cloud-pak-deployer's Introduction

Cloud Pak Deployer

The intention of the Cloud Pak Deployer is to simplify the initial installation and the continuous management of OpenShift and the Cloud Paks on top of it, driven by automation. It helps you deploy (currently) Cloud Pak for Data on various infrastructures such as IBM Cloud ROKS, Azure Red Hat OpenShift (ARO), Red Hat OpenShift on AWS (ROSA) and VMware vSphere, as well as on existing Red Hat OpenShift environments.

The Cloud Pak Deployer was created for a joint project with one of our key partners who needed to fully automate the deployment of Cloud Pak for Data on IBM Cloud based on a configuration kept in a Git repository, and to manage it via configuration changes, i.e. GitOps.

A couple of notes from the authors:

  • If you find this repository useful, please "Star" it on GitHub to advertise it to a wider community
  • If you have questions or problems, please open an issue in the GitHub repository
  • Even better, if you find a defect and can resolve it, please fork the repository, fix it and send a pull request

DISCLAIMER The scripts in this repository are provided "as is" and have been developed with several use cases in mind, from deploying canned demonstrations to proofs of concept, test deployments and production deployments. Their main goals are automation and acceleration of initial deployment, day 2 operations and continuous adoption of Cloud Pak updates. Scripts and playbooks in the Cloud Pak Deployer do not do anything "special" you could not do manually using the official documentation or via your own automation. IBM does not and cannot support the Cloud Pak Deployer; it is the responsibility of the installer to verify that the installation of Red Hat OpenShift and the Cloud Pak meets the requirements for resilience, security and other functional and non-functional aspects. You can choose to use the Cloud Pak Deployer in mission-critical environments such as production, but it is your responsibility to support such an installation.

Thank you in advance for using this toolkit!


cloud-pak-deployer's Issues

WKC 4.5.0 fails to install due to missing SCC

Describe the bug
SCCs and service accounts have been renamed in CP4D 4.5. Adjust the SCC and policy assignment accordingly.

#  applying SCC for Watson Knowledge Catalog add-on
cat <<EOF | oc apply -f -
allowHostDirVolumePlugin: false
allowHostIPC: false
allowHostNetwork: false
allowHostPID: false
allowHostPorts: false
allowPrivilegeEscalation: true
allowPrivilegedContainer: false
allowedCapabilities: null
apiVersion: security.openshift.io/v1
defaultAddCapabilities: null
fsGroup:
  type: RunAsAny
kind: SecurityContextConstraints
metadata:
  annotations:
    kubernetes.io/description: WKC/IIS provides all features of the restricted SCC
      but runs as user 10032.
  name: wkc-iis-scc
readOnlyRootFilesystem: false
requiredDropCapabilities:
- KILL
- MKNOD
- SETUID
- SETGID
runAsUser:
  type: MustRunAs
  uid: 10032
seLinuxContext:
  type: MustRunAs
supplementalGroups:
  type: RunAsAny
volumes:
- configMap
- downwardAPI
- emptyDir
- persistentVolumeClaim
- projected
- secret

EOF

#  adding SCC to user  for Watson Knowledge Catalog add-on
oc adm policy add-scc-to-user wkc-iis-scc system:serviceaccount:zen-450:wkc-iis-sa

Container image cloud-pak-deployer does not exist on the local machine, please build first.

Describe the bug
During cp-deploy.sh vault set on an airgapped bastion, the following error occurs:
Container image cloud-pak-deployer does not exist on the local machine, please build first.

To Reproduce
Steps to reproduce the behavior:

  1. Go to the directory where cp-deploy.sh is located
  2. Run e.g.

./cp-deploy.sh vault set \
  -vs defense-oc-login \
  -vsv "oc login api.ocp48.tec.uk.ibm.com:6443 -u ocadmin -p passw0rd --insecure-skip-tls-verify"

  3. See error

Expected behavior
I expect an entry to be added in the vault


Desktop (please complete the following information):

  • OS: Red Hat Linux

Additional context
Solved by first running

./cp-deploy.sh env apply \
  -e env_id=defense \
  --air-gapped

and then running the vault command again.

Support private-only ROKS cluster

ROKS supports provisioning of a cluster without public API and ingress endpoints. Add a property to the openshift object for ROKS to allow setting this flag at provisioning time.

openshift:
- name: "{{ env_id }}"
  ocp_version: 4.8
  compute_flavour: bx2.16x64
  compute_nodes: 3
  infrastructure:
    type: vpc
    vpc_name: "{{ env_id }}"
    subnets: 
    - "{{ env_id }}-subnet-zone-1"
    - "{{ env_id }}-subnet-zone-2"
    - "{{ env_id }}-subnet-zone-3"
    cos_name: "{{ env_id }}-cos"
    private_only: True

Deployer failing despite --accept-all-licenses

Describe the bug
When accept_licenses is False in the configuration, the deployer fails, even if --accept-all-licenses is specified at the command line.

TASK [cp4d-cluster : Fail if licenses were not accepted] ***********************
Wednesday 17 August 2022 13:22:50 +0000 (0:00:00.049) 0:03:40.875 ******
fatal: [localhost]: FAILED! => {"changed": false, "msg": "You must accept the licenses, either in the cp4d object with accept_licenses: True, or by specifying --accept-all-licenses at the command line"}
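For reference, the in-configuration acceptance that the error message refers to would look like this in the cp4d object; the project name and version are illustrative:

cp4d:
- project: zen-sample                    # target project, illustrative
  openshift_cluster_name: "{{ env_id }}"
  cp4d_version: 4.5.0
  accept_licenses: True                  # in-config equivalent of --accept-all-licenses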

Support for non-ROSA AWS deployments

Is your feature request related to a problem? Please describe.
Currently, users can only use the deployer to deploy to ROSA on AWS.

Describe the solution you'd like
I would like to see Terraform used to allow deployment on self-managed AWS infrastructure.


ROSA-CP4I installation: platform check (AWS not supported)

Describe the bug
While installing CP4I using the deployer, the error below occurs.

To Reproduce
Steps to reproduce the behavior:
The configuration used can be referred to here:
https://github.com/ibm-iacs/cloud-pak-deployer/tree/ma-cp4i

Error

TASK [cp4i-prepare-openshift : Validate cloud_platform is implemented] *********
Friday 10 June 2022  14:24:24 +0000 (0:00:00.081)       1:04:51.636 *********** 
fatal: [localhost]: FAILED! => {"changed": false, "msg": "cloud_platform aws is not implemented, current implemented cloud platforms are ['ibm-cloud', 'existing-ocp', 'vsphere'] "}

PLAY RECAP *********************************************************************
localhost                  : ok=366  changed=51   unreachable=0    failed=1    skipped=201  rescued=0    ignored=0   

Friday 10 June 2022  14:24:24 +0000 (0:00:00.073)       1:04:51.710 *********** 
=============================================================================== 


Remove 30000-32767 node ports from ROKS security group

When provisioning a ROKS cluster, a security group is automatically added for all cluster inbound and outbound traffic. The security group rules include allow on TCP and UDP ports 30000-32767.

Include a flag deny_node_ports for ROKS clusters that removes the SG rules for the cluster in question:

openshift:
- name: "{{ env_id }}"
  ocp_version: 4.8
  compute_flavour: bx2.16x64
  compute_nodes: 3
  infrastructure:
    type: vpc
    vpc_name: "{{ env_id }}"
    subnets: 
    - "{{ env_id }}-subnet-zone-1"
    - "{{ env_id }}-subnet-zone-2"
    - "{{ env_id }}-subnet-zone-3"
    cos_name: "{{ env_id }}-cos"
    deny_node_ports: False
  • The flag must allow only True or False; False is the default if not specified.
  • When set to True, the Allow ICMP, TCP and UDP rules for the security group associated with the ROKS cluster must be removed.
  • When set to False, the Allow ICMP, TCP and UDP rules must be added.
  • Adding and removing the rules must be idempotent: when re-running the deployer and the rules are already in the correct state, nothing must happen (a possible approach is sketched after this list).
  • Add documentation for the flag.
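A minimal sketch of the idempotent removal using the IBM Cloud CLI and jq; the security group name and the exact JSON field names are assumptions:

# Security group of the ROKS cluster (name is illustrative)
SG=kube-cluster-sg

# Find rules that open the node-port range
RULE_IDS=$(ibmcloud is security-group-rules ${SG} --output JSON | \
  jq -r '.[] | select(.port_min==30000 and .port_max==32767) | .id')

# Delete them; if no rules match, the loop is a no-op, so re-runs are idempotent
for RULE in ${RULE_IDS}; do
  ibmcloud is security-group-rule-delete ${SG} ${RULE} -f
done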

Use olm-utils to default case and operand versions

Retrieve case and operand versions using olm-utils, and use the package manifests to retrieve the default subscription channel. This dramatically reduces the configuration effort for Cloud Pak for Data.

Changes:

  • Make cp4d_version mandatory; it must be in x.y.z format
  • Remove case_version, subscription_channel and version from the sample CP4D cartridges
  • Retrieve cartridge case and operand versions for the given cp4d_version via olm-utils
  • Retrieve the default channel from the catalog source via the package manifest (both lookups are sketched after this list)
  • If not specified, default case_version, subscription_channel and version to the versions retrieved by olm-utils
  • Update documentation
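A sketch of the two lookups; list-components is provided by the olm-utils image, while the operator package name in the packagemanifest query is illustrative:

# Inside the olm-utils container: case and operand versions for a CP4D release
list-components --release=4.5.0

# Default subscription channel for an operator package from the catalog source
oc get packagemanifest ibm-cpd-wkc -o jsonpath='{.status.defaultChannel}'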

Optimize for running in pod

Features to be added

  • If the Cloud Pak Deployer fails with an Ansible or other error, especially when running inside an OpenShift pod, the exceptions are not logged in cloud-pak-deployer.log. cloud-pak-deployer.log should hold both stdout and stderr output.
  • When run in a pod, wait for the configuration to be ready and then start the deployer. oc log should show the output of the deployer.
  • For Cloud Pak for Data, output the CR states to an output file so they can be read externally: $STATUS_DIR/log/cluster-project-cr-state.log
  • Output the CSV states to an output file so they can be read externally: $STATUS_DIR/log/cluster-project-csv-state.log (a sketch for this and the next item follows the list)
  • Output the catalog sources to an output file so they can be read externally: $STATUS_DIR/log/cluster-project-catsrc-state.log
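For the CSV and catalog source files, a sketch of how the states could be captured; the custom-columns expressions use standard OLM status fields, and the target paths follow the list above:

# CSV states, readable externally
oc get csv -n ${PROJECT} \
  -o custom-columns=NAME:.metadata.name,VERSION:.spec.version,PHASE:.status.phase \
  > ${STATUS_DIR}/log/cluster-project-csv-state.log

# Catalog source states
oc get catalogsource -n openshift-marketplace \
  -o custom-columns=NAME:.metadata.name,STATE:.status.connectionState.lastObservedState \
  > ${STATUS_DIR}/log/cluster-project-catsrc-state.log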

Add AWS EBS and EFS support

Cloud Pak for Data 4.5 introduces support for AWS EBS and EFS. Add support for this using a new storage type.

  • Add storage type: aws (a possible configuration shape is sketched after this list)
  • Defaults to 2 storage classes: efs-nfs-client (file) and gp2 (block)
  • Provision dynamic storage provisioner for efs-nfs-client
  • Add integration with Amazon file storage (create file storage, attach security groups, add access point)
  • Install AWS EFS operator on OpenShift
  • Create EFS SharedVolume
  • Destroy file server when ROSA cluster is destroyed
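A possible shape for the new entry in the openshift_storage list; the property names are hypothetical until the storage type is implemented:

openshift_storage:
- storage_name: aws-storage
  storage_type: aws    # hypothetical new type proposed above
  # would default to storage classes efs-nfs-client (file) and gp2 (block)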

Pause the MachineConfig operator to disable rebooting of compute nodes

Describe the bug
This occurs while provisioning CP4D (cp4d_version: 4.0.9) on ROSA (ocp_version: 4.8.24) using the latest version of the deployer.

TASK [cp-ocp-mco-pause : Fail if MachineConfigPool worker does not exist and not existing OpenShift or IBM Cloud] ***
Tuesday 28 June 2022  13:18:29 +0000 (0:00:00.074)       0:03:55.562 ********** 

TASK [cp-ocp-mco-pause : Pause the MachineConfig operator to disable rebooting of compute nodes] ***
Tuesday 28 June 2022  13:18:29 +0000 (0:00:00.067)       0:03:55.630 ********** 
fatal: [localhost]: FAILED! => {"changed": true, "cmd": "oc patch --type=merge --patch='{\"spec\":{\"paused\":true}}' machineconfigpool/worker\n", "delta": "0:00:00.815561", "end": "2022-06-28 13:18:30.830015", "msg": "non-zero return code", "rc": 1, "start": "2022-06-28 13:18:30.014454", "stderr": "Error from server (Prevented from accessing Red Hat managed resources. This is in an effort to prevent harmful actions that may cause unintended consequences or affect the stability of the cluster. If you have any questions about this, please reach out to Red Hat support at https://access.redhat.com/support): admission webhook \"regular-user-validation.managed.openshift.io\" denied the request: Prevented from accessing Red Hat managed resources. This is in an effort to prevent harmful actions that may cause unintended consequences or affect the stability of the cluster. If you have any questions about this, please reach out to Red Hat support at https://access.redhat.com/support", "stderr_lines": ["Error from server (Prevented from accessing Red Hat managed resources. This is in an effort to prevent harmful actions that may cause unintended consequences or affect the stability of the cluster. If you have any questions about this, please reach out to Red Hat support at https://access.redhat.com/support): admission webhook \"regular-user-validation.managed.openshift.io\" denied the request: Prevented from accessing Red Hat managed resources. This is in an effort to prevent harmful actions that may cause unintended consequences or affect the stability of the cluster. If you have any questions about this, please reach out to Red Hat support at https://access.redhat.com/support"], "stdout": "", "stdout_lines": []}

Error

Error from server: admission webhook "regular-user-validation.managed.openshift.io" denied the request: Prevented from accessing Red Hat managed resources. This is in an effort to prevent harmful actions that may cause unintended consequences or affect the stability of the cluster. If you have any questions about this, please reach out to Red Hat support at https://access.redhat.com/support

Add cluster name to log forwarding entries

Description
When forwarding log entries from multiple clusters, it may be difficult to identify the originating cluster, especially for namespaces that exist in every Cloud Pak cluster, such as ibm-common-services or the openshift-* ones.

Change
Add a label to the ClusterLogForwarder that indicates the cluster name.

   - inputRefs:
     - application
     outputRefs:
     - loki-application
     labels:
       cluster-name: "fke37a"

We could allow more than one label in the openshift_logging object:

openshift_logging:
- openshift_cluster_name: sample
  cluster_wide_logging:
  - input: application
    logging_name: loki-application
    labels:
      cluster-name: "{{ env_id }}"
      region: "eu-de"
  - input: infrastructure
    logging_name: loki-application
  - input: audit
    logging_name: loki-audit
  logging_output:
  - name: loki-application
    type: loki
    url: https://loki-application.sample.com
    certificates:
      cert: "{{ env_id }}-loki-cert"
      key: "{{ env_id }}-loki-key"
      ca: "{{ env_id }}-loki-ca"
  - name: loki-audit
    type: loki
    url: https://loki-audit.sample.com
    certificates:
      cert: "{{ env_id }}-loki-cert"
      key: "{{ env_id }}-loki-key"
      ca: "{{ env_id }}-loki-ca"

Deployer shows SUCCESSFUL completion, even if failed

Describe the bug
If the Ansible playbook fails, the deployer displays a "successful" message. The exit code of the container is correct.

To Reproduce

  1. Run the deployer, forcing a failure, for example by not accepting the license
PLAY RECAP *********************************************************************
localhost                  : ok=115  changed=4    unreachable=0    failed=1    skipped=27   rescued=0    ignored=0

Thursday 02 June 2022  17:22:57 +0000 (0:00:00.038)       0:00:12.417 *********
===============================================================================
generators : run the preprocess script (openshift) ---------------------- 0.85s
generators : Generate config through template --------------------------- 0.76s
generators : run the preprocess script (cp4d) --------------------------- 0.74s
generators : run the preprocess script (openshift) ---------------------- 0.70s
generators : Generate config through template --------------------------- 0.66s
generators : Generate defaults through template ------------------------- 0.48s
generators : Create generated_config work folder if it does not exist --- 0.39s
generators : Get stats of /Data/fk-cpd-config/cpd-demo/config ----------- 0.38s
generators : Lookup *.yaml files in /Data/fk-cpd-config/cpd-demo/config --- 0.37s
generators : Create generated_defaults work folder if it does not exist --- 0.30s

===========================================================================
Deployer completed SUCCESSFULLY. If command line is not returned, press ^C.

Expected behavior

PLAY RECAP *********************************************************************
localhost                  : ok=115  changed=3    unreachable=0    failed=1    skipped=27   rescued=0    ignored=0

Thursday 02 June 2022  17:27:18 +0000 (0:00:00.030)       0:00:12.478 *********
===============================================================================
generators : run the preprocess script (openshift) ---------------------- 0.89s
generators : run the preprocess script (cp4d) --------------------------- 0.75s
generators : Generate config through template --------------------------- 0.73s
generators : run the preprocess script (openshift) ---------------------- 0.71s
generators : Generate defaults through template ------------------------- 0.50s
generators : Generate config through template --------------------------- 0.49s
generators : Create generated_config work folder if it does not exist --- 0.39s
generators : Get stats of /Data/fk-cpd-config/cpd-demo/config ----------- 0.37s
generators : Lookup *.yaml files in /Data/fk-cpd-config/cpd-demo/config --- 0.36s
generators : Create generated_defaults work folder if it does not exist --- 0.28s

====================================================================================
Deployer FAILED. Check previous messages. If command line is not returned, press ^C.
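A minimal sketch of the fix in cp-deploy.sh, assuming bash; the playbook invocation and variable names are illustrative, and the messages are the ones shown above:

# Run the playbook, teeing output to the log, and keep the playbook's exit code
ansible-playbook ${PLAYBOOK} 2>&1 | tee -a ${STATUS_DIR}/log/cloud-pak-deployer.log
PLAYBOOK_RC=${PIPESTATUS[0]}
if [ ${PLAYBOOK_RC} -eq 0 ]; then
  echo "Deployer completed SUCCESSFULLY. If command line is not returned, press ^C."
else
  echo "Deployer FAILED. Check previous messages. If command line is not returned, press ^C."
fi
exit ${PLAYBOOK_RC}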

Remove dependent cartridges from CP4D 4.5.0 sample configs

Describe the bug
When running the deployer for CP4D 4.5.0 with the sample configs, the following error is issued.

fatal: [localhost]: FAILED! => {"changed": false, "error": 422, "msg": "Failed to create object: b'{\"kind\":\"Status\",\"apiVersion\":\"v1\",\"metadata\":{},\"status\":\"Failure\",\"message\":\"NotebookRuntime.ws.cpd.ibm.com \\\\\"ccs-cr\\\\\" is invalid: spec.kind: Invalid value: \\\\\"ccs-cr\\\\\": spec.kind in body should match \\'^ibm-cpd-ws-runtime-.*$\\'\",\"reason\":\"Invalid\",\"details\":{\"name\":\"ccs-cr\",\"group\":\"ws.cpd.ibm.com\",\"kind\":\"NotebookRuntime\",\"causes\":[{\"reason\":\"FieldValueInvalid\",\"message\":\"Invalid value: \\\\\"ccs-cr\\\\\": spec.kind in body should match \\'^ibm-cpd-ws-runtime-.*$\\'\",\"field\":\"spec.kind\"}]},\"code\":422}\\n'", "reason": "Unprocessable Entity", "status": 422}

Remedy
Remove the following section from the sample configs for CP4D 4.5.0:

#
# Cartridge case dependencies
#
  - name: ccs

  - name: datarefinery
    
  - name: wsl-runtimes

Simplification - move global and vault config to yaml

The deployer's global and vault configuration is currently kept in the inventory directory. Items such as cloud_platform and the vault settings could be moved to the config directory and integrated into the existing yaml files.

Proposed yaml constructs

Global configuration

global_config:
  environment_name: sample
  cloud_platform: ibm-cloud
  ibm_cloud_region: eu-de

or

global_config:
  environment_name: sample
  cloud_platform: azure
  azure_location: westeurope

etc.

Vault configurations:

Take the file-vault as the default if no vault element exists in the configuration.

vault:
  vault_type: file-vault
  vault_authentication_type: none

vault:
  vault_type: ibmcloud-vault
  vault_authentication_type: api-key
  vault_url: https://466a074c-09e6-4fc4-adf9-7891eb18965f.eu-de.secrets-manager.appdomain.cloud

vault:
  vault_type: hashicorp-vault
  vault_authentication_type: certificate
  vault_url: https://hcvault:8200
  vault_secret_path: secret/ibm
  vault_secret_path_append_group: True
  vault_secret_field: value
  vault_secret_base64: True

vault:
  vault_type: hashicorp-vault
  vault_authentication_type: api-key
  vault_url: http://10.215.26.181:8200
  vault_api_key: vaulttoken
  vault_secret_path: secret/ibm
  vault_secret_path_append_group: True
  vault_secret_field: value
  vault_secret_base64: True

Keep the inventory directory around for existing configurations, but deprecate it over time.

Container error in AnalyticEngine service setup

We are trying to install CP4D 4.0.9 with the deployer tool on an existing OpenShift environment. A few hours into the installation, the operators are created properly on the OpenShift side, but while creating the pods required for AnalyticEngine, the service falls into an error state when pulling the Spark-related images, and the installation fails on the bastion node's terminal. After deleting the tmp files we tried again several times, but the problem persists.

I am sharing a picture of the error with the pods below. As far as I can tell, there is a problem during the image pull process via the Ansible scripts.

[screenshot: AnalyticEngine pods in error state]

Cloud Environment

  • OS: OpenShift
  • Version 4.8
  • CP4D Version: 4.0.9

Warning message when setting Vault secret

Describe the bug
When setting a vault secret the following message is shown:

PLAY [Secrets] *****************************************************************
[WARNING]: log file at /tmp/work/olm-utils-ansible.log is not writeable and we cannot create it, aborting

To Reproduce

./cp-deploy.sh vault set -vs vf-oc-login -vsv "oc login https://redacted -u cluster-admin -p redactedR"

Expected behavior
Message should not appear

update documentation and usage of expandWith(matchPattern, remoteIdentifier='name')

expandWith is always used with a wildcard, as shown in the documentation. In nested structures this results in multiple matches:

cos:
- name: "{{ env_id }}-cos"
  plan: standard
  location: global
  buckets:
  - name: bucket123
    region_location: eu-de

If you want the cos instance itself, expandWith(cos[*].name) won't work; you have to use expandWith(cos[0].name).

I had to study the benedict docs to find out the proper usage of match patterns.

Use internal registry provided by olm-utils for air-gapped

When using an intermediate registry, the Cloud Pak Deployer currently uses cloudctl case to start a registry next to the deployer container, then uses container-internal networking to connect to that registry and mirror images.

Since the adoption of olm-utils, the Cloud Pak Deployer has a v2 registry inside the image that can be used to mirror images to and from, which makes the process a lot simpler and more reliable.

Simplify Cloud Pak for Integration config file

Reduce the configuration settings for Cloud Pak for Integration by deriving version, case_version and channel from the CP4I version specified in the header.

  • Remove version, case_version, subscription_channel
  • Create an additional yaml file holding the values of the above parameters; the key for the yaml object is the CP4I version
  • Add state to each instance that indicates whether a service will be installed

Cognos monitors fail in CP4D 4.5

In CP4D 4.5, the repository connection is specified differently compared to 4.0. The solution for both 4.5 and 4.0 is to retrieve the DB2 instance used as the metastore database and fetch the db2inst1 credentials from it. The deployer needs to label the correct DB2 instance so it can identify which instance to use.

Always base deployer on latest olm-utils image

Describe the bug
If the user already has an olm-utils image on the server that builds the deployer, the image doesn't get refreshed. Make sure that the deployer always pulls the latest olm-utils image before building from the Dockerfile.
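A sketch of forcing a fresh base image; icr.io/cpopen/cpd/olm-utils is the public olm-utils image location, and the build invocation is illustrative:

# Refresh the base image explicitly before rebuilding the deployer image
podman pull icr.io/cpopen/cpd/olm-utils:latest

# Or let the build refresh it implicitly
podman build --pull-always -t cloud-pak-deployer .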

Hotfix - Deployment on ROKS failing in NACL processing

Describe the bug
The following error is issued when deploying on IBM Cloud:

TASK [openshift-nacl-security : set_fact] **************************************
Wednesday 10 August 2022  13:19:23 +0000 (0:00:00.062)       1:07:39.913 ******
fatal: [localhost]: FAILED! => {"msg": "The task includes an option with an undefined variable. The error was: [?name=='{{ _current_openshift_cluster.name }}']: 'dict object' has no attribute 'name'\n\nThe error appears to be in '/cloud-pak-deployer/automation-roles/40-configure-infra/openshift-nacl-security/tasks/ibmcloud-configure-nacl-security.yml': line 29, column 3, but may\nbe elsewhere in the file depending on the exact syntax problem.\n\nThe offending line appears to be:\n\n\n- set_fact:\n  ^ here\n"}

To Reproduce
Happens with all ROKS clusters where the environment_name is not the same as the OpenShift cluster name.

Cognos analytics instance deployment failed

I have tried to deploy a Cognos Analytics instance with the ocp_existing profile and CP4D 4.5. While the Ansible summary status was successful, the Cognos instance status in CP4D is Failed.

The following excerpt can be found in the ibm-cognos-addon-sp-deployment-xx log:

[2022-08-04T09:06:57.799] [INFO] serviceprovidericp4d - Loaded the jwt pubkey:
[2022-08-04T09:06:57.802] [INFO] serviceprovidericp4d - Payload : , {"username":"admin","role":"Admin","permissions":["administrator","can_provision","manage_catalog","create_project","create_space","author_governance_artifacts","manage_governance_workflow","view_governance_artifacts","manage_categories","manage_glossary","manage_quality","manage_discovery","manage_metadata_import","access_catalog","view_quality"],"groups":[10000],"sub":"admin","iss":"KNOXSSO","aud":"DSX","uid":"1000330999","authenticator":"default","display_name":"admin","iat":1659602864,"exp":1659646028,"accessToken":"VALID_TOKEN","csrf_token":"VALID_TOKEN"}
[2022-08-04T09:06:57.803] [INFO] serviceprovidericp4d - API: /v1/provision/
[2022-08-04T09:06:57.803] [INFO] serviceprovidericp4d - provision controller for instance ID: 1659604017629154
[2022-08-04T09:06:58.655] [ERROR] serviceprovidericp4d - TypeError: Cannot read properties of undefined (reading 'entity')
at _updateConnectionInfo (/app/server/controllers/instances.js:114:81)
at Object.module.exports._updateOverrides (/app/server/controllers/instances.js:213:9)
at processTicksAndRejections (node:internal/process/task_queues:96:5)
at async module.exports.provisionInstance (/app/server/controllers/instances.js:41:17)
at async /app/server/helpers/controller_wrapper.js:41:20

OpenShift 4.10
CP4D 4.5

Air-gap: skip 'env download' before 'env save' after a rebuild - Migrated issue

Background
In air-gap mode, when you have already run the installer and the status directory is populated, and you then get an update of the deployer, refreshing the air-gap image requires you to:

  1. build to generate the deployer image
  2. env download to run a new container with the new image (which can take time because the deployer checks all CPD images)
  3. env save to commit the last run container to the air-gap image

Issue
If you skip the env download, the last run container is based on an old cloud-pak-deployer image, and the newly committed cloud-pak-deployer-airgap image is in fact stale (which is not obvious because it is new).

Suggestion
If all tools are already in status/download, it would be nice to be able to run only (1) build to rebuild cloud-pak-deployer and (2) env save to tag and export cloud-pak-deployer-airgap, as sketched below.

Where
cp-deploy.sh
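The suggested shortened flow would then be:

./cp-deploy.sh build      # 1: rebuild the cloud-pak-deployer image
./cp-deploy.sh env save   # 2: tag and export cloud-pak-deployer-airgap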

Watson Assistant installation failure

fatal: [localhost]: FAILED! => {"msg": "The conditional check '_current_cp4d_cartridge.version < "4.0.5"' failed. The error was: error while evaluating conditional (_current_cp4d_cartridge.version < "4.0.5"): 'dict object' has no attribute 'version'\n\nThe error appears to be in '/cloud-pak-deployer/automation-roles/50-install-cloud-pak/cp4d/cp4d-cartridge-install/tasks/cp4d-prep-watson-assistant.yml': line 4, column 5, but may\nbe elsewhere in the file depending on the exact syntax problem.\n\nThe offending line appears to be:\n\n- block:\n - name: Run preparation for Watson Assistant\n ^ here\n"}

Enable WKC options

Since CP4D 4.5, WKC has several features which can be enabled/disabled:
https://www.ibm.com/docs/en/cloud-paks/cp-data/4.5.x?topic=catalog-installing

Add the following boolean options to the wkc cartridge (a sketch of the resulting entry follows below):

  • install_wkc_core_only (default False)
  • enableKnowledgeGraph (default False)
  • enableDataQuality (default False)
  • enableFactSheet (default False)
  • enableMANTA (default False)

We should assume that CP4D 4.5.1 is installed as 4.5.0 requires a significant number of preparation steps which are no longer needed with 4.5.1.
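A sketch of the resulting wkc cartridge entry; the boolean option names come from this issue, while the surrounding keys (size, installation_options) are assumptions based on the deployer's cartridge configs:

  cartridges:
  - name: wkc
    size: small                      # illustrative
    installation_options:
      install_wkc_core_only: False
      enableKnowledgeGraph: False
      enableDataQuality: False
      enableFactSheet: False
      enableMANTA: False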

Solve dependabot security alert

Describe the bug
Dependabot has raised a security alert on a dependency of the web interface.

"dependencies": {
  "nth-check": ">=2.0.1"
}

"devDependencies": {
  "nth-check": ">=2.0.1"
}

Solve by rebuilding package-lock.json

'db2u-kubelet' already has a value (sysctl), and --overwrite is false

Describe the bug
When running a Cloud Pak for Data installation on a cluster that already has the labels for KubeletConfig, the deployer fails when trying to set the node labels.

TASK [cp4d-ocp-kubelet-config : Label worker machine config pool to allow KubeletConfig] ***
Friday 15 July 2022  07:17:54 +0000 (0:00:00.714)       0:01:42.094 ***********
fatal: [localhost]: FAILED! => {"changed": true, "cmd": "oc label machineconfigpool worker db2u-kubelet=sysctl", "delta": "0:00:00.122610", "end": "2022-07-15 07:17:54.988097", "msg": "non-zero return code", "rc": 1, "start": "2022-07-15 07:17:54.865487", "stderr": "error: 'db2u-kubelet' already has a value (sysctl), and --overwrite is false", "stderr_lines": ["error: 'db2u-kubelet' already has a value (sysctl), and --overwrite is false"], "stdout": "", "stdout_lines": []}

Solution
A code fix is required to add the --overwrite flag to the command.
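With the flag added, the labeling command tolerates an existing value and becomes idempotent:

oc label machineconfigpool worker db2u-kubelet=sysctl --overwrite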

Add Cloud Native Toolkit bootstrap to OpenShift configuration

To provide a Quickstart to the Cloud Native Toolkit (GitOps deployment), provide an option to bootstrap the toolkit as part of OpenShift configuration.

openshift:
- name: "{{ env_id }}"
  ocp_version: 4.8
  compute_flavour: bx2.16x64
  compute_nodes: 3
  infrastructure:
    type: vpc
    vpc_name: "{{ env_id }}"
    subnets: 
    - "{{ env_id }}-subnet-zone-1"
    - "{{ env_id }}-subnet-zone-2"
    - "{{ env_id }}-subnet-zone-3"
    cos_name: "{{ env_id }}-cos"
  cloud_native_toolkit: True
  openshift_storage:
  - storage_name: ocs-storage
    storage_type: ocs
    ocs_storage_label: ocs
    ocs_storage_size_gb: 500
    ocs_version: 4.8.0

New property cloud_native_toolkit is False by default. When set to True, the deployer installs the toolkit from the GitHub repo:
https://github.com/cloud-native-toolkit/multi-tenancy-gitops

  • Install ArgoCD operator and wait for CSV to be created
  • Create ArgoCD instance in openshift-gitops project

Write prepare-cluster configmaps into status directory

Enhancement
To be able to re-do cluster settings, it would be good to have the applied yaml files and configmaps in the status directory. This makes it easier to troubleshoot and re-do node changes in case of failure.
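A sketch of how the applied objects could be captured; the directory layout and the selection of resources are illustrative:

# Preserve cluster-preparation objects for troubleshooting and re-runs
mkdir -p ${STATUS_DIR}/cluster-config
oc get kubeletconfig -o yaml > ${STATUS_DIR}/cluster-config/kubeletconfig.yaml
oc get machineconfigpool worker -o yaml > ${STATUS_DIR}/cluster-config/mcp-worker.yaml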

Add Cloud Pak for Data 4.5 support

Add Cloud Pak for Data 4.5 support:

  • Provide sample configuration using olm-utils
  • Adjust non-olm-utils installation steps to handle dependencies differently
  • Update documentation if necessary

cp-deploy env download fails with all_config variable not instantiated

Describe the bug
When running cp-deploy.sh env download, it fails with this message:

TASK [generators : Run generators] *********************************************
task path: /cloud-pak-deployer/automation-roles/20-prepare/generators/tasks/main.yaml:29
Thursday 07 July 2022 08:13:13 +0000 (0:00:00.026) 0:00:03.824 *********
fatal: [localhost]: FAILED! =>
msg: |-
Unable to look up a name or access an attribute in template string ({{ all_config | dict2items }}).
Make sure your variable name does not contain invalid characters like '-': dict2items requires a dictionary, got <class 'ansible.template.AnsibleUndefined'> instead.

Expected behavior
Accept the all_config variable and continue processing.

Implement Temporary workaround for CPD EDB 4.5.0 and 4.5.1

Due to a hardcoded namespace in the CPD EDB operator, the namespace "zen" must be available (with the appropriate permissions) to allow the operator to create a ConfigMap. This ConfigMap must then be copied to the actual target CP4D namespace.

A bug report will be created with IBM Development. Once this is fixed, this temporary fix can be removed:
Location of temporary fix:
automation-roles/50-install-cloud-pak/cp4d/cp4d-cartridge-install/tasks/cp4d-prep-cpd-edb.yml

Fix dependabot alert

Describe the bug

Terser insecure use of regular expressions before v4.8.1 and v5.14.2 leads to ReDoS Moderate
#2 opened yesterday • Detected in terser (npm) • deployer-web/ui/package-lock.json

Deployer log only has success/fail message

Describe the bug
The deployer log in $STATUS_DIR/log/cloud-pak-deployer.log only has the following message:

cat $STATUS_DIR/log/cloud-pak-deployer.log
Deployer completed SUCCESSFULLY. If command line is not returned, press ^C.

The full deployer log should be kept.
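A minimal sketch of one way to capture the full output inside the entry-point script, assuming bash; the log path follows the issue:

# Mirror all stdout and stderr of the script into the log file
exec > >(tee -a ${STATUS_DIR}/log/cloud-pak-deployer.log) 2>&1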

Refactor case commands to match across Cloud Paks

Is your feature request related to a problem? Please describe.
Currently, every Cloud Pak has its own role/task for saving case files and creating catalog sources. Also, mirroring images to a private or portable registry is only implemented for CP4D.

Describe the solution you'd like
Create generic roles for:

  • case save
  • create catalog source
  • mirror images

Use the current roles available in the cp4d directory, moving them up one level and making them generic across CP4D, CP4I and CP4WA.

Air-gap mode, always load cloud-pak-deployer-airgap.tar to avoid stale - Migrated issue

In air-gap mode
When getting an update of the cloud-pak-deployer (a new image in status/download), you have to manually remove the existing cloud-pak-deployer-airgap image before running env apply, because cp-deploy.sh loads status/downloads/cloud-pak-deployer-airgap.tar only if the image is not already present. If you don't remove it first, the image is not updated and you still run the old version.

Suggestion
Compare the local and tar versions and update if needed, or always load; it will be quick if there is no change. The always-load variant is sketched below.

Where
cp-deploy.sh
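A sketch of the always-load variant; the tar path comes from the issue, and layers already present in local storage should be reused, so loading is cheap when nothing has changed:

# Always load the air-gap image; unchanged layers are reused from local storage
podman load -i status/downloads/cloud-pak-deployer-airgap.tar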

Add olm_utils: False for CP4D 4.5.0+

Initial installation of CP4D 4.5 with a number of cartridges takes a long time; olm-utils installs cartridges one by one and does not parallelize. Adapt the playbooks so that users can choose to run the preview scripts generated by olm-utils.

Fatal Error while creating Cognos Instance

While creating the CA instance, the playbook stops with the error below:

fatal: [localhost]: FAILED! => {"msg": "the field 'args' has an invalid value ({'cp4d_datasource_types': '{{ cp4d_datasource_types_result.stdout | from_json }}'}), and could not be converted to an dict.The error was: Extra data: line 1 column 9 (char 8)\n\nThe error appears to be in '/cloud-pak-deployer/automation-roles/60-configure-cloud-pak/cp4d/cp4d-provision-cognos-instance/tasks/provision_cognos_instance.yml': line 82, column 11, but may\nbe elsewhere in the file depending on the exact syntax problem.\n\nThe offending line appears to be:\n\n\n        - set_fact:\n          ^ here\n"}

PLAY RECAP *********************************************************************
localhost                  : ok=729  changed=146  unreachable=0    failed=1    skipped=424  rescued=0    ignored=0
