ibm-mas / ansible-devops
Ansible collection supporting devops for IBM Maximo Application Suite
Home Page: https://ibm-mas.github.io/ansible-devops/
License: Eclipse Public License 2.0
It's not possible to install the Application Suite using the automation playbooks; a Truststore Certificate error related to the certificateIssuer occurs during the install.
1. ./ibm/mas_devops/playbooks/ocp/provision-quickburn.yml
2. ./ibm/mas_devops/playbooks/ocp/configure-ocp.yml
3. ./ibm/mas_devops/playbooks/mas/install-suite.yml
The Application Suite installation fails. In the OpenShift console, a "CreateContainerConfigError" is visible on the coreidp-login pod.
TASK [truststore : Wait for public certificate to be ready] ********************************
fatal: [localhost]: FAILED! => {"changed": false, "msg": "Failed to gather information about Certificate(s) even after waiting for 301 seconds"}
It was observed that this error happens because the certificateIssuer remained set to "None" in the Suite CR.
A workaround is to remove this section from the CR, after which the deployment finishes successfully.
This concerns the self-signed CA and server certificate used with the Mongo CE operator.
The problem really seems to be that the Common Name (CN) is identical on the CA and the server certificate. This causes a problem with Pymongo and hostname validation. The CN for the CA should probably just be something like "Mongo CA".
A separate conf file should be created for use when creating the CA, one that contains the different CN and the extensions used when creating it.
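A minimal sketch of such a dedicated CA conf and the two distinct subjects; the file names and the Mongo service hostname are illustrative, not the role's actual values:

```shell
# Dedicated conf for the CA only, with its own CN and CA extensions
cat > ca.cnf <<'EOF'
[ req ]
distinguished_name = req_dn
x509_extensions    = v3_ca
prompt             = no

[ req_dn ]
# Distinct from the server certificate's CN
CN = Mongo CA

[ v3_ca ]
basicConstraints = critical, CA:TRUE
keyUsage         = critical, keyCertSign, cRLSign
EOF

# Self-signed CA using the dedicated conf
openssl req -x509 -newkey rsa:2048 -nodes -days 3650 \
  -keyout ca.key -out ca.crt -config ca.cnf

# Server certificate keeps the service hostname as its CN
# (hostname below is an illustrative Mongo CE service address)
openssl req -newkey rsa:2048 -nodes \
  -keyout server.key -out server.csr \
  -subj "/CN=mongo-ce-svc.mongoce.svc.cluster.local"
openssl x509 -req -in server.csr -CA ca.crt -CAkey ca.key \
  -CAcreateserial -days 825 -out server.crt
```

With the subjects no longer identical, Pymongo's hostname validation should have nothing to trip over while the chain still verifies.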
Sometimes, even after a cluster is deprovisioned, the loop here does not exit. An additional conditional check will be added to try to get the loop to exit faster.
"rc": 0,
"retries": 31,
"start": "2021-09-09 20:10:38.036981",
"stderr": "",
"stderr_lines": [],
"stdout": "0\nRetrieving cluster fvtrelease...",
"stdout_lines": [
"0",
"Retrieving cluster fvtrelease..."
]
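One hedged sketch of the extra conditional (task name, command, and variable names are illustrative): treat the cluster as gone when the lookup either errors out or reports zero matching clusters on its first output line.

```yaml
- name: "Wait for cluster {{ cluster_name }} to be deprovisioned"
  shell: |
    ibmcloud oc clusters --output json | jq '[.[] | select(.name == "{{ cluster_name }}")] | length'
  register: _cluster_count
  retries: 30
  delay: 60
  until: _cluster_count.rc != 0 or _cluster_count.stdout_lines[0] == "0"
  failed_when: false
```

In the captured output above, stdout was "0\nRetrieving cluster fvtrelease...", so a comparison against the whole stdout would never match; checking only the first stdout line avoids that.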
See TODOs in lite-core-roks.yml
... please deliver fixes to this branch: https://github.com/ibm-mas/ansible-devops/tree/bugfixes2411
# 4. Install BAS
# -----------------------------------------------------------------------------
- name: Install BAS
import_playbook: bas/install-bas.yml
vars:
# BAS Configuration
bas_namespace: "{{ lookup('env', 'BAS_NAMESPACE') | default('ibm-bas', true) }}"
bas_persistent_storage: "{{ lookup('env', 'BAS_PERSISTENT_STORAGE') }}"
bas_meta_storage_class: "{{ lookup('env', 'BAS_META_STORAGE') }}"
bas_username: "{{ lookup('env', 'BAS_USERNAME') | default('basuser', true) }}"
# TODO: Providing a default password of "password" is unacceptable, this needs to be randomly generated if not provided (and provide details in the documentation about how to look up the generated password)
# When fixing this, ensure it is fixed in any other playbooks with the same problem too
bas_password: "{{ lookup('env', 'BAS_PASSWORD') | default('password', true) }}"
# TODO: if this is related to BAS, the env vars should be prefixed BAS_ ... otherwise, why is the grafana config required under the BAS section here/why is the grafana username set to "basuser" as default?
# When fixing this, ensure it is fixed in any other playbooks with the same problem too
grafana_username: "{{ lookup('env', 'GRAFANA_USERNAME') | default('basuser', true) }}"
# TODO: Providing a default password of "password" is unacceptable, this needs to be randomly generated if not provided (and provide details in the documentation about how to look up the generated password)
# When fixing this, ensure it is fixed in any other playbooks with the same problem too
grafana_password: "{{ lookup('env', 'GRAFANA_PASSWORD') | default('password', true) }}"
# TODO: These all need to be made required env vars, these are not useable defaults
# When fixing this, ensure it is fixed in any other playbooks with the same problem too
contact:
email: "{{ lookup('env', 'BAS_CONTACT_MAIL') | default('[email protected]', true) }}"
firstName: "{{ lookup('env', 'BAS_CONTACT_FIRSTNAME') | default('John', true) }}"
lastName: "{{ lookup('env', 'BAS_CONTACT_LASTNAME') | default('Barnes', true) }}"
# MAS Configuration
mas_instance_id: "{{ lookup('env', 'MAS_INSTANCE_ID') }}"
mas_config_dir: "{{ lookup('env', 'MAS_CONFIG_DIR') }}"
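As a sketch of the password TODOs above: Ansible's password lookup with a /dev/null path can generate a random default without persisting it (the chars/length choices here are assumptions, and the generated value would still need to be surfaced per the documentation note):

```yaml
# Sketch only: generate a random 16-character password when BAS_PASSWORD
# is not set, instead of defaulting to "password"
bas_password: "{{ lookup('env', 'BAS_PASSWORD') | default(lookup('password', '/dev/null chars=ascii_letters,digits length=16'), true) }}"
```

Note that a /dev/null password lookup produces a new value on every evaluation, so in practice the result should be captured once (e.g. via set_fact) and reused wherever the password is referenced.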
Predict and H&P Utilities customers must access the Cloud Pak for Data dashboard to look at their Analytics projects and design notebooks, but the cp4d URL is not secure. This is important both for MAS MS and MAS Demo environments, as well as any environment used by customers.
By the end of this process cp4d would also be accessible using a suite.maximo.com domain.
To simplify the user experience, we should auto-download the oc and ibmcloud tools in any roles that require them. The download/extract commands for these can be found in the docker image; however, we must make sure that we handle different platforms gracefully .. either limit the auto-download support to specific platforms (and document as such) or make it work universally ... I'm mostly thinking about the usual "Mac problem" here :)
Alternatively, can we actually remove the use of ibmcloud and oc entirely and just call their APIs / use their Ansible modules? That would simplify things even more.
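A sketch of the platform handling for the auto-download, assuming the public OpenShift client mirror layout and the "stable" channel (both would need verifying against the docker image's download commands):

```shell
#!/bin/sh
# Map the host platform (output of `uname -s`) to the oc client
# download URL on the public OpenShift mirror; unsupported platforms
# fail loudly instead of downloading the wrong archive.
oc_download_url() {
  case "$1" in
    Linux)  echo "https://mirror.openshift.com/pub/openshift-v4/clients/ocp/stable/openshift-client-linux.tar.gz" ;;
    Darwin) echo "https://mirror.openshift.com/pub/openshift-v4/clients/ocp/stable/openshift-client-mac.tar.gz" ;;
    *)      echo "unsupported platform: $1" >&2; return 1 ;;
  esac
}

# Example usage: download and extract oc for the current platform
# curl -sL "$(oc_download_url "$(uname -s)")" | tar -xz oc
```

Keeping the mapping in one function makes the "Mac problem" explicit: either both branches are tested, or Darwin is dropped and the limitation documented.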
The collection should provide support to install MAS and any application using the Manual upgrade strategy, so anyone that uses this collection to set up their environment won't get any upgrade without their consent. I would suggest making it the default behavior and only installing with Automatic approval if explicitly specified by the executor.
When working on this issue it's important to ensure that all the operators in ibm-common-services are installed properly. If deployed by the MAS CSV, they will inherit the Manual upgrade strategy from MAS, and in that case we should also handle the approval for each operator there; otherwise the ODLM and Licensing operators won't get installed and the MAS installation will fail.
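For illustration, the upgrade strategy is controlled by the installPlanApproval field on the OLM Subscription; a sketch (the namespace, channel, and package names below are placeholders, not the collection's actual values):

```yaml
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: ibm-mas
  namespace: mas-inst1-core          # placeholder namespace
spec:
  channel: "8.x"                     # placeholder channel
  name: ibm-mas                      # placeholder package name
  source: ibm-operator-catalog
  sourceNamespace: openshift-marketplace
  installPlanApproval: Manual        # no upgrade proceeds without explicit approval
```

With Manual approval, the initial install also generates a pending InstallPlan that must be approved, e.g. `oc patch installplan <name> -n <namespace> --type merge -p '{"spec":{"approved":true}}'` — and the same handling is needed for each dependent operator in ibm-common-services.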
See https://ibm-mas.github.io/ansible-devops/roles/suite_dns/
We need to support a separate cis_apikey property, because the API key provided will be stored in a secret in the cluster, used by the webhook to create challenge request files in your DNS. We should support the ability to set the API key used here separately from the main IBM Cloud API key used elsewhere, so that it can be restricted to only the permissions required by CIS.
Update (@alequint):
- The suite_dns role today uses the IBM Cloud API Key, which is the same API key used for different purposes (the same one set in the corresponding playbook that uses the role).
- Instead of ibmcloud_apikey, the suite_dns role should use another property, cis_apikey, which will contain a more restricted API key.
In IBMCloud ROKS we have seen delays of over an hour before the Red Hat Operator catalog is ready to use. This will cause attempts to install anything from that CatalogSource to fail as the timeouts built into those roles are designed to catch problems with an install, rather than a half-provisioned cluster that is not properly ready to use.
The role will poll for 1.5 hours waiting for the redhat-operators CatalogSource to be ready. If it is not ready after that time, the role will fail the run.
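A sketch of such a wait task, assuming the kubernetes.core collection (90 retries at 60s each gives the 1.5-hour window):

```yaml
- name: "Wait for the redhat-operators CatalogSource to be ready"
  kubernetes.core.k8s_info:
    api_version: operators.coreos.com/v1alpha1
    kind: CatalogSource
    name: redhat-operators
    namespace: openshift-marketplace
  register: redhat_catalog_info
  retries: 90   # 90 x 60s = 1.5 hours
  delay: 60
  until:
    - redhat_catalog_info.resources | length > 0
    - redhat_catalog_info.resources[0].status.connectionState.lastObservedState | default('') == "READY"
```

The default('') guard matters on a half-provisioned cluster, where the resource can exist before its status block has been populated.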
We should be able to execute must-gather in a cluster by running a simple role: ibm.mas_devops.must_gather
It's really simple to run must-gather for MAS:
oc adm must-gather --image=quay.io/aiasupport/must-gather -- gather -cgl --noclusterinfo --mas-instance-id {{mas_instance_id}}
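A sketch of what the role's main task could look like; the dest-dir handling and variable name are assumptions, only the oc command itself comes from the issue:

```yaml
- name: "Run MAS must-gather"
  ansible.builtin.command: >
    oc adm must-gather
    --image=quay.io/aiasupport/must-gather
    --dest-dir={{ mas_must_gather_dir | default('/tmp/must-gather', true) }}
    -- gather -cgl --noclusterinfo --mas-instance-id {{ mas_instance_id }}
  register: must_gather_result
  changed_when: false
```

Wrapping the one-liner in a role keeps the image and gather flags in a single place and lets pipelines invoke it like any other ibm.mas_devops role.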
After several attempts, I have noticed that SafetyWorkspace is not able to complete successfully.
However, IoT is up and running and no pods indicate errors.
Then I tried the following workaround:
context cleanup 7bb3854f-fedb-4cf0-a653-c79014aa595b did not finish
[2021-11-25T18:42:28.071Z] [INFO] [TenantController.provisionTenant] [6356f81e-95e5-4639-921a-c5792bdb521a] [New] Provision request
[2021-11-25T18:42:28.072Z] [INFO] [TenantController.provisionTenant] [6356f81e-95e5-4639-921a-c5792bdb521a] [New] {
"name": "masdev",
"guid": "masdev",
"plan": "None",
"planGuid": "None",
"type": "internal",
"tier": "None",
"tenantId": "masdev",
"tenantName": "masdev",
"bluemixSpaceId": "None",
"bluemixOrgId": "None",
"contactName": "None",
"contactEmail": "None"
}
[2021-11-25T18:42:28.074Z] [INFO] [-1] [6356f81e-95e5-4639-921a-c5792bdb521a] [InternalTenantService.provisionTenant] Provisioning tenant masdev
[2021-11-25T18:42:28.102Z] [INFO] [-1] [6356f81e-95e5-4639-921a-c5792bdb521a] [TenantService.BaseService.update] Updating a tenant 1
[2021-11-25T18:42:28.103Z] [INFO] [-1] [6356f81e-95e5-4639-921a-c5792bdb521a] [TenantService.BaseService.get] Getting a tenant 1
[2021-11-25T18:42:28.217Z] [INFO] [-1] [6356f81e-95e5-4639-921a-c5792bdb521a] [InternalTenantService.provisionTenant] First part of tenant (1) provisioning is successful.
[2021-11-25T18:42:28.219Z] [INFO] [OpsNotifier.notify] [6356f81e-95e5-4639-921a-c5792bdb521a] [1] Notifying ops
[2021-11-25T18:42:28.239Z] [INFO] [1] [6356f81e-95e5-4639-921a-c5792bdb521a] [InternalTenantService._provisionTenantAsync] Starting second part of tenant provisioning
[2021-11-25T18:42:28.240Z] [INFO] [1] [6356f81e-95e5-4639-921a-c5792bdb521a] [InternalTenantService._createDefaultDocs] Creating default config, jobs, shields
[2021-11-25T18:42:28.240Z] [INFO] [1] [6356f81e-95e5-4639-921a-c5792bdb521a] [InternalTenantService._createDefaultDocs] Creating default roles
[2021-11-25T18:42:36.700Z] [INFO] [TenantController.getProvisioningStatus] [e45d1198-0af6-4049-b15e-040dacc71bc0] [1] Get status of tenant provisioning
[2021-11-25T18:42:36.727Z] [INFO] [1] [e45d1198-0af6-4049-b15e-040dacc71bc0] [InternalTenantService.getProvisioningStatus] Getting status of tenant provisioning
[2021-11-25T18:42:36.729Z] [INFO] [1] [e45d1198-0af6-4049-b15e-040dacc71bc0] [TenantService.BaseService.get] Getting a tenant 1
[2021-11-25T18:42:58.260Z] [ERROR] [-1] [6356f81e-95e5-4639-921a-c5792bdb521a] [InternalTenantService._provisionTenantAsync] Second part of tenant (1) provisioning failed. DriverException: Knex: Timeout acquiring a connection. The pool is probably full. Are you missing a .transacting(trx) call?
at DB2ExceptionConverter.convertException (/home/node/app/node_modules/@mikro-orm/core/platforms/ExceptionConverter.js:8:16)
at DB2ExceptionConverter.convertException (/home/node/app/node_modules/@iot4i/core/src/data/db2-driver/DB2ExceptionConverter.ts:36:18)
at DB2Driver.convertException (/home/node/app/node_modules/@mikro-orm/core/drivers/DatabaseDriver.js:194:54)
at /home/node/app/node_modules/@mikro-orm/core/drivers/DatabaseDriver.js:198:24
at runNextTicks (internal/process/task_queues.js:58:5)
at listOnTimeout (internal/timers.js:523:9)
at processTimers (internal/timers.js:497:7)
at DB2Driver.find (/home/node/app/node_modules/@mikro-orm/knex/AbstractSqlDriver.js:46:24)
at DB2Driver.findOne (/home/node/app/node_modules/@mikro-orm/knex/AbstractSqlDriver.js:60:21)
at SqlEntityManager.findOne (/home/node/app/node_modules/@mikro-orm/core/EntityManager.js:227:22)
at async Promise.all (index 1)
at InternalTenantService._createDefaultDocs (/home/node/app/node_modules/@iot4i/core/src/services/InternalTenantService.ts:347:5)
at InternalTenantService._provisionTenantAsync (/home/node/app/node_modules/@iot4i/core/src/services/InternalTenantService.ts:188:7)
previous KnexTimeoutError: Knex: Timeout acquiring a connection. The pool is probably full. Are you missing a .transacting(trx) call?
at DB2Client.acquireConnection (/home/node/app/node_modules/knex/lib/client.js:348:26)
at runNextTicks (internal/process/task_queues.js:58:5)
at listOnTimeout (internal/timers.js:523:9)
at processTimers (internal/timers.js:497:7)
at DB2Connection.executeQuery (/home/node/app/node_modules/@mikro-orm/core/connections/Connection.js:56:25)
at DB2Connection.execute (/home/node/app/node_modules/@mikro-orm/knex/AbstractSqlConnection.js:105:21)
at QueryBuilder.execute (/home/node/app/node_modules/@mikro-orm/knex/query/QueryBuilder.js:300:21)
at DB2Driver.find (/home/node/app/node_modules/@mikro-orm/knex/AbstractSqlDriver.js:46:24)
at DB2Driver.findOne (/home/node/app/node_modules/@mikro-orm/knex/AbstractSqlDriver.js:60:21)
at SqlEntityManager.findOne (/home/node/app/node_modules/@mikro-orm/core/EntityManager.js:227:22)
at async Promise.all (index 1)
at InternalTenantService._createDefaultDocs (/home/node/app/node_modules/@iot4i/core/src/services/InternalTenantService.ts:347:5)
at InternalTenantService._provisionTenantAsync (/home/node/app/node_modules/@iot4i/core/src/services/InternalTenantService.ts:188:7)
[2021-11-25T18:42:58.261Z] [INFO] [OpsNotifier.notify] [6356f81e-95e5-4639-921a-c5792bdb521a] [1] Notifying ops
[2021-11-25T18:42:58.262Z] [INFO] [-1] [6356f81e-95e5-4639-921a-c5792bdb521a] [TenantService.BaseService.update] Updating a tenant 1
[2021-11-25T18:42:58.262Z] [INFO] [-1] [6356f81e-95e5-4639-921a-c5792bdb521a] [TenantService.BaseService.get] Getting a tenant 1
[2021-11-25T18:42:58.264Z] [WARN] [xxxxxxxxx--No-Tenant-Id-xxxxxxxx] [xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx] [uncaughtException] Possibly Unhandled rejection: DriverException: Knex: Timeout acquiring a connection. The pool is probably full. Are you missing a .transacting(trx) call?
at DB2ExceptionConverter.convertException (/home/node/app/node_modules/@mikro-orm/core/platforms/ExceptionConverter.js:8:16)
at DB2ExceptionConverter.convertException (/home/node/app/node_modules/@iot4i/core/src/data/db2-driver/DB2ExceptionConverter.ts:36:18)
at DB2Driver.convertException (/home/node/app/node_modules/@mikro-orm/core/drivers/DatabaseDriver.js:194:54)
at /home/node/app/node_modules/@mikro-orm/core/drivers/DatabaseDriver.js:198:24
at runNextTicks (internal/process/task_queues.js:58:5)
at listOnTimeout (internal/timers.js:523:9)
at processTimers (internal/timers.js:497:7)
at DB2Driver.find (/home/node/app/node_modules/@mikro-orm/knex/AbstractSqlDriver.js:46:24)
at DB2Driver.findOne (/home/node/app/node_modules/@mikro-orm/knex/AbstractSqlDriver.js:60:21)
at SqlEntityManager.findOne (/home/node/app/node_modules/@mikro-orm/core/EntityManager.js:227:22)
at async Promise.all (index 1)
at InternalTenantService._createDefaultDocs (/home/node/app/node_modules/@iot4i/core/src/services/InternalTenantService.ts:347:5)
at InternalTenantService._provisionTenantAsync (/home/node/app/node_modules/@iot4i/core/src/services/InternalTenantService.ts:188:7)
previous KnexTimeoutError: Knex: Timeout acquiring a connection. The pool is probably full. Are you missing a .transacting(trx) call?
at DB2Client.acquireConnection (/home/node/app/node_modules/knex/lib/client.js:348:26)
at runNextTicks (internal/process/task_queues.js:58:5)
at listOnTimeout (internal/timers.js:523:9)
at processTimers (internal/timers.js:497:7)
at DB2Connection.executeQuery (/home/node/app/node_modules/@mikro-orm/core/connections/Connection.js:56:25)
at DB2Connection.execute (/home/node/app/node_modules/@mikro-orm/knex/AbstractSqlConnection.js:105:21)
at QueryBuilder.execute (/home/node/app/node_modules/@mikro-orm/knex/query/QueryBuilder.js:300:21)
at DB2Driver.find (/home/node/app/node_modules/@mikro-orm/knex/AbstractSqlDriver.js:46:24)
at DB2Driver.findOne (/home/node/app/node_modules/@mikro-orm/knex/AbstractSqlDriver.js:60:21)
at SqlEntityManager.findOne (/home/node/app/node_modules/@mikro-orm/core/EntityManager.js:227:22)
at async Promise.all (index 1)
at InternalTenantService._createDefaultDocs (/home/node/app/node_modules/@iot4i/core/src/services/InternalTenantService.ts:347:5)
at InternalTenantService._provisionTenantAsync (/home/node/app/node_modules/@iot4i/core/src/services/InternalTenantService.ts:188:7)
[2021-11-25T18:42:58.283Z] [INFO] [1] [6356f81e-95e5-4639-921a-c5792bdb521a] [InternalTenantService._createDefaultDocs] Role administrator exists, skipping ...
Currently the ocp_login role requires the user to have the IBM Cloud Kubernetes Service Administrator platform role in the target IBM Cloud account, specifically in order to run the following command:
ibmcloud oc cluster config -c {{ cluster_name }} --admin
This may cause failures if the owner (the user/service ID) of the IBM Cloud API Key used doesn't have the required Kubernetes Service permissions in that IBM Cloud account, even though it may still have the cluster_admin role inside that OpenShift cluster (we are in that situation for the clusters in the P2PaaS account).
More discussions on Slack: https://ibm-watson-iot.slack.com/archives/C0195MVCEUD/p1640135699356200
Documentation improvement: https://ibm-mas.github.io/ansible-devops/playbooks/dependencies/#install-mongodb-ce
TASK [mongodb : mongodb/community : Fail if mongodb_storage_class is not provided] ******************************************************
fatal: [localhost]: FAILED! => {"changed": false, "msg": "mongodb_storage_class property is required"}
We can use this as a starting point, but it may be quite out of date:
https://github.com/ibm-watson-iot/iot-docs/tree/master/monitoring#3-grafana-setup
This should be added to the ocp_setup_mas_deps role.
https://github.com/ibm-mas/ansible-devops/runs/4418885903?check_suite_focus=true
We should be able to track back to the previous release; the failure may be due to a difference in the way Actions checks out the repository, possibly such that we don't get access to tags like we did in Travis.
npm install of git-latest-semver-tag starting
added 75 packages, and audited 76 packages in 4s
3 packages are looking for funding
run `npm fund` for details
3 high severity vulnerabilities
To address all issues, run:
npm audit fix
Run `npm audit` for details.
- npm install complete
LAST TAG =
Creating /home/runner/work/ansible-devops/ansible-devops/.changelog
RELEASE LEVEL = initial
Semantic versioning system initialized: 1.0.0
initial release of 1.0.0
Setting /home/runner/work/ansible-devops/ansible-devops/.version to 1.0.0-pre.master
Setting /home/runner/work/ansible-devops/ansible-devops/.previous_version to 1.0.0
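If the cause really is the Actions checkout, one likely fix (an assumption, to be verified against the workflow) is telling actions/checkout to fetch the full history rather than a shallow clone, so tags from previous releases are available:

```yaml
- uses: actions/checkout@v2
  with:
    fetch-depth: 0   # fetch all history and tags, not the default shallow clone
```

Travis performed a deeper clone by default, which would explain why git-latest-semver-tag found the previous tag there but sees none ("LAST TAG =") under Actions.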
The following setup tasks in the area of performance tuning and monitoring would be useful to have in Ansible. The MAS performance team has bash scripts, but it would be nice to port this functionality to the ibm-mas/ansible-devops collection.
In roles/suite_verify/tasks/main.yml:
# 5. Print MAS login information
# -----------------------------------------------------------------------------
# TODO: The admin url should be looked up from the route resource, this should
# not rely on the same logic to construct the default URL we used when we installed
# MAS.
- name: "Lookup cluster subdomain"
community.kubernetes.k8s_info:
api_version: config.openshift.io/v1
kind: Ingress
name: cluster
register: _cluster_subdomain
- name: "Configure domain"
set_fact:
mas_domain: "{{custom_domain | default(mas_instance_id ~ '.' ~ _cluster_subdomain.resources[0].spec.domain )}}"
- debug:
msg:
- "Maximo Application is Ready, use the superuser credentials to authenticate"
- "Superuser Credentials:"
- "Username: .... {{ superuser_credentials.resources[0].data.username | b64decode }}"
- "Password: .... {{ superuser_credentials.resources[0].data.password | b64decode }}"
- "Admin Url: ... https://admin.{{mas_domain}}"
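A sketch of the route-based lookup the TODO asks for; the namespace pattern and label selector below are assumptions about how the admin route is labelled:

```yaml
- name: "Lookup the admin dashboard route"
  community.kubernetes.k8s_info:
    api_version: route.openshift.io/v1
    kind: Route
    namespace: "mas-{{ mas_instance_id }}-core"             # assumed namespace pattern
    label_selectors:
      - "app.kubernetes.io/instance={{ mas_instance_id }}"  # assumed label
  register: _admin_route

- debug:
    msg: "Admin Url: ... https://{{ _admin_route.resources[0].spec.host }}"
```

Reading spec.host from the actual Route means the printed URL stays correct even if a custom domain or non-default naming was used at install time.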
After discussions with the team, it's been advised that we use the standalone db2u operator instead of Cloud Pak for Data to meet our Db2 dependency.
The task here is to implement alternative support for the standalone operator in a new role (db2u) as an alternative to the existing cp4d_db2wh role.
In the future we may eliminate the cp4d_db2wh role, but for now we just want to introduce this support as an alternative. Deploying Db2 without CP4D will reduce install time and footprint.
The work has already been started in the db2u branch, however after my accident no-one picked the work up so it's remained in limbo for a number of weeks now.
The role needs to:
With the clean-up of default handling into the roles, there is no longer a need to maintain separate provision-quickburn / provision-roks & deprovision-quickburn / deprovision-roks playbooks.
These should be consolidated into single provision / deprovision playbooks, with appropriate updates to the mas playbooks that call these, the documentation, and the associated Tekton pipelines.
Migrate code from GitHub Enterprise, prepare code for public release under EPL license & prep for publication to Ansible Galaxy.
Looking forward to build artifacts that can be reused and facilitate user interaction and pipeline automation with ibm.mas_devops. I'm creating this issue to track the work being done around the OpenShift Pipelines operator and this collection.
The goal is to create a 1:1 mapping between the playbooks available in this collection and Tekton tasks, and then pipelines.
The PoC will include a set of tasks to deploy Manage against a running MAS. The pipeline will be responsible for deploying and configuring db2wh, and for deploying and activating Manage.
Also as part of this work a new public image should be made available, so we can leverage the collection in containers using Tekton tasks.
A plus for this work would be to make our tasks available in the Tekton Hub: https://hub.tekton.dev/
We've ported parts of the internal build system to this repository, however judging from the results of this build it's not working 100% yet: https://app.travis-ci.com/github/ibm-mas/ansible-devops/builds/236812829
Is there a way to reduce the footprint of Kafka by consolidating the two AMQ operators into one? BAS and IoT both require Kafka, and both deploy AMQ operators separately into multiple projects, which seems redundant.
There is no check for whether the status object exists ... so there is a timing window on a new deploy where the deploy will blow up, as in the example below:
TASK [ibm.mas_devops.ocp_verify : Check if Red Hat Catalog is ready] ********************************************************************************************************************************************
Wednesday 24 November 2021 11:35:49 +0000 (0:00:02.602) 0:36:24.845 ****
fatal: [localhost]: FAILED! => {"msg": "The conditional check 'redhat_catalog_info.resources[0].status.connectionState.lastObservedState == \"READY\"' failed. The error was: error while evaluating conditional (redhat_catalog_info.resources[0].status.connectionState.lastObservedState == \"READY\"): 'dict object' has no attribute 'status'"}
NO MORE HOSTS LEFT **********************************************************************************************************************************************************************************************
PLAY RECAP ******************************************************************************************************************************************************************************************************
localhost : ok=19 changed=9 unreachable=0 failed=1 skipped=9 rescued=0 ignored=0
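A hedged sketch of the fix: guard each level of the lookup with `is defined` before comparing, so the task simply retries while the status block has not been populated yet on a fresh deploy (the surrounding task is abbreviated; only the condition is the point here):

```yaml
# Sketch: retry while the CatalogSource status is still unpopulated,
# instead of blowing up on the missing attribute
until:
  - redhat_catalog_info.resources is defined
  - redhat_catalog_info.resources | length > 0
  - redhat_catalog_info.resources[0].status is defined
  - redhat_catalog_info.resources[0].status.connectionState is defined
  - redhat_catalog_info.resources[0].status.connectionState.lastObservedState == "READY"
```

Because Ansible evaluates an until list top-down as an AND, the comparison on the last line is only reached once every parent attribute exists, closing the timing window.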
We've performed significant refactoring of the roles and playbooks to clean up various problems introduced in v4.4 and v4.5, before we formally release v5.0 we need to test the tekton pipelines, it's likely numerous updates are required.
Other roles have a conditional check for MAS_CONFIG_DIR and only save the file when it's set, for example in db2 (the same logic can be seen in the kafka and mongo dependency roles):
# 3. Generate a JdbcCfg for MAS configuration
# -----------------------------------------------------------------------------
- include_tasks: tasks/suite_jdbccfg.yml
when:
- mas_instance_id is defined
- mas_instance_id != ""
- mas_config_dir is defined
- mas_config_dir != ""
However in BAS we don't seem to have any check for whether the config dir has been set, resulting in potentially writing to a dangerous and/or undesirable directory if mas_config_dir == "" (which it will be if the env var was not set).
- name: Set facts for BASCfg
set_fact:
bas_segment_key: "{{_bas_segmentKey_result.resources[0].data.segmentkey | b64decode}}"
bas_api_key: "{{_bas_apiKey_result.resources[0].data.apikey | b64decode }}"
bas_endpoint_url: "https://{{_bas_endpoint.resources[0].spec.host}}"
bas_tls_crt: "{{_bas_certificates_result.resources[0].data['tls.crt'] | b64decode | regex_findall('(-----BEGIN .+?-----(?s).+?-----END .+?-----)', multiline=True, ignorecase=True) }}"
# 2. Write out the config to the local filesystem
# -----------------------------------------------------------------------------
- name: Copy BASCfg to filesystem
ansible.builtin.template:
src: bascfg.yml.j2
dest: "{{ mas_config_dir }}/bas-{{ bas_namespace }}.yml"
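The fix could simply mirror the db2 guard on the template task, e.g.:

```yaml
- name: Copy BASCfg to filesystem
  when:
    - mas_config_dir is defined
    - mas_config_dir != ""
  ansible.builtin.template:
    src: bascfg.yml.j2
    dest: "{{ mas_config_dir }}/bas-{{ bas_namespace }}.yml"
```

With the guard in place, the facts are still set for in-cluster use, but nothing is written to the filesystem unless MAS_CONFIG_DIR was explicitly provided.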
We have the bas role almost complete, but it's not fully integrated into the sample playbooks and there were a few remaining issues to be ironed out before it's considered ready to use
please deliver fixes to this branch: https://github.com/ibm-mas/ansible-devops/tree/bugfixes2411
https://ibm-mas.github.io/ansible-devops/playbooks/lite-core-roks/
This page needs various updates to reflect the addition of automated BAS deployment to the playbook.
Tackle this after the issues raised in #65 have been addressed, as the fixes there will impact the documentation (we only want to fix this once).
The same issues are present on these pages as well:
To complete support for Predict we actually need to complete support for Manage components.
Today in the code we have this:
---
# Default application spec for Manage
mas_app_ws_spec:
bindings:
jdbc: "{{ mas_appws_jdbc_binding | default( 'system' , true) }}"
components: "{{ mas_appws_components | default({'base': {'version': 'latest'}}, true) }}"
settings:
db:
dbSchema: "{{ db2_schema }}"
maxinst:
demodata: "{{ manage_demo_data | bool }}"
db2Vargraphic: true
indexSpace: MAXINDEX
tableSpace: MAXDATA
bypassUpgradeVersionCheck: false
However, because our playbook strategy is to drive configuration via environment variables, there is no way to leverage this capability today as it requires setting vars specifically in the playbook, which we don't want to do because we want the config to be passed in via env vars.
To solve this, let's support something like this:
export MAS_APPWS_COMPONENTS=base=latest,health=latest
ansible-playbook playbooks/mas/configure-app.yml
Inside the suite_app_configure role we should have a new plugin that can parse name=value pairs in this env var into the object that we need to pass to the workspace CR spec. This would be used in the defaults/main.yml file to set the default value for mas_appws_components, with the default being omit if the env var is not set (i.e. don't define the variable at all).
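The parsing itself is small; a sketch of such a filter plugin (the file path and filter name are assumptions, e.g. plugins/filter/appws_components.py exposing a components2dict filter):

```python
# Hypothetical filter plugin converting "base=latest,health=latest"
# into the dict structure expected by the workspace CR spec.

def components2dict(components_str):
    """Convert 'name=version,name=version' into
    {'name': {'version': 'version'}, ...}."""
    result = {}
    if not components_str:
        return result
    for pair in components_str.split(","):
        name, _, version = pair.strip().partition("=")
        # Fall back to "latest" if a bare component name is given
        result[name] = {"version": version or "latest"}
    return result


class FilterModule:
    """Expose the filter to Ansible."""

    def filters(self):
        return {"components2dict": components2dict}
```

In defaults/main.yml this could then look something like `mas_appws_components: "{{ lookup('env', 'MAS_APPWS_COMPONENTS') | components2dict }}"`, with the omit handling layered on top when the env var is empty.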
Then the pipeline itself will need to be updated to have a new param:
- name: mas_appws_components_manage
type: string
description: Manage components to configure in the workspace
default: "base=latest,health=latest"
and also ensure this param is passed to the manage configure workspace step in the pipeline:
# 16.2 Configure Manage workspace
- name: cfg-manage
params:
- name: junit_suite_name
value: app-cfg-manage
- name: mas_app_id
value: manage
- name: mas_workspace_id
value: "$(params.mas_workspace_id)"
- name: mas_appws_components
value: "$(params.mas_appws_components_manage)"
taskRef:
name: mas-devops-configure-app
kind: ClusterTask
Hopefully that gives a good idea of what we need here.
None of these applications can be successfully installed or installed + configured using the Ansible collection.
In my testing in fvt-dev, only the original application support is functional (IoT & Manage)
Please deliver fixes to this branch: https://github.com/ibm-mas/ansible-devops/tree/bugfixes2411
1st run:
TASK [ibm.mas_devops.bas_install : Wait for FullDeployment to be ready (60s delay)] ********************************************************************************************************************************************************
Wednesday 24 November 2021 15:52:40 +0000 (0:00:02.010) 0:03:43.775 ****
FAILED - RETRYING: Wait for FullDeployment to be ready (60s delay) (40 retries left).
FAILED - RETRYING: Wait for FullDeployment to be ready (60s delay) (39 retries left).
... <snip> ...
FAILED - RETRYING: Wait for FullDeployment to be ready (60s delay) (3 retries left).
FAILED - RETRYING: Wait for FullDeployment to be ready (60s delay) (2 retries left).
FAILED - RETRYING: Wait for FullDeployment to be ready (60s delay) (1 retries left).
fatal: [localhost]: FAILED! => {"api_found": true, "attempts": 40, "changed": false, "resources": [{"apiVersion": "bas.ibm.com/v1", "kind": "FullDeployment", "metadata": {"creationTimestamp": "2021-11-24T15:52:39Z", "generation": 1, "managedFields": [{"apiVersion": "bas.ibm.com/v1", "fieldsType": "FieldsV1", "fieldsV1": {"f:spec": {".": {}, "f:airgapped": {".": {}, "f:backup_deletion_frequency": {}, "f:backup_retention_period": {}, "f:enabled": {}}, "f:allowed_domains": {}, "f:db_archive": {".": {}, "f:frequency": {}, "f:persistent_storage": {".": {}, "f:storage_class": {}, "f:storage_size": {}}, "f:retention_age": {}}, "f:env_type": {}, "f:event_scheduler_frequency": {}, "f:ibmproxyurl": {}, "f:image_pull_secret": {}, "f:kafka": {".": {}, "f:storage_class": {}, "f:storage_size": {}, "f:zookeeper_storage_class": {}, "f:zookeeper_storage_size": {}}, "f:postgres": {".": {}, "f:storage_class": {}, "f:storage_size": {}}, "f:prometheus_metrics": {}, "f:prometheus_scheduler_frequency": {}}, "f:status": {"f:phase": {}}}, "manager": "OpenAPI-Generator", "operation": "Update", "time": "2021-11-24T15:53:49Z"}, {"apiVersion": "bas.ibm.com/v1", "fieldsType": "FieldsV1", "fieldsV1": {"f:status": {".": {}, "f:conditions": {}}}, "manager": "ansible-operator", "operation": "Update", "time": "2021-11-24T16:01:43Z"}], "name": "fulldeployment", "namespace": "ibm-bas", "resourceVersion": "133659", "selfLink": "/apis/bas.ibm.com/v1/namespaces/ibm-bas/fulldeployments/fulldeployment", "uid": "299ab75b-61f5-4751-97d7-b50840e5c1a7"}, "spec": {"airgapped": {"backup_deletion_frequency": "@daily", "backup_retention_period": 7, "enabled": "false"}, "allowed_domains": "*", "db_archive": {"frequency": "@monthly", "persistent_storage": {"storage_class": "", "storage_size": "10G"}, "retention_age": 6}, "env_type": "lite", "event_scheduler_frequency": "*/10 * * * *", "ibmproxyurl": "https://iaps.ibm.com", "image_pull_secret": "bas-images-pull-secret", "kafka": {"storage_class": "", "storage_size": 
"5G", "zookeeper_storage_class": "", "zookeeper_storage_size": "5G"}, "postgres": {"storage_class": "", "storage_size": "10G"}, "prometheus_metrics": [], "prometheus_scheduler_frequency": "@daily"}, "status": {"conditions": [{"ansibleResult": {"changed": 12, "completion": "2021-11-24T16:01:43.152198", "failures": 1, "ok": 17, "skipped": 3}, "lastTransitionTime": "2021-11-24T16:01:43Z", "message": "Failed to find exact match for kafka.strimzi.io/v1beta2.Kafka by [kind, name, singularName, shortNames]", "reason": "Failed", "status": "False", "type": "Failure"}, {"lastTransitionTime": "2021-11-24T16:01:43Z", "message": "Running reconciliation", "reason": "Running", "status": "True", "type": "Running"}], "phase": "Installing"}}]}
NO MORE HOSTS LEFT *************************************************************************************************************************************************************************************************************************
PLAY RECAP *********************************************************************************************************************************************************************************************************************************
localhost : ok=98 changed=22 unreachable=0 failed=1 skipped=32 rescued=0 ignored=0
Wednesday 24 November 2021 16:33:58 +0000 (0:41:17.379) 0:45:01.155 ****
===============================================================================
Second run, the next day: a different failure at the same place:
FAILED - RETRYING: Wait for FullDeployment to be ready (60s delay) (1 retries left).
fatal: [localhost]: FAILED! => {"api_found": true, "attempts": 40, "changed": false, "resources": [{"apiVersion": "bas.ibm.com/v1", "kind": "FullDeployment", "metadata": {"creationTimestamp": "2021-11-24T15:52:39Z", "generation": 1, "managedFields": [{"apiVersion": "bas.ibm.com/v1", "fieldsType": "FieldsV1", "fieldsV1": {"f:spec": {".": {}, "f:airgapped": {".": {}, "f:backup_deletion_frequency": {}, "f:backup_retention_period": {}, "f:enabled": {}}, "f:allowed_domains": {}, "f:db_archive": {".": {}, "f:frequency": {}, "f:persistent_storage": {".": {}, "f:storage_class": {}, "f:storage_size": {}}, "f:retention_age": {}}, "f:env_type": {}, "f:event_scheduler_frequency": {}, "f:ibmproxyurl": {}, "f:image_pull_secret": {}, "f:kafka": {".": {}, "f:storage_class": {}, "f:storage_size": {}, "f:zookeeper_storage_class": {}, "f:zookeeper_storage_size": {}}, "f:postgres": {".": {}, "f:storage_class": {}, "f:storage_size": {}}, "f:prometheus_metrics": {}, "f:prometheus_scheduler_frequency": {}}, "f:status": {"f:phase": {}}}, "manager": "OpenAPI-Generator", "operation": "Update", "time": "2021-11-24T15:53:49Z"}, {"apiVersion": "bas.ibm.com/v1", "fieldsType": "FieldsV1", "fieldsV1": {"f:status": {".": {}, "f:conditions": {}}}, "manager": "ansible-operator", "operation": "Update", "time": "2021-11-25T12:03:28Z"}], "name": "fulldeployment", "namespace": "ibm-bas", "resourceVersion": "846474", "selfLink": "/apis/bas.ibm.com/v1/namespaces/ibm-bas/fulldeployments/fulldeployment", "uid": "299ab75b-61f5-4751-97d7-b50840e5c1a7"}, "spec": {"airgapped": {"backup_deletion_frequency": "@daily", "backup_retention_period": 7, "enabled": "false"}, "allowed_domains": "*", "db_archive": {"frequency": "@monthly", "persistent_storage": {"storage_class": "", "storage_size": "10G"}, "retention_age": 6}, "env_type": "lite", "event_scheduler_frequency": "*/10 * * * *", "ibmproxyurl": "https://iaps.ibm.com", "image_pull_secret": "bas-images-pull-secret", "kafka": {"storage_class": "", "storage_size": 
"5G", "zookeeper_storage_class": "", "zookeeper_storage_size": "5G"}, "postgres": {"storage_class": "", "storage_size": "10G"}, "prometheus_metrics": [], "prometheus_scheduler_frequency": "@daily"}, "status": {"conditions": [{"ansibleResult": {"changed": 1, "completion": "2021-11-25T12:03:28.405458", "failures": 1, "ok": 18, "skipped": 4}, "lastTransitionTime": "2021-11-25T12:03:28Z", "message": "unknown playbook failure", "reason": "Failed", "status": "False", "type": "Failure"}, {"lastTransitionTime": "2021-11-25T12:03:28Z", "message": "Running reconciliation", "reason": "Running", "status": "True", "type": "Running"}], "phase": "Installing"}}]}
NO MORE HOSTS LEFT *************************************************************************************************************************************************************************************************************************
PLAY RECAP *********************************************************************************************************************************************************************************************************************************
localhost : ok=98 changed=13 unreachable=0 failed=1 skipped=32 rescued=0 ignored=0
ibm-bas namespace is left in this state:
/mnt/c/Users/DaveParker$ oc get pods -n ibm-bas
NAME READY STATUS RESTARTS AGE
amq-streams-cluster-operator-v1.7.3-6dc4b4d74d-fn62k 1/1 Running 0 20h
backrest-backup-instrumentationdb-r9c8c 0/1 Completed 0 15h
behavior-analytics-services-operator-d5f6f5899-s8qvd 2/2 Running 0 20h
createcluster-2zg6f 0/1 Completed 0 20h
dashboard-deployment-66c874dbbb-wksj4 2/2 Running 0 20h
instrumentationdb-7b69964d8c-9cjbb 1/1 Running 0 20h
instrumentationdb-backrest-shared-repo-75566dfd69-wz44f 1/1 Running 0 20h
instrumentationdb-stanza-create-crv5s 0/1 Completed 0 20h
kafka-zookeeper-0 0/1 Pending 0 20h
pgo-deploy-ln6lx 0/1 Completed 0 20h
postgres-operator-6c585b8c78-mzz8v 4/4 Running 0 20h
Doc on how to install AppConnect for MAS 8.6: https://www.ibm.com/docs/en/mas86/8.6.0?topic=ons-app-connect
AppConnect is required to deploy H&P Utilities in MAS 8.6, so its provisioning must be automated as part of this collection.
Steps informed by the team:
When installing AppConnect, spec.useCommonServices must be set to false.
For MAS 8.6 we use AppConnect operator version 1.5.2, dashboard version 12.0.1.0-r3, and the license for the above is AppConnectEnterpriseProduction L-KSBM-C37J2R.
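Putting the pieces above together, the Dashboard CR the new role would create could look roughly like this (a sketch only; apart from spec.useCommonServices, the license value, and the dashboard version quoted above, the field names and namespace are assumptions based on the App Connect operator, not confirmed settings):

```yaml
# Hedged sketch of an App Connect Dashboard CR for MAS 8.6.
# namespace and metadata.name are placeholders.
apiVersion: appconnect.ibm.com/v1beta1
kind: Dashboard
metadata:
  name: appconnect-dashboard
  namespace: mas-appconnect
spec:
  license:
    accept: true
    license: L-KSBM-C37J2R               # license ID quoted by the team
    use: AppConnectEnterpriseProduction  # license use quoted by the team
  useCommonServices: false               # per the team: must be set to false
  version: 12.0.1.0-r3                   # dashboard version for MAS 8.6
```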
See: https://app.travis-ci.com/github/ibm-mas/ansible-devops/builds/236820483
$ $HOME/build.common/bin/gitrelease.sh
/home/travis/.travis/functions: line 109: /home/travis/build.common/bin/gitrelease.sh: No such file or directory
Looks like we forgot to port this part of the build system over to the public repo (and it's still referencing the path used by build.common when calling it).
The quality of the role READMEs, which are pulled into the documentation (https://ibm-mas.github.io/ansible-devops/), is pretty low. A thorough review of each is required to ensure the documentation is accurate and complete.
BAS install fails due to ROKS provisioning errors. We don't know why ROKS sometimes messes up this aspect of the cluster config, but it causes problems for one of our dependencies: BAS.
We can put something in ocp verify that checks for one of the two secrets that we are expecting on ROKS. If things are going to fail because this secret is sometimes missing, we should make them fail as early as possible, seeing as we don't know how to resolve the issue.
Reference: https://github.com/ibm-mas/ansible-devops/blob/master/ibm/mas_devops/roles/bas_install/tasks/bascfg.yml#L63-L103 ... We need similar logic in ocp verify that checks that either of these secrets exists. If neither does, fail the verify, because the cluster is not ready for MAS.
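The early check in ocp verify could be sketched like this (hypothetical tasks; roks_expected_secrets and roks_secret_namespace are placeholder variables, and the real names must match the lookup in the bascfg.yml reference above):

```yaml
# Sketch: fail ocp_verify early when none of the expected ROKS secrets exist.
- name: "Look up the secrets we expect ROKS provisioning to have created"
  kubernetes.core.k8s_info:
    api_version: v1
    kind: Secret
    name: "{{ item }}"
    namespace: "{{ roks_secret_namespace }}"  # placeholder
  loop: "{{ roks_expected_secrets }}"         # placeholder list of two names
  register: roks_secret_lookup

- name: "Fail fast if the cluster is missing both secrets"
  ansible.builtin.assert:
    that:
      # at least one lookup must have returned a resource
      - roks_secret_lookup.results | map(attribute='resources') | flatten | length > 0
    fail_msg: "Neither expected ROKS secret exists; the cluster is not ready for MAS"
```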
Need a new role to support enhancing ROKS clusters to configure:
There may be other places these are logged too; this is just one example. Passwords should not be logged. The documentation should direct the user to where they can obtain the passwords (e.g. which secret they are saved to in k8s, or whether they are written to a file on disk). The logs should be safe for a user to copy/paste into a GitHub issue without risking exposure of sensitive information.
TASK [ibm.mas_devops.bas_install : Generate bas_password if none has been provided] ***********************************************************************************************************************
Monday 29 November 2021 15:40:52 +0000 (0:00:00.055) 0:58:13.934 *******
ok: [localhost] => {"ansible_facts": {"bas_password": "ovjRTloehosQYLL"}, "changed": false}
TASK [ibm.mas_devops.bas_install : Generate bas_grafana_password if none has been provided] ***************************************************************************************************************
Monday 29 November 2021 15:40:52 +0000 (0:00:00.068) 0:58:14.002 *******
ok: [localhost] => {"ansible_facts": {"bas_grafana_password": "YnsEjgMfqVMWhhn"}, "changed": false}
... or sort out billing in Travis, either works.
FWIW .. this is the starter I wrote a while back / completely untested but might be helpful as a starting point:
name: Migrated Travis CI Build
# Do a SIMPLE migration of the travis build, worry about taking advantage of other features of Actions after we get it basically working first
# See if we can get it working on one branch first
on:
  push:
    branches: [ actionstest ]
jobs:
  build:
    runs-on: ubuntu-latest
    # Most scripts will probably fail because $GITHUB_WORKSPACE / $TRAVIS_BUILD_DIR envs are different, etc
    steps:
      - uses: actions/checkout@v2
      - name: Make scripts executable
        run: chmod u+x $TRAVIS_BUILD_DIR/build/bin/*.sh
      - name: Run initbuild
        run: $GITHUB_WORKSPACE/build/bin/initbuild.sh
      - name: Install Python requirements to build the pages documentation website
        run: python -m pip install -q ansible==2.10.3 mkdocs yamllint
      - name: Validate that the ansible collection lints successfully
        run: yamllint -c yamllint.yaml ibm/mas_devops
      - name: Validate that the mkdocs site builds successfully
        run: mkdocs build --verbose --clean --strict
      - name: Build the Ansible collection
        run: $GITHUB_WORKSPACE/build/bin/build-collection.sh
      - name: Install the Ansible collection in the container image
        run: cp $GITHUB_WORKSPACE/ibm/mas_devops/ibm-mas_devops-$(cat $GITHUB_WORKSPACE/.version).tar.gz $GITHUB_WORKSPACE/image/ansible-devops/ibm-mas_devops.tar.gz
      - name: Build the docker image
        run: $GITHUB_WORKSPACE/build/bin/docker-build.sh -n ibmmas -i ansible-devops
      - name: Push the docker image
        run: $GITHUB_WORKSPACE/build/bin/docker-push.sh
      - name: Build the tekton clustertasks
        run: $TRAVIS_BUILD_DIR/pipelines/bin/build-pipelines.sh
# TODOs
# Publish a GitHub release with the ansible collection tgz as an asset
# - $TRAVIS_BUILD_DIR/build/bin/git-release.sh $TRAVIS_BUILD_DIR/ibm/mas_devops/ibm-mas_devops-$(cat $TRAVIS_BUILD_DIR/.version).tar.gz $TRAVIS_BUILD_DIR/pipelines/ibm-mas_devops-clustertasks-$(cat $TRAVIS_BUILD_DIR/.version).tar.gz
#deploy:
#  provider: pages
#  skip_cleanup: true
#  github_url: github.com
#  github_token: $GITHUB_TOKEN
#  verbose: true
#  local_dir: site
#  target_branch: gh-pages
#  on:
#    branch: master
bash pipelines/bin/install-pipelines.sh
/home/david/ibm-mas/ansible-devops/pipelines/bin
subscription.operators.coreos.com/openshift-pipelines-operator created
Wait for Pipeline operator to be ready
Error from server (NotFound): customresourcedefinitions.apiextensions.k8s.io "tasks.tekton.dev" not found
See TODO in the script:
# TODO: do while STATE != ready
# otherwise the CRD lookup will fail, as the timeout only helps AFTER the CRD initially exists
# STATE=$(oc get subscription openshift-pipelines-operator -n openshift-operators -o=jsonpath="{.status.state}")
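The TODO above could be implemented with a generic retry helper (a sketch; wait_for is a hypothetical function, not code from the repo, and "AtLatestKnown" is the Subscription state we would poll for before the CRD lookup):

```shell
# wait_for: retry a command until it succeeds, up to MAX attempts,
# sleeping one second between attempts. Returns non-zero on timeout.
wait_for() {
  local max=$1; shift
  local n=0
  until "$@"; do
    n=$((n+1))
    [ "$n" -ge "$max" ] && return 1
    sleep 1
  done
}

# Against a real cluster the loop would look something like (untested):
#   wait_for 30 sh -c \
#     '[ "$(oc get subscription openshift-pipelines-operator -n openshift-operators -o=jsonpath="{.status.state}")" = "AtLatestKnown" ]'
```

Only after the wait succeeds would the script go on to look up tasks.tekton.dev, so the CRD lookup no longer races the operator install.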
New plan for GitHub actions:
Store the API key to publish to Galaxy in GitHub secure settings
A new workflow that runs on tags only, which will
We need to run ansible-lint to know whether the collection will actually be accepted into Galaxy; it catches things that a normal YAML lint will not (e.g. missing metadata required by Galaxy).
For now, creating tags remains a manual process: anyone with the power to create a tag can manually promote from master to release by creating one. Once all the kinks are ironed out of the system, we should be able to make a commit to master auto-create a tag, which will trigger the release.
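The tag-triggered workflow in the plan above could look roughly like this (a sketch; the secret name GALAXY_API_KEY and the build/publish steps are assumptions, not the repo's actual workflow):

```yaml
name: Release to Ansible Galaxy
on:
  push:
    tags: [ '*' ]  # runs on tags only, per the plan above
jobs:
  release:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2
      - name: Install Ansible and ansible-lint
        run: python -m pip install -q ansible ansible-lint
      - name: Lint the collection (catches Galaxy metadata problems)
        run: ansible-lint ibm/mas_devops
      - name: Build the collection
        run: ansible-galaxy collection build ibm/mas_devops --output-path .
      - name: Publish to Galaxy using the API key held in GitHub secure settings
        run: ansible-galaxy collection publish ibm-mas_devops-*.tar.gz --api-key "${{ secrets.GALAXY_API_KEY }}"
```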
The below was written before I migrated to Github Actions:
When the master branch builds we should auto-publish to Ansible Galaxy; at present it's left to me to manually release the collection separately from the build. This should be easy to accomplish, but hold fire until the teething problems with the ported mini-build system are all resolved.
We need to find an alternative way to publish GH releases, as Travis-CI doesn't support SSH keys in the same way that our Travis Enterprise instance does. Until we can automate the GH release, we can't automate the Ansible release.
This is why the build currently reports error when trying to create the release:
Publishing new release 4.2.0 to GitHub
remote: Support for password authentication was removed on August 13, 2021. Please use a personal access token instead.
remote: Please see https://github.blog/2020-12-15-token-authentication-requirements-for-git-operations/ for more information.
fatal: Authentication failed for 'https://github.com/ibm-mas/ansible-devops.git/'
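One workaround for the authentication failure above is to push over HTTPS with a personal access token instead of a password or SSH key (a sketch; set_push_url is a hypothetical helper, and GITHUB_TOKEN is assumed to be provided as a CI secret):

```shell
# Point the push URL at GitHub using a personal access token, so the
# release push authenticates with the token rather than a password.
set_push_url() {
  git remote set-url --push origin \
    "https://x-access-token:${GITHUB_TOKEN}@github.com/ibm-mas/ansible-devops.git"
}
```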