This repository contains tools, scripts, docs, and anything else that might be helpful as part of a UA handover.
See CONTRIBUTING.md for info on how to contribute.
Vault channel is specified in the bundle, but the check fails
ua-bundle-checks.openstack (1).log
vault:
  bindings:
    ? ''
    : oam-space
    certificates: internal-space
    etcd: internal-space
    secrets: internal-space
    shared-db: internal-space
  channel: 1.8/stable
  charm: vault
  num_units: 3
  series: jammy
  to:
  - '10'
  - '11'
  - '9'
Will send the full bundle privately
If the path provided to --bundle is not a valid path and the file is not found, all checks are silently skipped but no error is raised.
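A minimal sketch of the kind of guard that would fix this, assuming the checker reads the bundle through a single entry point (the function name `load_bundle` is illustrative, not the checker's actual internals):

```python
import os
import sys

def load_bundle(path):
    # Fail fast instead of silently skipping every check when the
    # path passed to --bundle does not exist.
    if not os.path.isfile(path):
        sys.stderr.write(f"ERROR: bundle file not found: {path}\n")
        sys.exit(1)
    with open(path) as f:
        return f.read()
```

Exiting non-zero also lets CI pipelines that wrap the checker notice the mistake.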
Currently we have the following "VM test extras" (if applicable)
I think we should add further network tests.
I know this will give the FEs more things to test, but it will ensure we catch any issues early.
With Kubernetes especially, mysql-innodb-cluster is used only as the Vault storage backend, so the default innodb-buffer-pool-size is actually sufficient.
../ua-reviewkit/juju/ua-bundle-check.py --bundle ./generated/kubernetes/bundle.yaml
=> application 'mysql-innodb-cluster'
[PASS] HA (>=3)
[PASS] max-connections (value=2000)
[FAIL] innodb-buffer-pool-size (value=268435456, expected=6442450944)
https://github.com/canonical/ua-reviewkit/blob/main/juju/checks/openstack.yaml#L35
In order to have a different assertion for Kubernetes deployments, we should add one to checks/kubernetes.yaml.
I understand that development of ua-reviewkit is now done on GitHub instead of Launchpad, but git.launchpad.net still has content and it is hard to notice that it is deprecated.
Can we add one commit to that git repository deleting everything except a single README saying we should refer to https://github.com/canonical/ua-reviewkit from now on? That way we can avoid confusion.
https://git.launchpad.net/ua-reviewkit/tree/
P.S. I found this because one handover was still using git.launchpad.net as the source of some tests.
Charmstore versions of the charm had the "channel" config set to "stable", which results in whatever snap version is the default at the time of deployment being installed. Newer versions of the charm pin to 1.7/stable, 1.8/stable, etc.
We should throw an error when we see vault charms deployed with channel=stable.
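A minimal sketch of such a check, operating on the applications mapping of a parsed bundle (the function name and exact field access are illustrative assumptions, not ua-bundle-check's actual schema):

```python
# Flag vault applications whose snap "channel" config option is the
# bare "stable" channel, which installs whatever the default snap
# version happens to be at deploy time.
def find_unpinned_vault(applications):
    offenders = []
    for name, app in applications.items():
        if "vault" not in app.get("charm", ""):
            continue
        channel = app.get("options", {}).get("channel")
        if channel == "stable":
            offenders.append(name)
    return offenders
```

Anything returned here would be reported as an error; pinned tracks such as 1.8/stable pass untouched.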
I tried to run the kubernetes-extra-checks.py script, and it got stuck trying to pull the sonobuoy image from projects.registry.vmware.com/sonobuoy/sonobuoy.
The pod description reported the following errors:
Normal Scheduled 44s default-scheduler Successfully assigned sonobuoy/sonobuoy to juju-29bbea-kubernetes-10
Normal BackOff 14s (x2 over 42s) kubelet Back-off pulling image "projects.registry.vmware.com/sonobuoy/sonobuoy:v0.56.3"
Warning Failed 14s (x2 over 42s) kubelet Error: ImagePullBackOff
Normal Pulling 3s (x3 over 43s) kubelet Pulling image "projects.registry.vmware.com/sonobuoy/sonobuoy:v0.56.3"
Warning Failed 3s (x3 over 43s) kubelet Failed to pull image "projects.registry.vmware.com/sonobuoy/sonobuoy:v0.56.3": rpc error: code = NotFound desc = failed to pull and unpack image "projects.registry.vmware.com/sonobuoy/sonobuoy:v0.56.3": failed to resolve reference "projects.registry.vmware.com/sonobuoy/sonobuoy:v0.56.3": projects.registry.vmware.com/sonobuoy/sonobuoy:v0.56.3: not found
Warning Failed 3s (x3 over 43s) kubelet Error: ErrImagePull
I was able to run it successfully, though, by not specifying an image for sonobuoy in the line
./sonobuoy run --kube-conformance-image=${SONOBUOY_CONFORMANCE_IMAGE} --mode=${SONOBUOY_MODE} --skip-preflight --plugin e2e --e2e-parallel ${SONOBUOY_PARALLEL} --wait 2>&1
as it then pulls sonobuoy/sonobuoy from Docker Hub by default.
We currently use fio, but this isn't run for smaller deployments and/or ones with no RBD workload.
It'd be useful to still have some baseline data; we can probably run "rados bench" at least.
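A rough sketch of what a minimal "rados bench" baseline could look like, wrapped for scripting (the pool name, duration, and function names are illustrative assumptions; --no-cleanup keeps the written objects so the read phases have data):

```python
import subprocess

def rados_bench_cmds(pool="ua-bench", seconds=30):
    # Write phase first (kept with --no-cleanup), then sequential and
    # random read phases against those objects, then cleanup.
    return [
        ["rados", "bench", "-p", pool, str(seconds), "write", "--no-cleanup"],
        ["rados", "bench", "-p", pool, str(seconds), "seq"],
        ["rados", "bench", "-p", pool, str(seconds), "rand"],
        ["rados", "-p", pool, "cleanup"],
    ]

def run_rados_bench(pool="ua-bench", seconds=30):
    for cmd in rados_bench_cmds(pool, seconds):
        subprocess.run(cmd, check=True)
```

This gives throughput/latency numbers even on clusters with no RBD workload where fio is currently skipped.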
Enhancement to add checks for ldap connection timeouts.
charm: keystone-ldap
config param: ldap-config-flags
The value of ldap-config-flags is a JSON string, so a regex needs to be applied for this check.
(This will require a new assertion schema grep/contains in ua-bundlechecks)
Conditions to check:
If use_pool is true, then pool_connection_timeout must be set.
If use_pool is false or not set, then connection_timeout must be set.
Note: the params use_pool, pool_connection_timeout, connection_timeout are part of ldap-config-flags value.
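A rough sketch of the conditional logic; the issue asks for a grep/regex-style assertion schema, but for illustration this simply parses the flags string as JSON (function name and parsing approach are assumptions):

```python
import json

def ldap_timeouts_ok(ldap_config_flags):
    # The charm stores these flags as a JSON string; apply the two
    # rules described above once parsed.
    flags = json.loads(ldap_config_flags)
    if str(flags.get("use_pool", "")).lower() == "true":
        return "pool_connection_timeout" in flags
    return "connection_timeout" in flags
```

The str()/lower() dance covers both a JSON boolean true and the string "true", since ldap-config-flags values are often quoted.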
If a bundle contains cross-model relations then the checker bombs out as below:
================================================================================
UA Juju bundle config verification
* 2024-06-10 12:57:59.159130
* type=openstack
* bundle=/home/alejandro/Downloads/juju_bundle.yaml
* bundle_sha1=266f88723064211f8bb5e964794a969667f58cf5
* assertions_sha1=43de655a3beb2f75bade815786cfa88e04a16e19
================================================================================
ERROR: Error parsing the bundle file: expected a single document in the stream
  in "<unicode string>", line 1, column 1:
    series: focal
    ^
but found another document
  in "<unicode string>", line 4533, column 1:
    --- # overlay.yaml
    ^
Please check the above errors and run again.
I'd say we should either handle this in the checker or clean up the bundle somehow.
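One possible fix on the checker side, sketched with PyYAML's multi-document loader (the function name is illustrative): treat the first document as the base bundle and keep any trailing overlay documents separately instead of aborting.

```python
import yaml

def load_bundle_documents(path):
    # safe_load_all tolerates the "--- # overlay.yaml" separators that
    # exported bundles with cross-model relations contain.
    with open(path) as f:
        docs = list(yaml.safe_load_all(f))
    base, overlays = docs[0], docs[1:]
    return base, overlays
```

The overlays could then be ignored or merged into the checks, whichever the checker decides.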
When overlays (e.g. LMA offers) are present in the exported bundle, ua-bundle-check fails with the below error:
$ ./ua-bundle-check.py --bundle ../../exported-bundle.yaml
================================================================================
UA Juju bundle config verification
* 2022-09-02 13:19:46.272974
* type=openstack
* bundle=../../exported-bundle.yaml
* bundle_sha1=<removed>
* assertions_sha1=<removed>
================================================================================
ERROR: Error parsing the bundle file: expected a single document in the stream
  in "<unicode string>", line 1, column 1:
    series: focal
    ^
but found another document
  in "<unicode string>", line 3758, column 1:
    --- # overlay.yaml
    ^
Please check the above errors and run again.
Overlay in the exported bundle:
--- # overlay.yaml
applications:
  logstash-server:
    offers:
      logstash-beat:
        endpoints:
        - beat
        acl:
          admin: admin
  nagios:
    offers:
      nagios-monitors:
        endpoints:
        - monitors
        acl:
          admin: admin
  prometheus:
    offers:
      prometheus-target:
        endpoints:
        - target
        acl:
          admin: admin
At this point, the script and kubernetes-extra-checks.sh are focused on listing the failed tests. It would be nice to also output the summary of the test run, such as these lines from the generated tarball:
plugins/e2e/results/global/junit_01.xml:
plugins/e2e/results/global/e2e.log:SUCCESS! -- 337 Passed | 0 Failed | 0 Pending | 5433 Skipped
[current output]
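A rough sketch of pulling that summary line out of the e2e log (the regex and function name are assumptions based on the example line above, which follows the usual ginkgo "SUCCESS!/FAIL!" format):

```python
import re

def e2e_summary(log_text):
    # Return the one-line pass/fail summary printed at the end of
    # plugins/e2e/results/global/e2e.log, or None if absent.
    match = re.search(r"^(SUCCESS|FAIL)! -- .*$", log_text, re.MULTILINE)
    return match.group(0) if match else None
```

The script could print this alongside the failed-test listing so the overall result is visible without opening the tarball.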
After running the ua-reviewkit tests for k8s, the test results are not saved to the same folder. I tried to find the results but, other than what was printed on the screen, I did not find them. I recall seeing a message saying that the results were saved under /tmp, but I did not find them there either. Adding functionality to prescribe the location of the test results would be a great addition.
Newer deployments now split the applications, flags and placement into multiple bundle files. As a result, some of the checks fail because the data is not in the base bundle.yaml.
We need to figure out how to parse the aggregate bundle for the existing checks.
Example failures:
[WARN] global-physnet-mtu (value=1550, expected=9000)
in: overlay-openstack-options.yaml
[FAIL] dns-servers (not found)
also in: overlay-openstack-options.yaml
However, it's not limited to just the overlay-openstack-options.yaml bundle. There are now multiple different bundle files, and I am told it is not standardised which bundles each person doing deployments uses for what.
Overlays in this deployment:
overlay-additional-applications.yaml
overlay-hostnames.yaml
overlay_lma-offers.yaml
overlay-openstack-options.yaml
overlay_openstack-saas.yaml
overlay-openstack-ssl.yaml
overlay-removed-applications.yaml
overlay-service-placement.yaml
overlay-vips.yaml
Additionally, the LMA applications have been moved to the 'lma' model and the lma.yaml file.
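An illustrative sketch of aggregating the base bundle with the overlay files before running checks. The deep-merge here is a simplified assumption, not Juju's exact overlay semantics (it does not handle application removal, for instance), and the function names are hypothetical:

```python
import yaml

def merge_bundles(base_path, overlay_paths):
    # Load the base bundle, then layer each overlay's keys on top so
    # checks see the aggregate view rather than just bundle.yaml.
    with open(base_path) as f:
        bundle = yaml.safe_load(f)
    for path in overlay_paths:
        with open(path) as f:
            overlay = yaml.safe_load(f) or {}
        _deep_update(bundle, overlay)
    return bundle

def _deep_update(dst, src):
    # Recursively merge mappings; scalars and lists from the overlay
    # simply replace the base values.
    for key, value in src.items():
        if isinstance(value, dict) and isinstance(dst.get(key), dict):
            _deep_update(dst[key], value)
        else:
            dst[key] = value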
This is because of a known bug in that charm that causes it to skip the discard configuration if bcache devices are in use.
charm bug - https://bugs.launchpad.net/charm-ceph-osd/+bug/1872665
When charms are deployed from Charmhub they must not use the latest/stable channel, as it is unsupported and can contain unexpected charm versions - it was originally set to the last version found in the charmstore (cs:). Instead, charms should use a specific track/channel according to https://docs.openstack.org/charm-guide/latest/project/charm-delivery.html
We should therefore warn when we see charms using latest/stable.
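A minimal sketch of that warning, looking at the charm-level channel field of each application (distinct from the vault snap "channel" config option; names here are illustrative assumptions):

```python
# List applications whose charm channel is the unsupported
# latest/stable track instead of a specific track such as yoga/stable.
def find_latest_stable(applications):
    return sorted(
        name for name, app in applications.items()
        if app.get("channel") == "latest/stable"
    )
```

Each returned application would then be emitted as a [WARN] in the check output.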
Dave C reported that 0.18 no longer works.
ubuntu@iadaz01sinf01:~/dave/handover/ua-reviewkit/kubernetes$ diff -u kubernetes-extra-checks.sh.orig kubernetes-extra-checks.sh
--- kubernetes-extra-checks.sh.orig	2021-02-11 19:43:44.405074951 +0000
+++ kubernetes-extra-checks.sh	2021-02-11 19:11:53.370554415 +0000
@@ -1,6 +1,7 @@
 #!/bin/bash -ex
-SONOBUOY_VERSION=${SONOBUOY_VERSION:-0.18.0}
+#SONOBUOY_VERSION=${SONOBUOY_VERSION:-0.18.0}
+SONOBUOY_VERSION=${SONOBUOY_VERSION:-0.20.0}
 SONOBUOY_PARALLEL=${SONOBUOY_PARALLEL:-30}
 function fetch_sonobuoy() {
@@ -15,7 +16,8 @@
     fi
     ./sonobuoy delete --all || true
kubernetes/README.md has good instructions on how to run Sonobuoy. However, there have been some changes in the upstream Sonobuoy release manifest, and some of the content may no longer be applicable.
It would be nice if those instructions were updated to follow these changes:
https://sonobuoy.io/decoupling-sonobuoy-and-kubernetes/
[kubernetes/README.md]
Sonobuoy depends on kubernetes version that is being used.
As per documentation, each version of sonobuoy will cover that
same k8s version and two older versions (e.g. v0.14.X covers
k8s 1.14, 1.13 and 1.12).
...
Based on that version, check out which is the corresponding
sonobuoy available on:
https://github.com/vmware-tanzu/sonobuoy/releases/
Once the version was found, run the following command, as
the example below:
$ SONOBUOY_VERSION=0.19.0 ./kubernetes-extra-checks.sh
An export-bundle from a recent deployment references all charms as local, e.g. local:ceilometer-agent-0.
The osm charms have evolved since the existing checks were written, so the checks need updating to reflect the current state.
We have a customer K8s which is deployed on top of OpenStack. The customer complained that persistent volume claims were stuck in the Pending state.
The root cause is that OpenStack Cinder uses one AZ named "nova" while nova-compute uses zone1, zone2 and zone3, causing a mismatch.
We ran the sonobuoy validation test suite before delivering the K8s cluster and everything passed; it seems sonobuoy doesn't pick up this issue.
With Kubernetes 1.24 deployed with Juju/MAAS on bare metal, launching sonobuoy the default way:
./sonobuoy run
fails with the following result:
$ ./sonobuoy results $results
Plugin: e2e
Status: failed
Total: 1
Passed: 0
Failed: 1
Skipped: 0
from the e2e log:
Jan 12 14:50:28.313: INFO: ==== node wait: 3 out of 6 nodes are ready, max notReady allowed 0. Need 3 more before starting.
Jan 12 14:50:58.311: INFO: Unschedulable nodes= 3, maximum value for starting tests= 0
Jan 12 14:50:58.311: INFO: -> Node k8s-control-plane-1 [[[ Ready=true, Network(available)=false, Taints=[{juju.is/kubernetes-control-plane true NoSchedule }], NonblockingTaints=node-role.kubernetes.io/control-plane,node-role.kubernetes.io/master ]]]
The workaround is to include the suggested taint to non-blocking-taints arg like this:
./sonobuoy run --plugin-env=e2e.E2E_EXTRA_ARGS=--non-blocking-taints=juju.is/kubernetes-control-plane,true,NoSchedule
So it might be a good idea to launch it the same way in kubernetes-extra-checks.sh.
Check that ceph-access space bindings are correct across applications.