Comments (9)
@larsks I have opened a RH Support Case for ACM 2.5.5 and Observability: https://access.redhat.com/support/cases/#/case/03391625
from operations.
According to the CoreOS Logging Slack channel, the next release of Red Hat OpenShift Logging, which includes the Retention feature, has an ETA of Jan 10th and should ship right after OpenShift 4.12.
from operations.
@computate that section of the documentation seems to be about installing the observability stack (the example they give is for installing it on "infrastructure nodes"), and doesn't seem to have anything to do with the nodes from which it will collect data.
from operations.
Thanks for clarifying, @larsks. I am reading further into the docs, which state:
It looks like we are manually generating the cluster-monitoring-config ConfigMap for the prod cluster, which may be why I can't see observability data for the prod cluster.
configMapGenerator:
- name: cluster-monitoring-config
  namespace: openshift-monitoring
  files:
  - config.yaml=configmaps/cluster-monitoring-config.yaml
@larsks Do you know why we are manually creating the cluster-monitoring-config ConfigMap?
from operations.
@computate looks like I did that for persistent storage for prometheus and alertmanager.
https://github.com/OCP-on-NERC/nerc-ocp-config/pull/158/files
If there's some other way of configuring that storage, then we could remove that configmap.
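For context, a minimal sketch of what a config.yaml that sets persistent storage for Prometheus and Alertmanager in cluster-monitoring-config might look like. The storage class name and the volume sizes below are placeholder assumptions for illustration, not values taken from the linked PR:

```yaml
# Hypothetical contents of configmaps/cluster-monitoring-config.yaml.
# The storage class and sizes are illustrative assumptions only.
prometheusK8s:
  volumeClaimTemplate:
    spec:
      storageClassName: example-storage-class  # assumption: replace with a real StorageClass
      resources:
        requests:
          storage: 40Gi  # assumption: sized for illustration
alertmanagerMain:
  volumeClaimTemplate:
    spec:
      storageClassName: example-storage-class  # assumption: replace with a real StorageClass
      resources:
        requests:
          storage: 10Gi  # assumption: sized for illustration
```

Because the ConfigMap is generated from this file, any other component that needs to write to cluster-monitoring-config would have its changes overwritten on the next sync, which may be relevant to the missing observability data.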
from operations.
I have been working on removing Observability by disabling autosync on the prod and infra clusters, deleting the MultiClusterObservability component, and then recreating it. The multicluster-observability-operator is not recreating the deployments and pods for observability, so something else seems to be going wrong in ACM. I checked all the pod logs and events and see the following failures in the multicluster-operators-standalone-subscription pod:
E1212 15:28:42.757806 1 helmrepoutils.go:534] return code: 404 unable to retrieve chart - url: http://multiclusterhub-repo.open-cluster-management.svc.cluster.local:3000/charts/policyreport-2.5.3.tgz
E1212 15:28:42.757827 1 helmreleasemgr.go:99] failed to download chart from helm repo. - url: http://multiclusterhub-repo.open-cluster-management.svc.cluster.local:3000/charts/policyreport-2.5.3.tgz error: return code: 404 unable to retrieve chart - Failed to download the chart
E1212 15:28:42.757836 1 helmreleasemgr.go:42] failed to download chart from helm repo. - url: http://multiclusterhub-repo.open-cluster-management.svc.cluster.local:3000/charts/policyreport-2.5.3.tgz error: return code: 404 unable to retrieve chart - Failed to download the chart
E1212 15:28:42.757787 1 helmreleasemgr.go:42] failed to download chart from helm repo. - url: http://multiclusterhub-repo.open-cluster-management.svc.cluster.local:3000/charts/management-ingress-2.5.3.tgz error: return code: 404 unable to retrieve chart - Failed to download the chart
from operations.
@larsks ACM is not upgrading automatically. Looking at the ACM subscription, the upgrade failed with the following error:
error validating existing CRs against new CRD's schema for "multiclusterobservabilities.observability.open-cluster-management.io": error listing resources in GroupVersionResource schema.GroupVersionResource{Group:"observability.open-cluster-management.io", Version:"v1beta1", Resource:"multiclusterobservabilities"}: conversion webhook for observability.open-cluster-management.io/v1beta2, Kind=MultiClusterObservability failed: Post "https://multicluster-observability-webhook-service.open-cluster-management.svc:443/convert?timeout=30s": no endpoints available for service "multicluster-observability-webhook-service"
from operations.
@computate it sounds like this has been resolved. Can we close the issue?
from operations.
@larsks I could just use a review on this PR to upgrade the ACM Operator and also the LokiOperator for the latest support for logging and metrics. OCP-on-NERC/nerc-ocp-config#176
from operations.