nais / babylon
Summer student project 2021 - cleanup of Kubernetes resources
License: MIT License
A very useful feature would be to simply notify teams that have pods whose status indicates something is wrong, without assuming that a rollback is appropriate.
For a feature like this, I think we can look at the problem the other way around. Instead of defining which states are bad, we could say that any Pod with a status other than Running or Completed notifies the team.
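The inverted check could look roughly like the following Go sketch. It assumes each pod has already been reduced to a hypothetical PodSummary with a kubectl-style status string (Running, Completed, CrashLoopBackOff, ...). The raw pod phase alone is not enough: a pod in CrashLoopBackOff usually still reports phase Running, and a finished pod has phase Succeeded, which kubectl displays as Completed. How the status string is derived is left out.

```go
package main

// PodSummary is a hypothetical, simplified view of a pod.
// Status is assumed to be the kubectl-style status, which folds container
// waiting reasons such as CrashLoopBackOff into a single string.
type PodSummary struct {
	Name   string
	Team   string
	Status string
}

// okStatuses lists the only statuses that do NOT trigger a notification.
var okStatuses = map[string]bool{
	"Running":   true,
	"Completed": true,
}

// shouldNotify inverts the usual check: instead of enumerating bad states,
// anything outside the small set of healthy statuses is reported.
func shouldNotify(p PodSummary) bool {
	return !okStatuses[p.Status]
}

// podsToNotify groups unhealthy pods by team so each team gets one message.
func podsToNotify(pods []PodSummary) map[string][]string {
	byTeam := make(map[string][]string)
	for _, p := range pods {
		if shouldNotify(p) {
			byTeam[p.Team] = append(byTeam[p.Team], p.Name)
		}
	}
	return byTeam
}
```

The advantage of the allowlist approach is that new or unexpected failure reasons are reported by default instead of being silently missed.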
CrashLoopBackOff
fail-fast on deploys
then we can just scale down the deployment directly.
Add and start using feature toggles
Turn sending messages to Slack on/off
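A minimal sketch of feature toggles in Go, with the Slack on/off switch as the first toggle. The environment-variable naming scheme (FEATURE_<NAME>), the toggle name SLACK_NOTIFICATIONS and the send callback are all hypothetical; the real application might read toggles from a ConfigMap or another source instead.

```go
package main

import (
	"os"
	"strconv"
)

// FeatureToggles holds on/off switches for optional behaviour.
type FeatureToggles struct {
	flags map[string]bool
}

// Enabled reports whether a toggle is on; unknown toggles default to off.
func (f FeatureToggles) Enabled(name string) bool {
	return f.flags[name]
}

// togglesFromEnv reads each named toggle from FEATURE_<name> (hypothetical
// scheme). Values strconv.ParseBool cannot parse, or missing ones, mean off.
func togglesFromEnv(getenv func(string) string, names ...string) FeatureToggles {
	flags := make(map[string]bool, len(names))
	for _, n := range names {
		v, err := strconv.ParseBool(getenv("FEATURE_" + n))
		flags[n] = err == nil && v
	}
	return FeatureToggles{flags: flags}
}

// defaultToggles reads the toggles from the real process environment.
func defaultToggles() FeatureToggles {
	return togglesFromEnv(os.Getenv, "SLACK_NOTIFICATIONS")
}

// notify sends to Slack only when the toggle is on; send stands in for
// whatever Slack client the application actually uses.
func notify(t FeatureToggles, msg string, send func(string) error) error {
	if !t.Enabled("SLACK_NOTIFICATIONS") {
		return nil // toggle off: skip sending
	}
	return send(msg)
}
```

Defaulting unknown or malformed toggles to off keeps a misconfigured deployment from unexpectedly sending messages.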
??
babylon
working hours
alertmanager
Currently the application doesn't create metrics for failing deploys that are skipped because they are not on our allowlist, which is not what we want. Configure it so that metrics are created regardless of whether the resource is actually ignored.
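The fix is mostly a matter of ordering: record the metric before the allowlist decides whether to act. In this sketch a small map-based counter stands in for a Prometheus counter vector, and the allowlist is assumed to be keyed on namespace; both are assumptions, not the application's actual types.

```go
package main

import "sync"

// failureCounter is a stand-in for a Prometheus counter labelled by namespace.
type failureCounter struct {
	mu     sync.Mutex
	counts map[string]int
}

func newFailureCounter() *failureCounter {
	return &failureCounter{counts: make(map[string]int)}
}

func (c *failureCounter) Inc(namespace string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.counts[namespace]++
}

// handleFailingDeployment records the metric first and only then consults
// the (hypothetical) namespace allowlist, so skipped resources still count.
func handleFailingDeployment(namespace string, allowlist map[string]bool,
	metrics *failureCounter, act func(string)) {
	metrics.Inc(namespace) // always counted, allowlisted or not
	if !allowlist[namespace] {
		return // not acted upon, but already visible in metrics
	}
	act(namespace)
}
```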
nais.yml
liberator
more proposals
Use a different name than babylon
in the spec, suggestion for disableCleanUp
Add a kubernetes.io/change-cause annotation to the Deployment saying that the deployment was rolled back by Babylon because it could not start. This annotation is copied automatically from the Deployment to the ReplicaSet, which can be very helpful when debugging.
This annotation should also be used to prevent rolling back an application "twice", as that would just cause an endless loop of rollbacks. In other words: Babylon should ignore ReplicaSets that have an annotation saying they were rolled back by Babylon.
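Setting and checking the annotation can be sketched with plain annotation maps, as found on both Deployments and ReplicaSets. The key kubernetes.io/change-cause is the standard one shown in the CHANGE-CAUSE column of kubectl rollout history; the value prefix used to recognise Babylon's own rollbacks is a hypothetical choice.

```go
package main

import "strings"

// changeCauseKey is the standard Kubernetes change-cause annotation key.
const changeCauseKey = "kubernetes.io/change-cause"

// babylonPrefix is a hypothetical marker; the real wording is not decided.
const babylonPrefix = "rolled back by babylon"

// markRolledBack sets the change-cause on a Deployment's annotations.
func markRolledBack(annotations map[string]string, reason string) map[string]string {
	if annotations == nil {
		annotations = make(map[string]string)
	}
	annotations[changeCauseKey] = babylonPrefix + ": " + reason
	return annotations
}

// wasRolledBackByBabylon is checked on a ReplicaSet before acting on it,
// so Babylon never rolls back its own rollback.
func wasRolledBackByBabylon(annotations map[string]string) bool {
	return strings.HasPrefix(annotations[changeCauseKey], babylonPrefix)
}
```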
ci-gcp
for integration tests instead of a local Docker instance of minikube/kuttl
dev-gcp
, previous projects etc.)
Talk to Terje Sannum
Decide on backend
Decide on what data is useful to store
Where and how do we fetch and write the data (Prometheus etc)
Fix access to Influx on on-prem clusters
Fix logging time: decouple it from the tick rate?
Are we logging the right stuff?
The more natural behaviour here would possibly be to ignore weekend time? So if there has been a weekend between detection and the current check, add 48h to the delay.
As of 10:44 on Monday 2 August, dev-gcp is the only cluster where we don't get log messages saying that it is sleeping.
The date command gives the same time and time zone in dev-fss and dev-gcp.
The ConfigMap was changed in commit ad4bd77.
It would be very useful for users who have had resources downscaled (and potentially been notified about it) to be able to easily delete them as well.
Number of would-be-pruned resources
Number of alerts sent with different time intervals
Number of failing resources, sorted by reason for failing
Number of teams affected
Number of deployments currently Pending or Failing, and their error messages
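Metrics such as failing resources sorted by reason and the number of teams affected are simple aggregations. This Go sketch computes them from a hypothetical FailingResource record; exposing the numbers (for example as Prometheus gauges) is left out.

```go
package main

import "sort"

// FailingResource is a hypothetical record of one failing resource.
type FailingResource struct {
	Name   string
	Team   string
	Reason string // e.g. "CrashLoopBackOff", "ImagePullBackOff"
}

// ReasonCount pairs a failure reason with how many resources have it.
type ReasonCount struct {
	Reason string
	Count  int
}

// summarize returns failure counts sorted by count (descending, ties broken
// by reason name) and the number of distinct teams affected.
func summarize(rs []FailingResource) ([]ReasonCount, int) {
	byReason := map[string]int{}
	teams := map[string]struct{}{}
	for _, r := range rs {
		byReason[r.Reason]++
		teams[r.Team] = struct{}{}
	}
	out := make([]ReasonCount, 0, len(byReason))
	for reason, c := range byReason {
		out = append(out, ReasonCount{reason, c})
	}
	sort.Slice(out, func(i, j int) bool {
		if out[i].Count != out[j].Count {
			return out[i].Count > out[j].Count
		}
		return out[i].Reason < out[j].Reason
	})
	return out, len(teams)
}
```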
Add rule activations to controller metrics
When we switched from the Kubernetes Go client to controller-runtime's client, our app switched endpoints for the health and readiness checks. As it stands, the checks are served on a different port than the application itself, and we cannot use the same port to serve both the checks and the metrics. Currently metrics are under :8080/metrics and health checks are on :8081/healthz.
CI fails to deploy because NAIS deploy cannot check the state of our application.
prod
configuration
babylon/pkg/deployment/deployment.go
Line 252 in 7907d96
Reasoning: we don't want to roll back to images that are not already running; only roll back when there is a ReplicaSet that is already active (>0 replicas).
alerterator
slack-channel
of each team/teams