Comments (13)
@JohnStrunk: Here are the additional stats we are requesting:
- readyToUse boolean flag, based on:
  - SnapshotSchedule name
  - Match labels
  - Namespaces
  - VolumeSnapshotClass
- Current count of snapshots per PVC, so that the team can be alerted if it reaches the maxCount set in SnapshotSchedule.yaml (Kind: SnapshotSchedule), based on:
  - SnapshotSchedule name
  - Match labels
  - Namespaces
  - VolumeSnapshotClass
- Current count of count/volumesnapshots.snapshot.storage.k8s.io, based on namespace.
Our Helm chart-based YAML files are:
snapschedule.yaml
apiVersion: snapscheduler.backube/v1
kind: SnapshotSchedule
metadata:
  name: consul-snapshot
  namespace: {{ .Values.namespace }}
spec:
  disabled: {{ .Values.snapshotDisabledFlag }}
  claimSelector:
    matchLabels:
      {{- range $key, $value := .Values.selector }}
      {{ $key }}: {{ $value | quote }}
      {{- end }}
  retention:
    expires: {{ .Values.snapshotExpiry }}
    maxCount: {{ .Values.maxCount }}
  schedule: {{ .Values.schedule }}
  snapshotTemplate:
    labels:
      {{- range $key, $value := .Values.selector }}
      {{ $key }}: {{ $value | quote }}
      {{- end }}
    snapshotClassName: {{ .Values.snapshotClassName }}
snapshotquota.yaml
apiVersion: v1
kind: ResourceQuota
metadata:
  name: volumesnapshotsquota
  namespace: {{ .Values.namespace }}
spec:
  hard:
    count/volumesnapshots.snapshot.storage.k8s.io: {{ .Values.snapshotQuota | quote }}
from snapscheduler.
@JohnStrunk: Let me know if this looks okay to you.
from snapscheduler.
I think I'd like to limit the metrics to objects that SnapScheduler actually manages (i.e., not report on all snapshots, just those created from a schedule).
Perhaps:
- current_snapshots_total - gauge - {labels: schedule_name, schedule_namespace, pvc_name}
  - The number of VolumeSnapshots currently associated with the schedule, namespace, and PVC
- current_snapshots_ready_total - gauge - {labels: schedule_name, schedule_namespace, pvc_name}
  - The number of snapshots that are currently readyToUse
- snapshots_total - counter - {labels: schedule_name, schedule_namespace, pvc_name}
  - The total number of snapshots that have been created
The trick is to get metrics that are useful, not too difficult to implement, and don't have terribly high cardinality for Prometheus.
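To make the proposal concrete, here is a minimal sketch of how the two gauges could be derived from the snapshots a schedule owns. It is illustrative Python only, not SnapScheduler code: the `Snapshot` record and the `gauge_values` helper are hypothetical names, and the only thing taken from the proposal is the label set (schedule_name, schedule_namespace, pvc_name).

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Snapshot:
    """Hypothetical, simplified view of a VolumeSnapshot created by a schedule."""
    schedule_name: str
    schedule_namespace: str
    pvc_name: str
    ready_to_use: bool


def gauge_values(snapshots):
    """Return (current_snapshots_total, current_snapshots_ready_total).

    Each result maps a (schedule_name, schedule_namespace, pvc_name) label
    tuple to a count, mirroring the label set in the proposal.
    """
    total = Counter()
    ready = Counter()
    for snap in snapshots:
        key = (snap.schedule_name, snap.schedule_namespace, snap.pvc_name)
        total[key] += 1
        if snap.ready_to_use:
            ready[key] += 1
    return total, ready
```

With this label set, the number of time series per metric equals the number of distinct schedule/namespace/PVC combinations, which is what keeps the cardinality bounded.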
from snapscheduler.
"(i.e., not report on all snapshots, just those created from a schedule)." -- agreed
Can we report whether a snapshot was successful? I think point 1 is really important to us:
- readyToUse boolean flag, based on:
  - SnapshotSchedule name
  - Match labels
  - Namespaces
  - VolumeSnapshotClass
from snapscheduler.
I was hoping the ready_total vs total would be sufficient for that use case.
Could you explain a bit more about the need for match labels and VSC in the metrics? I'm particularly concerned about encoding the labels. If the labels and the VSC are determined by the SnapshotSchedule object, wouldn't its name/namespace be sufficient?
from snapscheduler.
@JohnStrunk: Here is our use case. We are backing up a few StatefulSet services in a specific namespace, currently identified by the "app" label. The ask is to send a notification when a backup fails so that the Ops team can investigate and fix the issue. We use Prometheus to scrape the "metrics" endpoint, which feeds Alertmanager, which in turn notifies PagerDuty and Slack.
Currently, there is a single VSC tied to "ebs.csi.aws.com", but later we want to connect to other drivers such as EFS and create a separate VSC for each, so a 1-1 mapping.
app-snapshot 0 6 * * * 168h 15 false 2021-04-13T06:00:00Z app.kubernetes.io/managed-by=spinnaker,app.kubernetes.io/name=ABC
$ kubectl get SnapshotSchedule -n NAMESPACE -l'app.kubernetes.io/name=ABC'
NAME           SCHEDULE    MAX AGE   MAX NUM   DISABLED   NEXT SNAPSHOT
app-snapshot   0 6 * * *   168h      15        false      2021-04-13T06:00:00Z
$
Now, this snapshot schedule covers 3 different EBS volumes for the "app" cluster.
We want to get notified if:
- One of these 3 EBS volumes failed to get backed up.
- All EBS volumes failed to get backed up.
- The backup did not run for some reason.
from snapscheduler.
My thought here is that you'd monitor the "app-snapshot" schedule (by filtering on schedule_name) and expect 3 new ready snaps every day.
So, it would probably be good to add a corresponding snapshots_ready_total counter as well.
The failure of the snapshotting flow itself would have to be detected by it never becoming ready, but there's also a case to be made for adding an error counter, too. That could be incremented if the operator is unable to create the VolumeSnapshot object itself (e.g., quota or rbac problems).
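As a rough illustration of how those counters could map onto the three alert conditions, here is a small Python sketch. Everything in it is an assumption for discussion: `classify_backup_window` is a hypothetical helper, the inputs are the increases of the proposed counters (snapshots_total, snapshots_ready_total, and a not-yet-named error counter) over one schedule interval, and in practice this logic would more likely live in Prometheus alerting rules.

```python
def classify_backup_window(expected, created_delta, ready_delta, error_delta):
    """Classify one schedule's most recent interval from counter increases.

    expected:      snapshots that should have been taken (e.g. 3 PVCs per day)
    created_delta: increase of the proposed snapshots_total counter
    ready_delta:   increase of the proposed snapshots_ready_total counter
    error_delta:   increase of the hypothetical error counter
    """
    # Nothing was created and nothing failed: the schedule never fired.
    if created_delta == 0 and error_delta == 0:
        return "did_not_run"
    # Every expected snapshot became readyToUse.
    if ready_delta >= expected:
        return "ok"
    # The schedule fired, but no snapshot became ready.
    if ready_delta == 0:
        return "all_failed"
    # Some, but not all, expected snapshots became ready.
    return "partial_failure"
```

For the "app-snapshot" schedule, `expected` would be 3. One caveat with this sketch: a snapshot created near the end of an interval may only become ready in the next one, so the evaluation window should be comfortably longer than the time a snapshot normally takes to complete.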
from snapscheduler.
@JohnStrunk: Agreed. So far the plan looks good. Let me know once the implementation is done; I can test it and let you know how it goes.
from snapscheduler.
@JohnStrunk: Just a gentle reminder: are there any updates? For us, observability of backups is a high priority.
At a minimum, alerting on failures based on some filters would be a good enough starting point.
from snapscheduler.
While it's on my list of items I'd like to add, I don't have a timeline for you.
I'd be happy to provide guidance if you or one of your colleagues would like to work on a PR for it.
from snapscheduler.
Any updates yet, @JohnStrunk?
from snapscheduler.
seems like is abandoned
from snapscheduler.
"seems like is abandoned"
As I said before... I'd be happy to provide guidance if someone wants to contribute a PR. However, there doesn't seem to be sufficient interest in this feature for anyone to make it happen.
from snapscheduler.