Comments (8)
Hey there @andreasgeisslerdt. v0.0.21 will be out this week.
from mariadb-operator.
Hey there @andreasgeisslerdt! Thanks for reporting this!
Could you provide the status of your resources so we can understand which reconciliation stage the operator is in? For example:
❯ kubectl get database data-test -o jsonpath="{.status}"
{"conditions":[{"lastTransitionTime":"2023-10-10T13:46:44Z","message":"Error connecting to MariaDB","reason":"Failed","status":"False","type":"Ready"}]}%
❯ kubectl get database data-test -o jsonpath="{.status}"
{"conditions":[{"lastTransitionTime":"2023-10-10T13:49:16Z","message":"Created","reason":"Created","status":"True","type":"Ready"}]}%
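The two outputs above differ only in their Ready condition. As a minimal sketch, assuming the status JSON always has the shape shown above, the condition can be pulled out like this (the helper name ready_condition is hypothetical and not part of the operator or kubectl):

```python
import json
from typing import Tuple


def ready_condition(status_json: str) -> Tuple[bool, str]:
    """Return (is_ready, message) for the Ready condition of a .status JSON string."""
    status = json.loads(status_json)
    for cond in status.get("conditions", []):
        if cond.get("type") == "Ready":
            return cond.get("status") == "True", cond.get("message", "")
    # No Ready condition is reported in the status at all.
    return False, ""


# Sample copied from the first kubectl output above, without the trailing "%".
failed = '{"conditions":[{"lastTransitionTime":"2023-10-10T13:46:44Z","message":"Error connecting to MariaDB","reason":"Failed","status":"False","type":"Ready"}]}'
print(ready_condition(failed))
```

Note that the trailing "%" in the outputs above is zsh marking a missing final newline, not part of the JSON, so it has to be dropped before parsing.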
The creation of resources (database, user, grant) should be restarted up to a configurable number of retries.
The operator does perform retries on the SQL resources, utilizing an exponential backoff strategy. This means that the longer your connections encounter errors, the more time the operator will wait before retrying. However, this approach is designed to eventually succeed, saving database connections and avoiding unnecessary overhead.
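To make the backoff behaviour concrete, here is a rough sketch of how an exponential backoff schedule grows. The base delay, factor and cap are illustrative assumptions, not the operator's actual settings, and backoff_delays is a hypothetical helper:

```python
from typing import List


def backoff_delays(base: float, factor: float, cap: float, attempts: int) -> List[float]:
    """Return the wait time in seconds before each retry, doubling up to a cap."""
    delays = []
    delay = base
    for _ in range(attempts):
        delays.append(min(delay, cap))  # never wait longer than the cap
        delay *= factor
    return delays


# Assumed values: start at 1s, double each time, cap at 60s.
print(backoff_delays(base=1.0, factor=2.0, cap=60.0, attempts=8))
# [1.0, 2.0, 4.0, 8.0, 16.0, 32.0, 60.0, 60.0]
```

With a large cap, a resource that has been failing for a long time may wait a long while between attempts, even after the database has become reachable.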
Thanks!
Hi Martin,
Here are the details:
I have 3 DBs:
- mariadb-galera (galera cluster) + user
- cds-db (single pod, no HA) + user + db + grant
- policy-mariadb (single pod, no HA) + user + db + grant
Here is the status:
ubuntu@control01-daily-master-sm:~$ kubectl -n onap get mariadb
NAME             READY   STATUS    PRIMARY POD        AGE
cds-db           True    Running   cds-db-0           8h
mariadb-galera   True    Running   mariadb-galera-0   8h
policy-mariadb   True    Running   policy-mariadb-0   7h50m
ubuntu@control01-daily-master-sm:~$ kubectl -n onap get user
NAME          READY   STATUS                        MAXCONNS   MARIADB          AGE
my-user       True    Created                       100        mariadb-galera   8h
policy-user   False   Error connecting to MariaDB   100        policy-mariadb   7h53m
sdnctl        False   Error connecting to MariaDB   100        cds-db           8h
ubuntu@control01-daily-master-sm:~$ kubectl -n onap get database
NAME          READY   STATUS                        CHARSET   COLLATE           MARIADB          AGE     NAME
policyadmin   False   Error connecting to MariaDB   utf8      utf8_general_ci   policy-mariadb   7h53m
sdnctl        False   Error connecting to MariaDB   utf8      utf8_general_ci   cds-db           8h
ubuntu@control01-daily-master-sm:~$ kubectl -n onap get grant
NAME                                     READY   STATUS                        DATABASE      TABLE   USERNAME      GRANTOPT   MARIADB          AGE
policy-user-policyadmin-policy-mariadb   False   Error connecting to MariaDB   policyadmin   *       policy-user   true       policy-mariadb   7h54m
sdnctl-sdnctl-cds-db                     False   Error connecting to MariaDB   sdnctl        *       sdnctl        true       cds-db           8h
Here are the requested "database" statuses:
ubuntu@control01-daily-master-sm:~$ kubectl -n onap get database sdnctl -o jsonpath="{.status}"
{"conditions":[{"lastTransitionTime":"2023-10-10T22:54:59Z","message":"Error connecting to MariaDB","reason":"Failed","status":"False","type":"Ready"}]}
ubuntu@control01-daily-master-sm:~$ kubectl -n onap get database policyadmin -o jsonpath="{.status}"
{"conditions":[{"lastTransitionTime":"2023-10-11T04:56:20Z","message":"Error connecting to MariaDB","reason":"Failed","status":"False","type":"Ready"}]}
Here are the logs of the mariadb-operator:
[mysql] 2023/10/10 23:56:20 packets.go:37: unexpected EOF
[mysql] 2023/10/10 23:56:20 packets.go:37: unexpected EOF
[mysql] 2023/10/10 23:56:21 packets.go:37: unexpected EOF
[mysql] 2023/10/11 00:56:20 packets.go:37: unexpected EOF
[mysql] 2023/10/11 00:56:20 packets.go:37: unexpected EOF
[mysql] 2023/10/11 00:56:21 packets.go:37: unexpected EOF
[mysql] 2023/10/11 01:56:20 packets.go:37: unexpected EOF
[mysql] 2023/10/11 01:56:20 packets.go:37: unexpected EOF
{"level":"error","ts":1696989380.774429,"msg":"Reconciler error","controller":"grant","controllerGroup":"mariadb.mmontes.io","controllerKind":"Grant","grant":{"name":"sdnctl-sdnctl-cds-db","namespace":"onap"},"namespace":"onap","name":"sdnctl-sdnctl-cds-db","reconcileID":"2f3f9d82-2504-43cd-a755-c0581e55a884","error":"error reconciling in TemplateReconciler: error creating MariaDB client: 1 error occurred:\n\t* driver: bad connection\n\n","stacktrace":"sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:273\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\t/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:234"}
{"level":"error","ts":1696989380.776826,"msg":"Reconciler error","controller":"database","controllerGroup":"mariadb.mmontes.io","controllerKind":"Database","database":{"name":"sdnctl","namespace":"onap"},"namespace":"onap","name":"sdnctl","reconcileID":"2e8c4bec-546f-435c-91ad-ea38e3600b90","error":"error reconciling in TemplateReconciler: error creating MariaDB client: 1 error occurred:\n\t* driver: bad connection\n\n","stacktrace":"sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:273\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\t/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:234"}
[mysql] 2023/10/11 01:56:21 packets.go:37: unexpected EOF
{"level":"error","ts":1696989381.1669478,"msg":"Reconciler error","controller":"user","controllerGroup":"mariadb.mmontes.io","controllerKind":"User","user":{"name":"sdnctl","namespace":"onap"},"namespace":"onap","name":"sdnctl","reconcileID":"4a2f3fc1-8228-45b4-883d-0176210a8c63","error":"error reconciling in TemplateReconciler: error creating MariaDB client: 1 error occurred:\n\t* driver: bad connection\n\n","stacktrace":"sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:273\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\t/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:234"}
The connection problems occur only for the single-pod MariaDBs, and IMHO they are caused by the fact that the "PeerAuthentication" is not yet active at the time the user/grant/database is created by the mariadb-operator.
After this situation, I restarted all mariadb-operator pods, deleted the "sdnctl" user and redeployed it.
The result is, that the new user is not handled at all:
ubuntu@control01-daily-master-sm:~$ kubectl -n onap get user
NAME          READY   STATUS                        MAXCONNS   MARIADB          AGE
my-user       True    Created                       100        mariadb-galera   8h
policy-user   False   Error connecting to MariaDB   100        policy-mariadb   8h
sdnctl                                              100        cds-db           82s
Hey @andreasgeisslerdt! Thanks a lot for your input, very much appreciated.
Judging by your comment, it seems like exponential backoff is not quite working for your case. We would need to retry more often.
After this situation, I restarted all mariadb-operator pods, deleted the "sdnctl" user and redeployed it.
The result is, that the new user is not handled at all:
Right, you have re-deployed the resource with the same name and not restarted the operator. The thing is that the operator has an internal retry cache indexed by name/namespace, so even if you recreate the resource, it won't retry as it hits the cache.
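As a toy illustration of that effect (this is not the operator's actual code; RetryCache and should_reconcile are hypothetical names), bookkeeping keyed only by namespace and name cannot tell a recreated resource apart from the old one:

```python
class RetryCache:
    """Hypothetical retry bookkeeping keyed by (namespace, name) only."""

    def __init__(self) -> None:
        self._seen = set()

    def should_reconcile(self, namespace: str, name: str) -> bool:
        key = (namespace, name)
        if key in self._seen:
            # Cache hit: the entry is treated as already handled.
            return False
        self._seen.add(key)
        return True


cache = RetryCache()
first = cache.should_reconcile("onap", "sdnctl")   # original User resource
# ... the User is deleted and recreated with the same name ...
second = cache.should_reconcile("onap", "sdnctl")  # recreated User hits the old entry
print(first, second)  # True False
```

Because the key carries nothing that changes on recreation, the second object looks identical to the first from the cache's point of view.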
My suggestion is introducing a spec.retryInterval in the SQL resources (User, Grant, Database) so you can opt out of exponential backoff and explicitly define the retry interval:
apiVersion: mariadb.mmontes.io/v1alpha1
kind: User
metadata:
  name: user
spec:
  mariaDbRef:
    name: mariadb
  passwordSecretKeyRef:
    name: user
    key: password
  maxUserConnections: 20
  host: "%"
  retryInterval: 5s
---
apiVersion: mariadb.mmontes.io/v1alpha1
kind: Grant
metadata:
  name: grant
spec:
  mariaDbRef:
    name: mariadb
  privileges:
    - "SELECT"
    - "INSERT"
    - "UPDATE"
  database: "*"
  table: "*"
  username: user
  grantOption: true
  host: "%"
  retryInterval: 5s
Bear in mind that this can potentially increase the number of connections in your database.
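To give a feel for the difference, here is a rough count of connection attempts against an unreachable database over ten minutes. The 5s interval comes from the manifests above; the backoff parameters (1s base, doubling, 300s cap) are assumptions for illustration only, and attempts_within is a hypothetical helper:

```python
from typing import Callable


def attempts_within(window: float, wait_after: Callable[[int], float]) -> int:
    """Count attempts that start within `window` seconds, first attempt at t=0.

    wait_after(n) is the wait in seconds after the n-th failed attempt.
    """
    elapsed, attempts = 0.0, 0
    while elapsed <= window:
        attempts += 1
        elapsed += wait_after(attempts)
    return attempts


fixed = attempts_within(600, lambda n: 5.0)                            # retryInterval: 5s
backoff = attempts_within(600, lambda n: min(2.0 ** (n - 1), 300.0))   # assumed backoff
print(fixed, backoff)  # 121 10
```

Under these assumptions a fixed 5s interval makes roughly twelve times as many attempts per resource as the backoff schedule, and that multiplies across every User, Grant and Database pointing at the same MariaDB.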
Thoughts?
I just tested again the procedure to deploy the MariaDB (cds-db) and user.
- I deleted the old user and DB
- Deleted all mariadb-operator pods
- deployed mariadb and user
- deployment was successful (logs showed retries to create the user until DB was ready)
So the problem might not be the retries but the caching.
Could it be that, although the connection to the DB was not successful (EOF, see above) and the creation failed, the resource was added to the cache and never retried?
Hey there @andreasgeisslerdt! I was working on a PR for this and fixed a bug in the controller responsible for SQL resources. Basically, after a resource has been not ready for a while, the controller does not set it back to ready once it becomes healthy. The fix will be released in v0.0.21 along with the spec.retryInterval feature, which I still think can be useful for many use cases.
OK, thanks for the info, what is the planned release date for v0.0.21?
In the meantime I will test:
- enable galera for the 2 failing DBs or
- disable istio sidecar injection for all DBs