Comments (3)
Hi, I pushed a new release that backs off when indexing errors occur, to mitigate the log flooding.
from monstache.
Hi, colleague of Manuel here. The specific error message we got was:
ERROR 2023/11/24 15:43:43 Bulk response item: {"_index":"main.<col>","_id":"<id>","status":429,"error":{"type":"cluster_block_exception","reason":"index [main.<col>] blocked by: [TOO_MANY_REQUESTS/12/disk usage exceeded flood-stage watermark, index has read-only-allow-delete block];"}}
It was repeated 24 500 000 times over 10 minutes, totaling roughly 4 GiB of logs.
The steps to reproduce are as follows (we have not yet investigated whether they can be minimized):
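For anyone who wants to recognize this condition programmatically, here is a minimal Go sketch that parses a bulk response item like the one logged above and checks whether it is the flood-stage block. The `bulkItem` type and the `isFloodStageBlock` helper are hypothetical names made up for illustration and are not part of monstache or its Elasticsearch client; the struct only covers the fields visible in the log line.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// bulkItem mirrors only the fields visible in the logged response item.
// It is a hypothetical type for illustration, not a monstache type.
type bulkItem struct {
	Index  string `json:"_index"`
	ID     string `json:"_id"`
	Status int    `json:"status"`
	Error  struct {
		Type   string `json:"type"`
		Reason string `json:"reason"`
	} `json:"error"`
}

// isFloodStageBlock reports whether the raw JSON item is a 429
// cluster_block_exception caused by the flood-stage disk watermark.
func isFloodStageBlock(raw []byte) (bool, error) {
	var item bulkItem
	if err := json.Unmarshal(raw, &item); err != nil {
		return false, err
	}
	blocked := item.Status == 429 &&
		item.Error.Type == "cluster_block_exception" &&
		strings.Contains(item.Error.Reason, "flood-stage watermark")
	return blocked, nil
}

func main() {
	// The redacted item from the log line above.
	raw := []byte(`{"_index":"main.<col>","_id":"<id>","status":429,"error":{"type":"cluster_block_exception","reason":"index [main.<col>] blocked by: [TOO_MANY_REQUESTS/12/disk usage exceeded flood-stage watermark, index has read-only-allow-delete block];"}}`)
	blocked, err := isFloodStageBlock(raw)
	fmt.Println(blocked, err)
}
```

A check like this could be used to treat the flood-stage block as a single cluster-wide condition rather than as millions of independent per-document failures.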
- Deny access to the monstache user, so that some data is queued up
- Let Elasticsearch run almost full
- Stop monstache
- Restore access for monstache
- Restart monstache
- Let Elasticsearch run completely full (up to the flood-stage watermark)
- Observe that monstache begins to rapidly generate log events (2+ million log entries per minute)
Additionally, here is a redacted copy of the config file with which we observed the issue:
mongo-url = "mongodb://monstache:<snip:url>"
elasticsearch-urls = ["http://<snip>:9200"]
direct-read-namespaces = ["main.<snip:col>"]
change-stream-namespaces = ["main.<snip:col>"]
workers = ["worker-0", "worker-1"]
gzip = false
stats = true
index-stats = true
elasticsearch-user = "monstache"
elasticsearch-password = "<snip>"
elasticsearch-max-conns = 4
elasticsearch-validate-pem-file = false
elasticsearch-healthcheck-timeout-startup = 200
elasticsearch-healthcheck-timeout = 200
dropped-collections = true
dropped-databases = true
replay = true
resume = true
resume-write-unsafe = false
resume-name = "default"
resume-strategy = 1
index-files = true
file-highlighting = true
file-namespaces = ["users.fs.files"]
verbose = false
cluster-name = 'elasticsearch'
exit-after-direct-reads = false
I'm curious and am currently investigating possible causes in the source code. A brief look suggests that the Elasticsearch client library indiscriminately calls the error handler for every item passed to it via Add(), so as long as the ingress side keeps providing data, we end up with one error per ingested item. It's unclear to me, however, at which point throttling should best take place.
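One possible place to throttle is the error handler itself. The following Go sketch wraps a log function so that at most one message is emitted per interval and the rest are only counted. `throttledLogger` and `newThrottledLogger` are hypothetical names for illustration; this is not how monstache or its Elasticsearch client is implemented, and it only limits log volume rather than slowing down ingestion.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// throttledLogger is a hypothetical wrapper, not part of monstache.
// It forwards at most one message per interval to emit and counts the rest.
type throttledLogger struct {
	mu         sync.Mutex
	interval   time.Duration
	last       time.Time // time of the last emitted message
	suppressed int       // messages dropped since the last emitted one
	emit       func(string)
}

func newThrottledLogger(interval time.Duration, emit func(string)) *throttledLogger {
	return &throttledLogger{interval: interval, emit: emit}
}

// Log emits msg if at least interval has elapsed since the last emitted
// message; otherwise it only increments the suppressed counter.
func (t *throttledLogger) Log(msg string) {
	t.mu.Lock()
	defer t.mu.Unlock()

	now := time.Now()
	if !t.last.IsZero() && now.Sub(t.last) < t.interval {
		t.suppressed++
		return
	}
	if t.suppressed > 0 {
		// Report how many messages were dropped so the information is not lost.
		msg = fmt.Sprintf("%s (%d similar messages suppressed)", msg, t.suppressed)
		t.suppressed = 0
	}
	t.last = now
	t.emit(msg)
}

func main() {
	logger := newThrottledLogger(time.Second, func(s string) {
		fmt.Println("ERROR", s)
	})
	// Simulate one error callback per failed bulk item.
	for i := 0; i < 1000000; i++ {
		logger.Log("Bulk response item: status 429")
	}
}
```

The trade-off of throttling at the logging layer is that the bulk requests themselves are still sent and rejected; backing off on the indexing side would reduce load on Elasticsearch as well.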