Comments (5)
Documents are batched for indexing until one of your options fires: elasticsearch-max-docs, elasticsearch-max-bytes, or elasticsearch-max-seconds. Each of these options triggers a flush, at which point a _bulk API request is made with the batched documents. This request could fail, with some or all of the documents not indexed. If it fails and you have set elasticsearch-retry-seconds to a value greater than 0, the failed request is retried after that number of seconds. If the request fails a second time, the request (including the documents sent) is logged as an error.
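To make the three flush triggers concrete, here is a minimal sketch of a buffer that flushes on a document count, a byte size, or an age limit. It is not monstache's implementation: the `Batcher` class, its parameter names, and the use of JSON length as the byte measure are assumptions made for illustration, and a real program would call `maybe_flush` from a timer.

```python
import json
import time


class Batcher:
    """Hypothetical buffer that flushes on a count, size, or age threshold."""

    def __init__(self, send, max_docs, max_bytes, max_seconds, clock=time.monotonic):
        self.send = send                # callable that receives a list of docs
        self.max_docs = max_docs        # stands in for elasticsearch-max-docs
        self.max_bytes = max_bytes      # stands in for elasticsearch-max-bytes
        self.max_seconds = max_seconds  # stands in for elasticsearch-max-seconds
        self.clock = clock
        self.docs = []
        self.size = 0
        self.started = None

    def add(self, doc):
        if not self.docs:
            # The age of a batch is measured from its first document.
            self.started = self.clock()
        self.docs.append(doc)
        # Assumption: approximate the payload size by the JSON encoding.
        self.size += len(json.dumps(doc).encode("utf-8"))
        self.maybe_flush()

    def maybe_flush(self):
        """Flush if any one of the three thresholds has been reached."""
        if not self.docs:
            return
        age = self.clock() - self.started
        if (len(self.docs) >= self.max_docs
                or self.size >= self.max_bytes
                or age >= self.max_seconds):
            self.flush()

    def flush(self):
        batch, self.docs, self.size = self.docs, [], 0
        if batch:
            self.send(batch)  # in monstache this would be one _bulk request


if __name__ == "__main__":
    sent = []
    batcher = Batcher(sent.append, max_docs=2, max_bytes=10_000, max_seconds=5)
    for i in range(5):
        batcher.add({"_id": i})
    batcher.flush()  # send whatever is left over
    print([len(batch) for batch in sent])
```

Whichever threshold is reached first wins, so a small max-docs value can make the byte and time limits irrelevant in practice.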
Currently, monstache continues processing when the situation above occurs. That is, after an error it keeps processing and sending batches of documents coming off the oplog. I am considering adding a fail-fast option which, when set to true, would exit the program on the first _bulk send error.
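The retry-once behavior and the proposed fail-fast switch could look roughly like the sketch below. This is a guess at the control flow described in this comment, not monstache code: `send_with_retry`, `BulkError`, and the `fail_fast` parameter are hypothetical names.

```python
import logging
import time

log = logging.getLogger("bulk-sketch")


class BulkError(Exception):
    """Hypothetical error raised when a bulk request fails."""


def send_with_retry(send, batch, retry_seconds=0, fail_fast=False, sleep=time.sleep):
    """Send a batch, retrying once if retry_seconds > 0.

    Returns True when the batch was sent and False when it was given up on.
    With fail_fast=True (the option proposed in this thread) the program
    exits instead of carrying on with the next batch.
    """
    attempts = 2 if retry_seconds > 0 else 1
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            send(batch)
            return True
        except BulkError as err:
            last_error = err
            if attempt < attempts:
                sleep(retry_seconds)  # wait before the single retry
    # Every attempt failed: log the request together with its documents.
    log.error("bulk request failed: %s; documents: %r", last_error, batch)
    if fail_fast:
        raise SystemExit(1)
    return False


if __name__ == "__main__":
    def always_down(batch):
        raise BulkError("service unavailable")

    # Without fail_fast the caller simply moves on to the next batch.
    print(send_with_retry(always_down, [{"_id": 1}], retry_seconds=0))
```

With `fail_fast` left at False the failed documents are only visible in the log, which is why replaying later matters.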
Note: monstache includes the options replay and replay-from-timestamp. When the former is true, monstache replays all events from the beginning of the oplog. When the latter is supplied, monstache replays events starting after the timestamp.
Does that answer your question? Would you be interested in the fail-fast option?
from monstache.
Ryan,
Thanks for the detailed answer. I think it makes sense to add the fail-fast option to avoid repetitive errors when the ES service is failing constantly.
Regarding replay: if docs are missing due to errors and I run monstache with the replay option once in a while, will it re-index the missing docs? Will it update or skip existing docs, or should the indices be wiped manually first?
Thanks
@asaf
When you run with replay, monstache re-attempts to index the docs, so docs that failed previously may succeed. You can delete the indices in Elasticsearch first, but it's not necessary: the docs that were previously successful will be re-indexed with the same data.
A scenario that would not be fixed by replay
would be a situation where you have 2 documents in a collection with a property of different types. e.g. { age: 1} and {age: "2"}. The first doc will establish the type in ES of age as long. Then the 2nd document would always fail because a string is sent. This type of error would not be fixed by replay
. Replay would fix an issue like ES becoming temporarily unavailable.
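The difference between the two failure modes can be illustrated with a toy model. `ToyIndex` and `replay` below are hypothetical stand-ins, not Elasticsearch or monstache APIs: the index fixes a field's type from the first value it sees and rejects later values of another type. Real Elasticsearch mappings are richer and may coerce some values depending on mapping settings, so treat this only as a sketch of the reasoning above.

```python
class ToyIndex:
    """Toy index: the first value seen for a field fixes that field's type."""

    def __init__(self):
        self.docs = {}     # doc id -> source
        self.mapping = {}  # field name -> Python type

    def index(self, doc_id, source):
        # Reject the whole document if any field conflicts with the mapping.
        for field, value in source.items():
            known = self.mapping.get(field)
            if known is not None and type(value) is not known:
                return False
        for field, value in source.items():
            self.mapping.setdefault(field, type(value))
        # Indexing by id overwrites, so re-sending the same doc is harmless.
        self.docs[doc_id] = source
        return True


def replay(index, events, available=True):
    """Re-send every event and return the ids that failed to index."""
    failed = []
    for doc_id, source in events:
        if not available or not index.index(doc_id, source):
            failed.append(doc_id)
    return failed


if __name__ == "__main__":
    events = [("a", {"age": 1}), ("b", {"age": "2"})]
    index = ToyIndex()
    print(replay(index, events, available=False))  # outage: everything fails
    print(replay(index, events))                   # replay: only the conflict remains
    print(replay(index, events))                   # replaying again changes nothing
```

In this model the outage is repaired by replaying, while the document with the conflicting type fails on every pass until the data or the mapping is changed.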
That's an accepted behavior :-)
@asaf,
In the next release (coming shortly) there will be an option to specify a list of collections that you would like to copy directly from MongoDB to Elasticsearch. This differs from the current behavior, which lets you replay and then tail the oplog. The direct sync option will read the collections themselves and not go through the oplog. Since the oplog may or may not contain your entire dataset (it's a capped collection after all), this direct copy option can be used instead. An option to exit monstache once the direct copy is complete will be added along with it (so it can be used in cron jobs, for example).