Comments (6)
Hi @apetrik-nh
Do you delete documents in the parent collection?
I added this commit which would ignore the delete events on the parent collection since you only use the parent through a relate and have keep-src set to false. In this case the parent document is never indexed and might be the source of the 404 deletes.
from monstache.
@rwynn , thank you for the quick reply.
We never delete collections, and in the majority of cases we do not delete parent or child documents; instead we mark them with a "deleted" flag. I doubt your fix will help, because the failed transactions are logged against the IDs of the child documents. Our internal assumption was that if a child document is marked as deleted and the parent document is later changed, the engine tries to reindex all the children of that parent and tries to delete the child again in Elasticsearch, which fails. However, we cannot replicate this use case in our local environment (no bulk-failure logs appear there), while in production we constantly see these errors in the logs.
PS: do you have donation details so that we can support your library and your work? Something like PayPal?
@apetrik-nh your assumption sounds like it's along the correct lines. My guess (if I understand your case) is that something like this is happening...
- soft deleted flag set on a parent document
- via relate, child documents run through your golang script Map function
- you detect the deleted flag and return Drop=true to delete each child from the search index
- later, parent updated in some other way
- same thing happens, but the deletes on all children 404 since they were previously removed
So, in your Map function, if I remember correctly, you get passed a "change doc" which represents what actually changed in MongoDB. I think you could use this to determine, in the case above, whether a Drop=true or a Skip=true is warranted. E.g. if deleted is in updatedFields then Drop=true, else Skip=true.
https://www.mongodb.com/docs/manual/reference/change-events/update/
e.g.
"updateDescription": {
"updatedFields": {
"email": "[email protected]"
},
"removedFields": ["phoneNumber"],
"truncatedArrays": [ {
"field" : "vacation_time",
"newSize" : 36
} ]
}
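The Drop-versus-Skip idea above could be sketched roughly like this. It is a minimal sketch under assumptions, not monstache's actual plugin API: Decision and decide are hypothetical names, the soft-delete field is assumed to be a boolean called deleted, and the update description is assumed to arrive as a plain map shaped like the MongoDB example above. Whether that change information is actually available for documents arriving via relate is not confirmed in this thread.

```go
package main

import "fmt"

// Decision is a hypothetical stand-in for the Drop/Skip flags that a
// Map function would set on its output.
type Decision struct {
	Drop bool // remove the document from the search index
	Skip bool // do nothing for this event
}

// decide returns Drop only when the "deleted" flag is part of the current
// change. If the document is already soft-deleted and something else
// changed, it returns Skip so that no second delete is sent.
// Assumption: "deleted" is a top-level boolean field.
func decide(doc, updateDescription map[string]interface{}) Decision {
	deleted, _ := doc["deleted"].(bool)
	if !deleted {
		// Not soft-deleted: index as usual.
		return Decision{}
	}
	// Reading from a nil map is safe in Go, so a missing
	// updateDescription or updatedFields simply falls through to Skip.
	updated, _ := updateDescription["updatedFields"].(map[string]interface{})
	if _, changed := updated["deleted"]; changed {
		return Decision{Drop: true}
	}
	return Decision{Skip: true}
}

func main() {
	doc := map[string]interface{}{"deleted": true}

	// The flag was set by this very update.
	flagChanged := map[string]interface{}{
		"updatedFields": map[string]interface{}{"deleted": true},
	}
	// The flag was set earlier; this update touched another field.
	otherChange := map[string]interface{}{
		"updatedFields": map[string]interface{}{"email": "changed"},
	}

	fmt.Printf("%+v\n", decide(doc, flagChanged))
	fmt.Printf("%+v\n", decide(doc, otherChange))
}
```

In a real plugin the same check would sit inside the Map function and set the corresponding flags on its output; an insert of an already-deleted document would need its own handling, since it carries no update description.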
I think returning Drop=true from a script is the only way you can influence monstache to turn an insert or update into an Elasticsearch delete. So, if you are not actually deleting anything from watched collections, that would be where I would look.
Monstache also has a Process escape hatch that basically lets you process the events yourself, in which case a delete could happen there also. But you would have to code the delete yourself in Process (add a delete request to the bulk processor).
Thanks for the offer to support this development in some way, @apetrik-nh. Unfortunately, I don't have much time to invest in monstache these days, so I just try my best to make a small improvement here and there (and accept pull requests).
Would love to hear if you and your team are in a position to take better care of the project and have an interest.
Thank you for all your replies. We will try to reproduce this massive log issue in a controlled environment, using your hint that deletes can only come from the Map script. We may also make the logic smarter so that it returns Skip instead of Drop when the deletion is not happening right now.
On the contribution part, our team has zero experience in Golang, and even the current script is a bit painful to support because of the missing expertise.