rwynn / monstache Goto Github PK

View Code? Open in Web Editor NEW

1.2K 40.0 176.0 1.34 MB

a go daemon that syncs MongoDB to Elasticsearch in realtime. you know, for search.

Home Page: https://rwynn.github.io/monstache-site/

License: MIT License

Go 92.60% Makefile 0.69% Shell 5.74% Dockerfile 0.97%

elasticsearch go golang mongodb daemon sync oplog realtime river synchronization

monstache's People

Contributors

Stargazers

Watchers

Forkers

wx7614140 crispy1975 robmurtha gophersgang kolesa-team laidingqing cmosgh 0987363 alexsjones moviestoreguy makhdumi kgrvamsi ladislavsopko feats renesugar youthlab sunweiconfidence vanthaiunghoa maiscrm theryaz priestd09 qakart henryykt rohitraj29 jerrielau kapasca hiverepo asamsung getfloret jodevsa lkinley-rythmos dicksontung ke-zhang-rd gmolaire caedmonjiang mrutunjaya mohammedessehemy botgitfool cosiray chandanmishra-03 actopolus leoclc nllahat gitgirish2 wallend justinsteffy geekytex bugfyi shijin-p artstylecode banben zhangduo0729 drusil jlevym systart habush s4mu3lb stanleytakamatsu chandan-kubric superbexperience silame83 zip-kanok yetithefoot linuxnc arkhn mostafahussein zhuthree george-xu-hs jtopel tobefuture827 prizov asif-ir edii incende givetimetolife flarrow7 jeasonchan vivek-byte apindes banna2019 pinguo-lixin phongtnit zhuosichen feishengyin darklost z184924 antonsergeyev isgasho chiragg6 minhuyen erxes sweta271097 rentiansheng robinskumar73 raymarkrx mrhinojosag githubpang realforce1024 yuliang0912 geseq

monstache's Issues

systemd support

Hey,

It might be useful to integrate with systemd to increase reliability of monstache,

Could use github.com/coreos/go-systemd

It is require to send a READY=1 event to systemd upon service instantiation (somewhere in main(), check this link)

It could be nice to integrate with the watchdog and send WATCHDOG=1 if daemon.SdWatchdogEnabled returns interval > 0 such:

go func() {
    interval, err := daemon.SdWatchdogEnabled(false)
    if err != nil || interval == 0 {
        return
    }
    for {
        daemon.SdNotify(false, "WATCHDOG=1")
        time.Sleep(interval / 2)
    }
}()

Thanks.

index ordering issue with elastic max-conns greater than 1

If the bulk indexer is configured with more than 1 connection to Elasticsearch via elasticsearch-max-conns (the default is 10), it is currently possible for an insert followed quickly by an update to the same document to be applied out of order since the operations may end up in different bulk indexing requests (separate connections). This results in a discrepancy between what is in MongoDB and what is in Elasticsearch until the document is updated again or a full sync is performed via direct reads.

To fix this and retain the ability to set max-conns greater than 1 for performance, we can use the MongoDB timestamp as an Elasticsearch document version number such that the old data is rejected at indexing time if newer data already exists in Elasticsearch.

This will also require changes in gtm to the how timestamps are put on document read directly from collections (the direct-read feature). Since we don't have an oplog entry for direct reads we should produce a timestamp with the time that the value was read out of the collection. This is to ensure that these reads are possibly rejected during indexing in the case that a document is updated, pulled from the oplog, and synced between the time a direct read occurs and when it is indexed into Elasticsearch.

Bulk insertion timeout

Hi!

I am running monstache with HA mode on, which goes fine for the first minutes, but then:

ERROR 2017/05/22 13:59:53 elastic: bulk processor "monstache" failed but will retry: Post http://elastichost:9200/_bulk: net/http: request canceled (Client.Timeout exceeded while awaiting headers)

Seems like it cannot reconnect to the cluster. The elasticsearch log only shows that it was updateing, but no errors.

Any idea why this could happen? Is there any timeout in monstache that can be set against the elasticsearch cluster? (like the mongodb timeout options)

Thanks for the help in advance!
M

Question on duplication

Just started testing with monstache and so far very impressed. I have a quick question regarding duplicates; specifically how are MongoDB _id properties mapped to ElasticSearch, and how when a document in MongoDB is updated is the correct ElasticSearch document updated?

Otto Export function error always nil

The Otto Export function never actually returns a non-nil error. In addition to checking for error need to also check the return value to determine if an export error occurred.

Multiple Monstache Indexes getting created

@rwynn do the collector code have the functionality to cleanup all these indexes or they have to be there part of the oplog management for the mongodb?

meta collection alternative

Does the meta collection serve any other purpose other than getting the routing info? If not, wouldn't it be better to just query elasticsearch for the routing info?

Question: Can more than one instance run reliably?

We have our MongoDB replica set and our Elasticsearch cluster, initially we were going to run a single dedicate instance (Google Cloud) to run monstache. This got us thinking about failover etc. One instance is a single point of failure, so rather than run an instance we thought of running monstache actually on each ES node... 3 node ES cluster === 3 running monstache processes (one per node).

Based on that scenario would there be an issue with monstache tailing the MongoDB opLog and replicating to the ES cluster with 3 copies running independently?

elastic "No processor type exists with name [attachment]"

My first tentative was unsuccessful probably lack Ingest attachment plugin ?

I tested on 6.1 and as well on 5.6.5

curl localhost:9200/_cat/plugins?v

name component version

I_KmNJ8 ingest-geoip      6.1.0

I_KmNJ8 ingest-user-agent 6.1.0

I_KmNJ8 x-pack            6.1.0

INFO 2017/12/15 05:15:31 Successfully connected to elasticsearch version 6.1.0
TRACE 2017/12/15 05:15:31 PUT /_ingest/pipeline/attachment HTTP/1.1
Host: localhost:9200
User-Agent: elastic/5.0.58 (linux-amd64)
Transfer-Encoding: chunked
Accept: application/json
Content-Encoding: gzip
Content-Type: application/json
Vary: Accept-Encoding
Accept-Encoding: gzip

68
��A
Ы���)J�*LbP�������NM    �e>@x~+X�hv�a�yt��.�鑠�W�|��-4������{���Ά��Y
0


TRACE 2017/12/15 05:15:31 HTTP/1.1 400 Bad Request
Content-Type: application/json; charset=UTF-8
Processor_type: attachment

{"error":{"root_cause":[{"type":"parse_exception","reason":"No processor type exists with name [attachment]","header":{"processor_type":"attachment"}}],"type":"parse_exception","reason":"No processor type exists with name [attachment]","header":{"processor_type":"attachment"}},"status":400}
panic: elastic: Error 400 (Bad Request): No processor type exists with name [attachment] [type=parse_exception]

goroutine 1 [running]:
main.main()
        /exwindoz/home/juno/gowork/src/github.com/rwynn/monstache/monstache.go:2160 +0x270a

gzip = true
stats = true
index-stats = true
mongo-url = "mongodb://localhost:27017"
#mongo-pem-file = "/path/to/mongoCert.pem"
#mongo-validate-pem-file = false
elasticsearch-urls = ["http://localhost:9200"]
#elasticsearch-user = ""
#elasticsearch-password = ""
elasticsearch-max-conns = 10
#elasticsearch-pem-file = "/path/to/elasticCert.pem"
elastic-validate-pem-file = false
dropped-collections = true
dropped-databases = true
replay = false
resume = true
resume-write-unsafe = false
resume-name = "default"
namespace-regex = '^mydb\.(mycollection|\$cmd)$'
namespace-exclude-regex = '^mydb\.(ignorecollection|\$cmd)$'
gtm-channel-size = 200
index-files = true
file-highlighting = true
file-namespaces = ["users.fs.files"]
verbose = true
cluster-name = 'docker-cluster'
direct-read-namespaces = ["db.collection", "test.test"]
exit-after-direct-reads = false

Does it/Will it support parent joins?

Hi there!

Does this project or Will this project support the new parent joins (in replacement of child/parent relationships)?

Thank you!

Can I use dbname as the name of index

now, my index name is
"geek001.db1"

In the past, I was using mongo-connector.
name is "geek001"

So the programs are all querying this

“/geek001/db1/_search.......”

Can I still do this right now?

Filter oplog to only apply selected updates

Not sure if this functionality already exists but my use case is since writing to elasticsearch isn't exactly fast some data that is needed real time is lagged behind. So I was thinking of having the ability to have an instance of monstache that only checks for those fields that are needed real time and ignoring the rest. Is that possible?

Make monstache reuse its stats indexes.

I am having an issue with the way new instances of monstache after spawning will create new stats indexes.

I can already find lots of indexes created by monstache in ES

This is my config.toml file

stats = true
index-stats = true
mongo-url = "mongodb://msdev456789112:27017"
elasticsearch-urls = ["http://elasticsearch:9200"]
elasticsearch-max-conns = 5
dropped-collections = true
dropped-databases = true
replay = false
resume = true
resume-write-unsafe = false
namespace-regex= "^(kf_dev456789112\\.(view\\-)+([A-Za-z\\-])+)"
namespace-exclude-regex = "none"
gtm-channel-size = 10
verbose = false

Is there any way to avoid the creation of multiple stats indexes or maybe clean them up automatically?

yellow open monstache.stats.2017-12-14 A_-kzbsASS-fswKFCRVlkg 5 1  5713    0   1.2mb   1.2mb
yellow open monstache.stats.2017-12-05 GppAM53DTXmuBejq0lvp1Q 5 1  5760    0   1.1mb   1.1mb
yellow open monstache.stats.2017-12-15 0MXNvqCkRkKaRm5xoxIdUQ 5 1  3950    0 870.4kb 870.4kb
yellow open monstache.stats.2017-12-04 dFO_ytsERs2LqBU8AcdF-w 5 1  3144    0 716.6kb 716.6kb
yellow open monstache.stats.2017-12-12 a8L1Rdd0RgOZU8B1ZehWXw 5 1  5760    0   1.1mb   1.1mb
yellow open monstache.stats.2017-12-10 TM4lXM5FTDibssgMjRdgMQ 5 1  5760    0   1.1mb   1.1mb
yellow open monstache.stats.2017-12-16 FfF64nRbSu-rP9PsIidzLQ 5 1  2880    0 641.7kb 641.7kb
yellow open monstache.stats.2017-12-11 onW3JJK9RNu7_J5ifjtlDw 5 1  5634    0   1.3mb   1.3mb
yellow open monstache.stats.2017-12-08 HQ5FUM9RSgaHlBRNh2cHYg 5 1  4294    0 954.7kb 954.7kb
yellow open monstache.stats.2017-12-18 9su5ctnDS_6prtkUWpMr-g 5 1  1546    0 804.8kb 804.8kb
yellow open monstache.stats.2017-12-06 sLwgJSesTKSC1B_JOL-33g 5 1  5413    0   1.2mb   1.2mb
yellow open monstache.stats.2017-12-17 tuqS6yjGRJuEar6D8iBP6g 5 1  2880    0 639.3kb 639.3kb
yellow open monstache.stats.2017-12-07 -9SaEmKNTDW9Lq_JkgWPYg 5 1  5146    0   1.1mb   1.1mb
yellow open monstache.stats.2017-12-13 c-Y0CzBtRK6IuG7nDDiVgQ 5 1  5760    0   1.1mb   1.1mb

Support parent-child relationships

I was playing with this tool, work just fine. Was looking for a way to make parent-child relationships. Do you have any plan to provide support for parent-child relationships?

Improve Failure Handling

Add a fail-fast option which, when true, exits the program immediately after logging the failed request when a _bulk request fails.

Also, add an index-oplog-time option which, when true, includes the date and timestamp from the oplog in the source document sent to elasticsearch. This information is useful in general but specifically on failures to determine the timestamp to replay events from during recovery.

Unchecked errors in AddFileContent

Check error return values for Flush and Close in AddFileContent

elastic.v6?

Is planned make support for elastic.v6? If so, about when?

HA mode - MongoDB permissions

Hi!

Im starting monstache with the following command:
./monstache -cluster-name myHA -worker MyW1 -f my-config.toml

And I receive the following error:
panic: Unable to enable cluster mode: not authorized on monstache to execute command { createIndexes: "cluster", indexes: [ { name: "expireAt_1", ns: "monstache.cluster", key: { expireAt: 1 }, background: true, expireAfterSeconds: 30 } ] }

The MongoDB user were created like this:

db.createUser(
 {
  user: "user",
  pwd: "pass",
  roles: [
    { role: "readWrite", db: "monstache" }
  ]
 }
)

I'm using MongoDB 2.6.3, and if i connect directly to the db with the same user credentials, I am able to create collections as well as indices.

Wouldn't it be beneficial to add to the documentation the proper mongodb permissions for monstache to be used in HA mode?

(offtopic: https://rwynn.github.io/monstache-site/options/#toml-table-default-nil-3 -> this should be changed to 'logs' instead of 'log')

Btw, what you guys are doing is pretty good, keep it up!!!

Thx for the help in advance!
M

Mongo Replicate URL with SSL option Is Not Supported

I used replicate url with ssl=true, then it reports

unsupported connection URL option: ssl=true

goroutine 1 [running]:
log.Panicf(0x96b343, 0x2d, 0xc420093cd0, 0x2, 0x2)
	/usr/local/go/src/log/log.go:329 +0xda
main.main()
	/vagrant/Go/src/github.com/rwynn/monstache/monstache.go:1175 +0x3d2

Failed request line #0 details: {"_index":"ttt.No2","_type":"_doc","_id":"5a9bff2d4e3d6cf0131b07af","status":400,"error":{"type":"invalid_index_name_exception","reason":"Invalid index name [ttt.No2], must be lowercase","index":"ttt.No2"}}

Failed request line #0 details: {"_index":"ttt.No2","_type":"_doc","_id":"5a9bff2d4e3d6cf0131b07af","status":400,"error":{"type":"invalid_index_name_exception","reason":"Invalid index name [ttt.No2], must be lowercase","index":"ttt.No2"}}

Is there any way to do it?

Mongodb allows the capitalization, it already exists

elasticsearch 6.x support

Any plans to support Elasticsearch 6.x?

Update docs on namespace regex

When a database or collection is dropped the namespace for the event is the database_name.$cmd where $cmd represents a virtual collection. If a regex is used to match namespaces, the drop will be ignored by monstache unless the $cmd type namespace is accounted for in the regex.

Allow to select the fields that will be exported

Similar to stripe's mosql.
In my case I don't want to export fields that contain sensitive data. For example PII data, keys, passwords, etc even if this data is hashed or encrypted.

syslog/graylog

it will be nice to have ability to send logs to remote server with different verbose level

PS
while launched with verbose=true, indexing speed is very slow

MongoDB opLog Error?

Hey, we have seen this in our logs a couple of time:

ERROR 2017/05/13 03:09:34 getMore executor error: CappedPositionLost: CollectionScan died due to position in capped collection being deleted. Last seen record id: RecordId(6419450938863389914)

I assume this is due to the opLog moving quick that monstache was able to catch the changes? Unless there is another possible explanation?

ElasticSearch index and type rules

Need to take into account ElasticSearch index and type name rules specified in Issue 6736 and update documentation accordingly.

synchronize data from MongoDb to ElasticSearch

Hi,
I m new in ES and i want synchronize data from MongoDb to ElasticSearch
I have followed the steps in https://rwynn.github.io/monstache-site/start/
this is my config.toml :

``
gzip = true
stats = true
index-stats = true
mongo-url = "mongodb://localhost/testdb11"

elasticsearch-urls = ["http://localhost:9200"]

elasticsearch-max-conns = 10

dropped-collections = true
dropped-databases = true
replay = false
resume = true
resume-write-unsafe = false
resume-name = "default"
namespace-regex = '^testdb11.(commentaires|$cmd)$'
namespace-exclude-regex = '^testdb11.(commentaires|$cmd)$'
gtm-channel-size = 200
index-files = true
file-highlighting = true
file-namespaces = ["commentaires.contenu"]
verbose = true

direct-read-namespaces = ["testdb11.commentaires", "test.test"]
exit-after-direct-reads = false
``

This is the result that i m seeing in elasticsearch.bat cmd :

[2018-01-26T13:38:31,960][INFO ][o.e.n.Node ] [nEk71vY] started
[2018-01-26T13:38:32,676][INFO ][o.e.g.GatewayService ] [nEk71vY] recovered [8] indices into cluster_state
[2018-01-26T13:38:40,556][INFO ][o.e.c.r.a.AllocationService] [nEk71vY] Cluster health status changed from [RED] to [YELLOW] (reason: [shards started [[books][1]] ...]).
[2018-01-26T13:40:46,897][INFO ][o.e.c.m.MetaDataCreateIndexService] [nEk71vY] [testdb11.commentaires] creating index, cause [auto(bulk api)], templates [], shards [5]/[1], mappings []
[2018-01-26T13:40:48,999][INFO ][o.e.c.m.MetaDataMappingService] [nEk71vY] [testdb11.commentaires/-G_AYL89S-SUSQmBKrn0Gw] create_mapping [commentaires]

I m not sure if this result is fine !!? and where can i find the indexed data if this is fine

More Efficient Encoding of Files

The current implementation of AddFileContent copies the GridFS content to a buffer and then base64 encodes the buffer. It is more efficient to encode while copying into the buffer.

Can't build new version

Hi there Ryan,

I want to thank you again for this amazing tool.
I have an issue tho

Can't build the latest source. I need a standalone linux binary to be able to run it in a basic Docker image without anything else.

./monstache.go:575:10: unknown field 'Session' in struct literal of type monstachemap.MapperPluginInput
./monstache.go:598:13: output.Parent undefined (type *monstachemap.MapperPluginOutput has no field or method Parent)
./monstache.go:599:28: output.Parent undefined (type *monstachemap.MapperPluginOutput has no field or method Parent)
./monstache.go:601:13: output.Version undefined (type *monstachemap.MapperPluginOutput has no field or method Version)
./monstache.go:602:29: output.Version undefined (type *monstachemap.MapperPluginOutput has no field or method Version)
./monstache.go:604:13: output.VersionType undefined (type *monstachemap.MapperPluginOutput has no field or method VersionType)
./monstache.go:605:33: output.VersionType undefined (type *monstachemap.MapperPluginOutput has no field or method VersionType)
./monstache.go:607:13: output.TTL undefined (type *monstachemap.MapperPluginOutput has no field or method TTL)
./monstache.go:608:25: output.TTL undefined (type *monstachemap.MapperPluginOutput has no field or method TTL)
./monstache.go:610:13: output.Pipeline undefined (type *monstachemap.MapperPluginOutput has no field or method Pipeline)

Am I doing anything wrong?

Thanks!

Different mappings are wrong

{
no:[1,2,3]
}
or
{
no:1
}
or
{
no:{
no1:1
no2:2
}
}

The above situation is often seen in mongodb ！
And it has already existed

Then some Error in ES 6 , I think it wants to make my data type exactly the same， How do I deal with it?

mongo _id in integer format, inserted as scientific format in elasticsearch

hello @rwynn, thanx for the great work 👍
I have some issue regarding _id in mongodb being sync to elasticsearch using monstache

an integer _id type in mongodb will be submited in a scientific format in elasticsearch
here attached the oplog and monstache verbose log:

Oplog :
{ "ts" : Timestamp(1483929958, 2), "h" : NumberLong("-2994949912686646076"), "v" : 2, "op" : "i", "ns" : "mydb.user", "o" : { "_id" : 28263934, "name" : "Lean publishing", "lname" : "Lean.co.id", "email" : "[email protected]", "password" : "$1/5/VpWVexO3yaxBn/s/", "username" : "Lean" } }
Monstache verbose :
INFO 2017/01/09 09:46:01 request body: {"index":{"_index":"mydb","_type":"user","_id":"2.8263934e+07"}}
{"user_name":"Lean publishing","user_username":"Lean"}

a long _id type in mongodb will be submited just ok in elasticsearch
here attached the oplog and monstache verbose log:

Oplog :
{ "ts" : Timestamp(1483930001, 2), "h" : NumberLong("7988208314667247132"), "v" : 2, "op" : "i", "ns" : "mydb.user", "o" : { "_id" : NumberLong(28263935), "name" : "Lean publishing", "lname" : "Lean.co.id", "email" : "[email protected]", "password" : "$1/5/VpWVexO3yaxBn/s/", "username" : "Lean" } }
Monstache verbose :
INFO 2017/01/09 09:46:46 request body: {"index":{"_index":"mydb","_type":"user","_id":"28263935"}}
{"user_name":"Lean publishing","user_username":"Lean"}

panic: close of closed channel

Getting this since patching to 3.6.2

panic: close of closed channel

goroutine 144 [running]:
main.shutdown.func2(0xc420c94600, 0xc420c94660)
        /home/ec2-user/go/src/github.com/rwynn/monstache/monstache.go:2133 +0x131
created by main.shutdown
        /home/ec2-user/go/src/github.com/rwynn/monstache/monstache.go:2128 +0x13f

Upgrading to 3.x results in broken ES client

Hi @rwynn

We attempted to upgrade to 3.0.6 and ended up with problems connecting to our three node ES cluster (5.2.2 with TLS using Search Guard SSL). The error output is:

root@tf-elasticsearch-20170619-0:~# monstache -f /usr/local/etc/monstache_config_preview.toml 
ERROR 2017/06/21 14:12:40 Unable to create elasticsearch client: no Elasticsearch node available
panic: Unable to create elasticsearch client: no Elasticsearch node available

goroutine 1 [running]:
log.(*Logger).Panicf(0xc42004eaf0, 0xd14431, 0x29, 0xc420529d40, 0x1, 0x1)
        /usr/local/go/src/log/log.go:215 +0xdb
main.main()
        /home/vagrant/go/src/github.com/rwynn/monstache/monstache.go:1384 +0x47c

I then tried the last 2.x release which uses the old Go client, this worked perfectly, then stepped forward to the 3.0.0 release and was met with the same error output as above. So there is something relating to the new Elasticsearch client that doesn't like our setup, or config file perhaps.

Config:

mongo-url = "mongodb://XXXXXX:[email protected]:27017,10.128.0.8:27017,10.128.0.9:27017/?authSource=admin&replicaSet=RS-Test-0"
mongo-pem-file = "/mongodb-test-rs.pem"
mongo-validate-pem-file = false
elasticsearch-url = "https://es0:9200"
elasticsearch-pem-file = "/etc/elasticsearch/root-ca/root-ca.pem"
cluster-name = "test-20170619"
replay = false
resume = true
resume-name = "test-20170619"
namespace-regex = "^testing.testdata$"
gtm-channel-size = 512
gzip = true
stats = true
elasticsearch-retry = true
dropped-databases = false
dropped-collections = false
verbose = true

The above config works fine with 2.14.0, I assume it ignores any new options. Tried the ES URI as a string like above and also an array of strings with each node listed. Turned on/off gzip, stats etc. etc.

Ran out of ideas at the moment, I did hunt down this which seems similar to the error received: olivere/elastic#312

Is this library has high consistency?

Hey,

Is this library assure that no docs are going to be missed ? does it retry / resolve failures such as when ES returns errors?

Thanks.

Everything is fine :)

Hello @rwynn !

Monstache won't sync my MongoDB anymore! :(

As we discussed last time, here is my config.toml

gzip = true
stats = true
index-stats = false
fail-fast = true
elasticsearch-retry = true
namespace-regex = '^profiles.(Profile|Blogs|$cmd)$'
# dropped collections/databases won't work because of dynamic indexes
dropped-databases = false
dropped-collections = false
[[script]]
namespace = "profiles.Profile"
routing = true
path = '../scriptFor_profiles_Profile.js'
[[script]]
namespace = "profiles.Blogs"
routing = true
path = '../scriptFor_profiles_Blogs.js'

The only way I can get my MongoDB to sync is by adding this line in the config.toml

direct-read-namespaces = ["profiles.Profile", "profiles.Blogs"]

However, as soon as the backup is done, it doesn't continue syncing.

If I start it without this line, nothing happens, the "commited" and "index" numbers stay at 0, and the "flushed" number keeps growing and growing....

Allow scripts to drop documents

We are loving the library so far and it's making life very easy replicating from MongoDB. Would be great to be able to use the scripts to conditionally replicate also. Let me explain, we have a pseudo delete option which is represented by a date field in MongoDB, if this field is present then the document is "deleted". The scripts in monstache must return an object according to the docs, but it would be nice to perhaps return false or null to let monstache know not to bother sending the document to Elasticsearch for indexing... so in a script could do something like this...

[[script]]
namespace = "db.collection"
script = """
module.exports = function(doc) {
    if (!!doc && !!doc.deletedAt) {
        return false;
    }
    return doc;
}
"""

This would result in no document being sent to Elasticsearch and keeping the index smaller as we would never want to search documents that are deleted. We could of course filter these out in our Elasticsearch queries, but the searches would be faster not having to account for these documents in the first place. :)

Quick question: Can we import from another .js file in a Javacript script?

Can we import from another .js file in a Javacript script?

If yes, how to? And where does a relative path should start from (the .exe file OR the TOML file)?

TLS Self-Signed Certs

We are using Search Guard SSL to secure the transport and REST endpoints for our ES install. When starting monstache up what I guess is the Go ES client is expecting a validated cert, so it throws an error:

INFO 2016/12/20 17:07:23 GET request sent to https://storage.example.com:9200/
ERROR 2016/12/20 17:07:23 Unable to validate connection to elasticsearch using https://storage.example.com:9200: Get https://storage.example.com:9200/: x509: certificate signed by unknown authority
panic: Unable to validate connection to elasticsearch using https://storage.example.com:9200: Get https://storage.example.com:9200/: x509: certificate signed by unknown authority

goroutine 1 [running]:
panic(0x869240, 0xc4204a4500)
	/usr/local/go/src/runtime/panic.go:500 +0x1a1
log.Panicf(0x91e6a0, 0x43, 0xc42014bd68, 0x4, 0x4)
	/usr/local/go/src/log/log.go:327 +0xe3
main.main()
	/vagrant/Go/src/github.com/rwynn/monstache/monstache.go:752 +0x1884

With the JavaScript client you can specify to accept non-validated certs, I guess this is possible with the Go client also? If not any idea where we would need to import the root CA cert so that the Go client does not complain?

Check length of ElasticHosts array

Check length of array before accessing [0] when connection fails.

add support for indexing GridFS content

Version 1.0 only supports indexing file metadata from GridFS. It is possible to support indexing the actual file content if elasticsearch has an appropriate plugin installed such that the attachment type is made available.

Question: Document Patches

Wanted to get a better understanding of how the various PATCH options work with monstache. We have turned on the enable-patches option to see if we can speed up operations some more (they are fast now, but more speed is always good :)

Does this option stop monstache from removing the ES document and then inserting a fresh copy by updating/patching the existing ES document?

How to check for liveness on a k8s pod

I find monstache to work amazingly fast and scaling it in a kubernetes cluster seems easy but what I can't find documented is how to check if monstache is still running to kill/respawn a pod for it.

Did I miss that in the documentation at https://rwynn.github.io/monstache-site ?

reconnect to mongos and elasticsearch

looks like if service lose connect to mongos (sharding cluster) it's wan't restore it
ERROR 2018/01/29 20:39:04 Unable to save timestamp: Closed explicitly
ERROR 2018/01/29 20:39:04 Closed explicitly
but I can connect to mongos

Expose DirectReadNs through a cli option

Hey,

@rwynn could please add the DirectReadNs to ParseCommandLineFlags as it's currently not exposed,

Thanks :-)

handle missing document data on replay

When a replay is performed it is possible that a previous insert or update was followed by a delete. In that case the document is no longer available in mongo. This scenario should be handled by checking for nil Data on the gtm.Op.

ResumeWork should always call Resume

gtm.Resume should always be called in ResumeWork even if there is no timestamp in mongo.

https://github.com/rwynn/monstache/blob/master/monstache.go#L451

Misleading error message

Error message text in TraceRequest should reference request not response.

collection drop is filtered with namespace-regex

Hey,

I can't make Monstache drop an index that corresponds to a dropped collection when namespace-regex is enabled,

I have in Mongo a db named test with collection name foo using this config:

namespace-regex = "^test.(foo|$cmd)$"

(btw I came across this issue but the docs are gone)

Any clue what's wrong?

Thanks.

Not seeing any activity on Elasticsearch when running Monstache

Hello,

I am fairly new to Mongodb and ES and I appreciate any help with this. The problem I am having is that I don't see any activity in Elasticsearch when launching Monstache. It is connecting to Mongodb successfully and creating a database named "monstache". I have followed the setup in the README file of the github page.

My first question is: I have quite a big amount of files in Mongodb that were all imported using Gridfs. Should the data be imported after launching Monstache?

The fact that it is not connecting to ES might have to do with my TOML file. Here are the contents:

mongo-url = "mongodb://localhost:27017"
elasticsearch-url = "http://localhost:9201"
elasticsearch-max-conns = 10
replay = true
resume = false
resume-name = "default"
namespace-regex = "test2.fs.files"
gtm-channel-size = 300
index-files = true
file-namespaces = ["test2.fs.files"]

This is what I run to launch Monstache:
./bin/monstache -f /root/mongostache/config/config.toml

Also, I have setup a replica set on a single instance for development purposes. I run the mongod instance as such: mongod -replSet rs0 using this method, I am not able to run mongod with --master option. So I am wondering, how I can run the mongod instance as a replica set and also use the --master option when launching mongod?

Thank you for the help,
AJ

Mongo Shards Support

Hello. We are looking for a tool to synchronize our mongoDB to Elastic. But we have sharded collections and we are using MongoS.
We have a workaround that we use in branch(mongos-support) at our fork(kolesa-team) now (https://github.com/kolesa-team/monstache/tree/mongos-support), but it's quite dirty.

Do you plan adding MongoS support to Monstache?

rwynn / monstache Goto Github PK

monstache's People

Contributors

Stargazers

Watchers

Forkers

monstache's Issues

Recommend Projects

Recommend Topics

Recommend Org

Jobs