consol-monitoring / mod-gearman-worker-go

Mod-Gearman Worker rewrite in Golang

License: GNU General Public License v3.0

Languages: Go 87.12%, Perl 9.30%, Makefile 3.58%

mod-gearman-worker-go's People

Contributors

dependabot[bot], infraweavers, sni

mod-gearman-worker-go's Issues

Command execution problem when more than one space between arguments and / or options

Hey, I found that if you have more than one space between the arguments and / or options of a plugin, the extra space will be treated as a parameter by exec.CommandContext, at least for one of my PHP plugins.

I replaced
splitted := strings.Split(received.commandLine, " ")
with
splitted := strings.Fields(received.commandLine)

and it solved my problem, as Fields splits on one or more whitespace characters.
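
The difference between the two calls can be shown with a small standalone program. This is only a sketch: the command line and the helper names splitBySpace and splitByFields are made up for illustration and are not taken from the worker's source.

```go
package main

import (
	"fmt"
	"strings"
)

// splitBySpace mirrors the original call: every single space is a separator,
// so two consecutive spaces produce an empty string in the result.
func splitBySpace(commandLine string) []string {
	return strings.Split(commandLine, " ")
}

// splitByFields mirrors the proposed fix: a run of whitespace counts as one
// separator and no empty elements are returned.
func splitByFields(commandLine string) []string {
	return strings.Fields(commandLine)
}

func main() {
	// Hypothetical command line with two spaces before "-H".
	commandLine := "/usr/bin/php check.php  -H localhost"

	fmt.Printf("%q\n", splitBySpace(commandLine))
	fmt.Printf("%q\n", splitByFields(commandLine))
}
```

With strings.Split the slice contains an empty element between "check.php" and "-H", and that empty element would be handed to the plugin as an argument. Note that neither call understands quoting, so an argument that legitimately contains spaces would still be broken apart by both.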

Should I open a pull request for this?

Tks.

Windows Gearman does not work?

Please help me, I need to use mod-gearman on Windows and it does not work.
Does mod_gearman need the Linux file structure even when running on Windows?
Sorry for my bad English.

I have attached part of the log:
[2019-09-28 10:05:23.841][Info][mod_gearman_worker.go:130] mod_gearman_worker - version 1.1.1 (Build: e6c4a4f) starting with 5 workers (max 50), pid: 2172
[2019-09-28 10:05:24.246][Warn][readAndExecute.go:274] system error: exec: "/usr/local/nagios/libexec/check_ping": file does not exist
[2019-09-28 10:05:24.336][Warn][readAndExecute.go:274] system error: exec: "/bin/sh": file does not exist
[2019-09-28 10:05:26.437][Warn][readAndExecute.go:274] system error: exec: "/bin/sh": file does not exist
[2019-09-28 10:05:29.485][Info][mod_gearman_worker_windows.go:21] got sigint, quitting
[2019-09-28 10:05:29.486][Info][mod_gearman_worker.go:95] mod-gearman-worker-go shutdown complete
[2019-09-28 10:05:34.981][Info][mod_gearman_worker.go:130] mod_gearman_worker - version 1.1.1 (Build: e6c4a4f) starting with 5 workers (max 50), pid: 8496
[2019-09-28 10:05:36.432][Warn][readAndExecute.go:274] system error: exec: "/bin/sh": file does not exist
[2019-09-28 10:05:46.437][Warn][readAndExecute.go:274] system error: exec: "/bin/sh": file does not exist
[2019-09-28 10:05:56.440][Warn][readAndExecute.go:274] system error: exec: "/bin/sh": file does not exist
[2019-09-28 10:06:06.503][Warn][readAndExecute.go:274] system error: exec: "/bin/sh": file does not exist
[2019-09-28 10:06:16.437][Warn][readAndExecute.go:274] system error: exec: "/bin/sh": file does not exist

job_timeout option is ignored

Hi,
we are using mod-gearman-worker-go with OMD 2.90 (also with 3.00). The job_timeout option from etc/mod-gearman/worker.cfg is ignored.

Option is set to:
job_timeout=120

The Logfile says:
[2019-04-10 17:28:10.850][Trace][worker.go:137]
type: service
result_queue: check_results
host_name: XXXX
service_description: XXXX
start_time: 1554910390.000000
next_check: 1554910390.000000
core_time: 1554910090.847630
timeout: 60
command_line: /omd/sites/stbv/local/lib/nagios/....

best regards,
Jens

Can we build rpm similar to mod-gearman-worker

Hi Sni,

This is a great start, and I appreciate the effort you have put into developing it. Could you please provide a guide for building an rpm for mod-gearman-worker-go? For the time being, how can we run mod-gearman-worker-go as a service/daemon and add it to the Linux startup, similar to what you have built for mod-gearman-worker? Will mod-gearman-worker-go fix the issue of orphaned host/service checks?

Thanks,
Ranjith Kumar R

Worker loses connection to gearmand

Hello,

I noticed some odd behaviors with the Go worker on a cluster.

  1. the node's connection to gearmand sometimes disappears, but the workers for the host and service queues are still there, and sometimes each node shows two connections although just one daemon is running on each node

  2. sometimes the process just dies, no segfaults

  3. I have 4 nodes, 3 of them have 50 workers and one 10, so I should have 160 workers max per queue. The daemon died on 3 nodes and gearmand was showing 261 workers available

I am using naemon-1.0.8 and gearmand 0.33.

Tks.

Host checks get orphaned and "lost" under 1.2.3

Hiya,

We've just upgraded to OMD 5 (which includes 1.2.3) and we are finding that our host checks end up getting "stuck" in the running state according to naemon (i.e. the spinner is there) and then after a period of time, chunks of hosts go "down" because of (host check orphaned, is the mod-gearman work on queue 'host' running?) which it is.

We downgraded to 4.40 (where we came from and everything is good). Then upgraded to 4.60 and it's also behaving like this on there, so presumably the change is somewhere between 1.1.5 and 1.2.1. We have also found that if we copy mod-gear from 4.40 and symlink it in, the behaviour goes away; so we're pretty confident it's the worker:

OMD[default@OMDA02]:~/bin$ ls -la mod*
-rwxr-xr-x 1 root    root      18800 Aug 12 18:49 mod_gearman_mini_epn*
-rwxr-xr-x 1 root    root     131576 Aug 12 18:49 mod_gearman_worker*
lrwxrwxrwx 1 root    root         27 Feb  2 16:50 mod_gearman_worker-go -> mod_gearman_worker-go-1.1.5*
-rwxr-xr-x 1 default default 8604472 Feb  2 15:56 mod_gearman_worker-go-1.1.5*
-rwxr-xr-x 1 root    root    9027672 Aug 12 18:49 mod_gearman_worker-go.1.2.1*

The "stuck" checks always seem to have a passive result submitted for them:
(screenshot not preserved)
That indicates to me that the problem has something to do with dupserver. I think the main check results and the dupserver results are getting "mixed up" or something, and that's causing this behaviour.

Further evidence that the worker is the cause is that the worker.log contains things like

incoming host job: handle: H:<blah>:4126407 - host: <blah> - service:

But when the behaviour is happening, we never see the correlating job: H:<blah>:4126407 finished. Which sort of points towards them going missing or something.

We're going to build 1.3.0 this morning and see if that still shows the behaviour, if it does then we'll start working through the builds to find the breaking release/commit.

debug-result option

Hi,
we are running mod-gearman-worker-go in omd-labs 2.90 with the "debug-result=yes" option, but we cannot see the hostname of the worker in the plugin output.
With the old mod-gearman worker the debug-result option works fine.

Thanks,
Jan

Gearman-worker-go crashes/vanishes randomly

Hiya,

Occasionally (maybe about 10-12 times per year) we see the gearman-worker-go process just "end" or vanish.

There's nothing in the ~/var/log/mod_gearman/worker.log when it happens, both dmesg and /var/log/syslog are empty, so it doesn't look like it's segfaulting or anything.

Running OMD 4.40 on Debian 10.

worker.conf:

debug=0
logfile=/omd/sites/default/var/log/gearman/worker.log
config=/omd/sites/default/etc/mod-gearman/port.conf
dupserver=partnerserver:4730
eventhandler=yes
notifications=yes
services=yes
hosts=yes
encryption=yes
keyfile=/omd/sites/default/etc/mod-gearman/secret.key
job_timeout=60
min-worker=100
max-worker=500
idle-timeout=30
max-jobs=1000
max-age=0
spawn-rate=100
fork_on_exec=no
load_limit1=0
load_limit5=0
load_limit15=0
show_error_output=yes
enable_embedded_perl=on
use_embedded_perl_implicitly=off
use_perl_cache=on
p1_file=/omd/sites/default/share/mod_gearman/mod_gearman_p1.pl

I think we're probably going to see if we can bump debug up to 1 without filling up the disks etc to see if that will shine any light on what is happening, unless there are any better/other suggestions.

worker available is lost after a while

we are using mod-gearman-worker-go with OMD 2.90 (also with 3.00).
With the command check_gearman -H localhost:4730 -q worker_<hostname> -x we check the worker queues. Some workers lose the connection to their worker queue after some time.
#6 did not help in this case.
When we restart the gearman_worker, it immediately reconnects.
The normal service queues are working as expected.

Service check did not exit properly

Hello,

I had to go back to the C worker because some of my shell plugins return "Service check did not exit properly" in the interface.
To test, I reduced the plugin to:

#!/bin/sh
echo "ok"
exit 0

and the behavior persisted.

I could not pinpoint why this is happening, any ideas?

Tks.

Very strict command parser to determine execution type (Shell, exec, EPN or internal)

Currently the worker uses a very restrictive way of determining whether a command should be executed through a shell, exec or EPN.
I noticed this while I was playing around with the check_nwc_health plugin. It supports arguments like this: --warningx ".*usage.*"=$ARG2$ --criticalx ".*usage.*"=$ARG3$. Because the command contains a *, the worker will not use EPN but the shell instead.

if strings.ContainsAny(rawCommand, "!$^&*()~[]\\|{};<>?`") {
return parsed
}
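
The effect of this check can be reproduced in isolation. The sketch below reuses only the character set quoted above; the function name needsShell and the two sample command lines (including the threshold value 80) are made up for illustration and do not come from the worker.

```go
package main

import (
	"fmt"
	"strings"
)

// shellMetaChars is the character set from the snippet quoted in the issue.
const shellMetaChars = "!$^&*()~[]\\|{};<>?`"

// needsShell is a hypothetical wrapper: it reports whether rawCommand contains
// any of those characters, regardless of whether they appear inside quotes.
func needsShell(rawCommand string) bool {
	return strings.ContainsAny(rawCommand, shellMetaChars)
}

func main() {
	// Hypothetical command lines after macro expansion.
	quoted := `check_nwc_health --warningx ".*usage.*"=80`
	plain := `check_nwc_health --mode uptime`

	fmt.Println(needsShell(quoted)) // true: the quoted pattern contains '*'
	fmt.Println(needsShell(plain))  // false: no character from the set
}
```

Because the test runs on the raw string, a metacharacter inside double quotes is treated the same as an unquoted one, which is why the regular expression argument forces the shell path.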

Honestly speaking I'm not sure what would be a good solution to resolve this.

gearmand failure on a dupserver breaks check execution

Hello,

Our use case for the mod-gearman-worker is an HA setup: we have 2 omd instances configured, and each one's dupserver is set to the other in the worker.cfg. We have a server called omd1 configured to perform active checks and a server called omd2 not configured to perform active checks. It appears that when we upgraded to omd 2.90 on both, we moved onto using the go worker and this model has fallen apart :( !

We find that if we attempt to run omd1 standalone with dupserver configured against omd2, but with gearmand stopped on omd2, then the checks never complete, the number of jobs running on the services queue in gearman_top escalates, and the results are never submitted back to omd1. It would appear that this behaviour is in https://github.com/ConSol/mod-gearman-worker-go/blob/b02a9dc3660ee23cb411afeec19eeda13558ecd7/worker.go#L207 as this loop will retry each dupserver 120 times and wait for 1 second between attempts.

We were planning to downgrade to omd 2.80 and return to the C worker; however, we can't find the Debian package on the repo anymore :(

Have we missed a way of configuring this to make it work?

Thanks,
Rob
