supervisor / superlance

Superlance utilities for use with the Supervisor process control system

Home Page: http://supervisord.org

License: Other

Python 100.00%

superlance's Introduction

superlance README

Superlance is a package of plugin utilities for monitoring and controlling processes that run under supervisor.

Please see docs/index.rst for complete documentation.

superlance's Issues

Directory monitor for conf files

Monitors a directory for file changes and sends an event. I did not see a suitable event type for this.

This would help with restarting processes after changes to application config in conf.d are detected. More specifically, it helps when processes run inside containers and conf.d is mounted on a shared volume: when files are placed there, processes in other containers cannot be reached because process namespaces are isolated.
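
A minimal sketch of the idea, assuming a polling approach rather than any existing Superlance plugin: take a snapshot of file modification times in the watched directory and compare it with the previous snapshot. The names `snapshot` and `diff_snapshots` are hypothetical, and how a detected change would be turned into a Supervisor event is left open.

```python
import os


def snapshot(directory):
    """Map each regular file in `directory` to its modification time (ns)."""
    result = {}
    for name in os.listdir(directory):
        path = os.path.join(directory, name)
        if os.path.isfile(path):
            result[name] = os.stat(path).st_mtime_ns
    return result


def diff_snapshots(old, new):
    """Return sorted lists of (added, removed, modified) file names."""
    added = sorted(set(new) - set(old))
    removed = sorted(set(old) - set(new))
    modified = sorted(n for n in new if n in old and new[n] != old[n])
    return added, removed, modified
```

A caller would keep the last snapshot, take a new one on each tick, and act only when `diff_snapshots` returns a non-empty list.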

crashmail works with sendmail, fatalmailbatch not working with sendmail

How can I set it up to work like fatalmailbatch?

I cannot use an open port 25 without security for email. I get this:
Error sending email: [Errno 111] Connection refused

How to install superlance on python 3, when supervisor only supports python 2?

How do I install superlance on python3 when supervisor only supports python 2:

33fe43b4c05a root@~/src $ pip install superlance
Collecting superlance
  Using cached https://files.pythonhosted.org/packages/14/87/d2b4fe1f9e7f97360e75e125cc03b2216a0ce5092034f203febc3818b7da/superlance-1.0.0-py2.py3-none-any.whl
Collecting supervisor (from superlance)
  Using cached https://files.pythonhosted.org/packages/44/60/698e54b4a4a9b956b2d709b4b7b676119c833d811d53ee2500f1b5e96dc3/supervisor-3.3.4.tar.gz
    Complete output from command python setup.py egg_info:
    Supervisor requires Python 2.4 or later but does not work on any version of Python 3.  You are using version 3.6.4 |Anaconda, Inc.| (default, Jan 16 2018, 18:10:19)
    [GCC 7.2.0].  Please install using a supported version.

Send process output in email

Is this possible? It doesn't seem like it, but almost everyone could benefit from it.

Sphinx warnings when building from a PyPI tarball

Hi,
When I build the doc from PyPI's 0.13 tarball, I see this:

superlance-0.13/docs/index.rst:44: WARNING: toctree contains reference to nonexisting document u'development'

and

copying static files... WARNING: html_static_path entry u'/var/tmp/portage/dev-python/superlance-0.13/work/superlance-0.13/docs/_static' does not exist

Do you think you could include the development.rst file and the _static directory (or just remove the relevant line from conf.py)?

fatal error in process_state_email_monitor.py

Getting the following error:

bash > /usr/local/bin/fatalmailbatch
Traceback (most recent call last):
  File "superlance/fatalmailbatch.py", line 78, in <module>
    main()
  File "superlance/fatalmailbatch.py", line 74, in main
    fatal = FatalMailBatch.create_from_cmd_line()
  File "/usr/local/lib/python2.7/dist-packages/superlance-0.8-py2.7.egg/superlance/process_state_email_monitor.py", line 77, in create_from_cmd_line
    options = cls.get_cmd_line_options()
  File "/usr/local/lib/python2.7/dist-packages/superlance-0.8-py2.7.egg/superlance/process_state_email_monitor.py", line 73, in get_cmd_line_options
    return cls.validate_cmd_line_options(cls.parse_cmd_line_options())
  File "/usr/local/lib/python2.7/dist-packages/superlance-0.8-py2.7.egg/superlance/process_state_email_monitor.py", line 61, in validate_cmd_line_options
    parser.print_help()
NameError: global name 'parser' is not defined

fatalmailbatch, crashmailbatch & crashsms are all failing for the same reason from command line & inside supervisord.

Running python 2.7.3.

Got it to run using global keyword on the parser variable, but I'm not sure how you want to resolve this.
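
The traceback shows `validate_cmd_line_options` using a `parser` name that only exists inside another function. A minimal sketch of one way to avoid a global, assuming the parser is simply handed to the validator. The function signatures and the single `--toEmail` option are illustrative and are not the actual Superlance code.

```python
from optparse import OptionParser


def parse_cmd_line_options(argv):
    """Parse `argv` and return both the parser and the parsed options."""
    parser = OptionParser()
    parser.add_option("-t", "--toEmail", dest="to_email", default=None)
    options, _args = parser.parse_args(argv)
    return parser, options


def validate_cmd_line_options(parser, options):
    """Return options if valid; otherwise print the help text and return None."""
    if not options.to_email:
        # `parser` is an explicit argument, so no NameError is possible here.
        parser.print_help()
        return None
    return options
```

Returning the parser alongside the options keeps the two functions independent of module-level state, which is the main difference from the `global` workaround described above.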

port it to python 3

❤️

Deprecation warning due to invalid escape sequences in Python 3.8

Deprecation warnings are raised due to invalid escape sequences in Python 3.8. Below is a log of the warnings raised while compiling all the Python files. Using raw strings or escaping the backslashes will fix this issue.

find . -iname '*.py'  | xargs -P 4 -I{} python -Walways -m py_compile {}

./superlance/tests/memmon_test.py:313: DeprecationWarning: invalid escape sequence \-
  """Let calc_rss() do its work on a fake process tree:

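A small sketch of the two fixes mentioned in this report, assuming nothing about the actual test file beyond the `\-` sequence in the log. It compiles three one-line snippets and collects the escape-sequence warnings. Python 3.8 reports them as `DeprecationWarning`; Python 3.12 and later use `SyntaxWarning`, so the helper accepts both.

```python
import warnings


def escape_warnings(source):
    """Compile `source` and return the escape-sequence warnings it raised."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        compile(source, "<sketch>", "exec")
    return [
        w for w in caught
        if issubclass(w.category, (DeprecationWarning, SyntaxWarning))
    ]


# The doubled backslashes below only build the source text; the compiled
# snippets contain a single backslash before the hyphen (two in ESCAPED).
PLAIN = 'tree = "foo \\- bar"\n'      # plain string literal: warns
RAW = 'tree = r"foo \\- bar"\n'       # raw string literal: no warning
ESCAPED = 'tree = "foo \\\\- bar"\n'  # escaped backslash: no warning
```

Calling `escape_warnings(PLAIN)` returns at least one warning, while the raw and escaped variants return an empty list.
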
ERRO pool mylistener2 event buffer overflowed, discarding event

Hi,
I am trying to fix this error. I have seen similar errors, but they mainly talk about not using redirect_stderr=True.
For example here:
#55
I have this in my config file:

[eventlistener:mylistener2]
command=python3 /etc/supervisor/bin/listener.py
process_name=%(program_name)s_%(process_num)s
numprocs=1
events=PROCESS_STATE
autorestart=true
stderr_logfile=/var/log/supervisor/event-error.log
stdout_logfile=/var/log/supervisor/event.log

And this in my listener:

import sys

def write_stdout(s):
    # only eventlistener protocol messages may be sent to stdout
    sys.stdout.write(s)
    sys.stdout.flush()

def write_stderr(s):
    sys.stderr.write(s)
    sys.stderr.flush()

def main():
    while 1:
        # transition from ACKNOWLEDGED to READY
        write_stdout('READY\n')

        # read header line and print it to stderr
        line = sys.stdin.readline()
        write_stderr(line)

        # read event payload and print it to stderr
        headers = dict([ x.split(':') for x in line.split() ])
        data = sys.stdin.read(int(headers['len']))
        write_stderr(data)

        if headers["eventname"] == "PROCESS_STATE_STOPPING":
            write_stderr("Process state stopping...\n")

        # transition from READY to ACKNOWLEDGED
        write_stdout('RESULT 2\nOK')

if __name__ == '__main__':
    main()

I am testing it, and it seems to output Process state stopping... when processes stop either way, but every time I reload my supervisor I get
ERRO pool mylistener2 event buffer overflowed, discarding event
in my logs.
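
For reference, the two protocol pieces the listener above handles by hand can be isolated into small helpers. This is a sketch with hypothetical function names. It does not diagnose the overflow, but it makes the header parsing and the `RESULT <len>` framing easy to test outside Supervisor.

```python
def parse_header(line):
    """Turn 'key:value key:value ...' into a dict, as the listener above does."""
    return dict(token.split(":", 1) for token in line.split())


def result_token(body="OK"):
    """Build a result message: 'RESULT <len>' on one line, then the body."""
    return "RESULT %d\n%s" % (len(body), body)
```

With these, `result_token()` yields exactly the `'RESULT 2\nOK'` string that the listener writes to stdout, and a header line can be checked with `parse_header(line)["eventname"]`.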

how to limit email from crashmail

We are running many supervisor tasks. When the server runs into a problem, crashmail sends an email to the configured address. That's good, but if we cannot resolve the server issue immediately, crashmail sends out thousands of emails. Is there any option to limit the number of emails within a specific period?
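
One way to picture the requested behaviour is a sliding-window throttle placed in front of whatever sends the mail. This is a sketch only: `MailThrottle` is a hypothetical class, not a crashmail option, and the limit and period values are up to the caller.

```python
import time


class MailThrottle:
    """Allow at most `limit` sends per `period` seconds (sliding window)."""

    def __init__(self, limit, period, clock=time.monotonic):
        self.limit = limit
        self.period = period
        self.clock = clock
        self.sent = []  # timestamps of recent sends

    def allow(self):
        """Return True and record a send if the window still has room."""
        now = self.clock()
        # Forget sends that have fallen out of the window.
        self.sent = [t for t in self.sent if now - t < self.period]
        if len(self.sent) >= self.limit:
            return False
        self.sent.append(now)
        return True
```

A custom sendmail wrapper could call `allow()` before sending and drop or queue the message when it returns False.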

memmon throwing 401 unauthorized when username and password used in unix_http_server

When authentication is used in the unix_http_server section and the same credentials are used everywhere necessary, such as supervisorctl, memmon throws 401 Unauthorized when it tries to create the ServerProxy in the main method. supervisorctl works fine, though, and the inet_http_server is also set up. What could be the issue?

It seems to expect the SUPERVISOR_USERNAME, SUPERVISOR_PASSWORD and the SUPERVISOR_SERVER_URL to be present in the env. Should we set it explicitly?
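
A quick diagnostic sketch, using only the three variable names mentioned above: it reports which of them are unset or empty in a given environment. The helper name is hypothetical. Running it from inside the listener's environment would show what memmon actually sees.

```python
import os


def missing_listener_env(env=None):
    """Return the Supervisor variables named above that are unset or empty."""
    env = os.environ if env is None else env
    names = (
        "SUPERVISOR_SERVER_URL",
        "SUPERVISOR_USERNAME",
        "SUPERVISOR_PASSWORD",
    )
    return [name for name in names if not env.get(name)]
```

An empty list means all three are present; it says nothing about whether the credentials themselves are correct.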

-n option is not supported by httpok?

Despite usage help

-n -- optionally specify the name of the httpok process.  This name will
      be used in the email subject to identify which httpok process
      restarted the process.

it seems that httpok doesn't support this option

    short_args="hp:at:c:b:s:m:g:d:eE"
    long_args=[
        "help",
        "program=",
        "any",
        "timeout=",
        "code=",
        "body=",
        "sendmail_program=",
        "email=",
        "gcore=",
        "coredir=",
        "eager",
        "not-eager",
    ]

Is it a mistake or by design?
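
The report can be reproduced with the standard `getopt` module and the option tables quoted above. In this sketch the `n:` short option and the `name=` long option added in the last call are hypothetical additions that show what accepting `-n` would require; they are not taken from the httpok source.

```python
import getopt

# Option tables copied from the snippet above.
SHORT_ARGS = "hp:at:c:b:s:m:g:d:eE"
LONG_ARGS = [
    "help", "program=", "any", "timeout=", "code=", "body=",
    "sendmail_program=", "email=", "gcore=", "coredir=",
    "eager", "not-eager",
]


def accepts(argv, short_args=SHORT_ARGS, long_args=LONG_ARGS):
    """Return True if getopt can parse `argv` with the given option tables."""
    try:
        getopt.getopt(argv, short_args, long_args)
    except getopt.GetoptError:
        return False
    return True
```

With the tables as quoted, `accepts(["-n", "web"])` is False, while `accepts(["-n", "web"], SHORT_ARGS + "n:", LONG_ARGS + ["name="])` is True.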

Doesn't seem to work

Hi. I can't configure memmon to restart express-server when the memory limit is exceeded on Ubuntu 18.

Fix to #110 not available on pip

It seems that the version which gets installed with pip install superlance is affected by #110. The issue was fixed over 2 years ago but it's still wasting hours of developers' productivity. Could you please release the patched version?

httpok: try to restart even in FATAL process state

I'm wondering why httpok doesn't include an option to always try to restart even if the process is not in ProcessStates.RUNNING?
That would be my feature request for an option to add that (or to do it by default).
Do people use a different tool for that job?

My scenario is that a temporary configuration error meant that a key process couldn't start (startretries was 3, but the config was incorrect for around half an hour, so I would have had to set that quite high to cover the half hour).

As I've an hourly httpok running, my assumption was that this would have attempted to restart the process each hour, and (say) 10 hours later the system could be back working without intervention.

I want to see if I'm missing some approach before submitting a PR.

All crash events after the first are ignored when using redirect_stderr on a crashmail listener

In a test with the following program config:

[program:test-crashmail]
command = bash -c 'echo "$(date): TESTING..."; sleep 5; false'
autostart = false
autorestart = false
redirect_stderr = true
stdout_logfile = %(here)s/test-crashmail.log
startsecs=1
startretries = 0

I also have a crashmail listener:

[eventlistener:crashmail]
command=crashmail -a -o '[supervisord] ' -m 'root@localhost'
events=PROCESS_STATE
stdout_logfile = %(here)s/crashmail.log
redirect_stderr = true

I manually started the test-crashmail process (via the web UI) 4 times, leaving plenty of time between each.

In the debug logs of supervisord below, you can see that crashmail prints "unexpected exit, mailing" - supervisor sees this as an "UNKNOWN" state, and it never recovers (even after the mail is sent, and it prints "OKREADY").

No further process state events get sent to crashmail, because it's marked as not ready to receive them.

2013-03-08 09:45:43,112 DEBG fd 20 closed, stopped monitoring <POutputDispatcher at 23797200 for <Subprocess at 22786488 with name test-crashmail in state RUNNING> (stdout)>
2013-03-08 09:45:43,115 INFO exited: test-crashmail (exit status 1; not expected)
2013-03-08 09:45:43,115 DEBG received SIGCLD indicating a child quit
2013-03-08 09:45:43,119 DEBG event 23 sent to listener crashmail
2013-03-08 09:45:43,120 DEBG 'crashmail' stdout output:
unexpected exit, mailing

2013-03-08 09:45:43,122 DEBG crashmail: BUSY -> UNKNOWN (bad result line 'unexpected exit, mailing')
2013-03-08 09:45:43,123 DEBG rebuffering event 23 for pool crashmail (bufsize 0)


2013-03-08 09:46:11,038 INFO spawned: 'test-crashmail' with pid 10052
2013-03-08 09:46:11,059 DEBG 'test-crashmail' stdout output:
Fri Mar  8 09:46:11 EST 2013: TESTING...

2013-03-08 09:46:12,061 INFO success: test-crashmail entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2013-03-08 09:46:16,062 DEBG fd 20 closed, stopped monitoring <POutputDispatcher at 23930712 for <Subprocess at 22786488 with name test-crashmail in state RUNNING> (stdout)>
2013-03-08 09:46:16,065 INFO exited: test-crashmail (exit status 1; not expected)
2013-03-08 09:46:16,066 DEBG received SIGCLD indicating a child quit
2013-03-08 09:46:44,192 DEBG 'crashmail' stdout output:
Mailed:

To: root@localhost
Subject: [supervisord] : test-crashmail crashed at 2013-03-08 09:45:43,117

Process test-crashmail in group test-crashmail exited unexpectedly (pid 10042) from state RUNNINGRESULT 2
OKREADY


2013-03-08 09:47:16,550 INFO spawned: 'test-crashmail' with pid 10059
2013-03-08 09:47:16,569 DEBG 'test-crashmail' stdout output:
Fri Mar  8 09:47:16 EST 2013: TESTING...

2013-03-08 09:47:17,571 INFO success: test-crashmail entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2013-03-08 09:47:21,572 DEBG fd 20 closed, stopped monitoring <POutputDispatcher at 22965856 for <Subprocess at 22786488 with name test-crashmail in state RUNNING> (stdout)>
2013-03-08 09:47:21,574 INFO exited: test-crashmail (exit status 1; not expected)
2013-03-08 09:47:21,575 DEBG received SIGCLD indicating a child quit

2013-03-08 09:48:25,456 INFO spawned: 'test-crashmail' with pid 10063
2013-03-08 09:48:25,472 DEBG 'test-crashmail' stdout output:
Fri Mar  8 09:48:25 EST 2013: TESTING...

2013-03-08 09:48:26,475 INFO success: test-crashmail entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2013-03-08 09:48:30,476 DEBG fd 21 closed, stopped monitoring <POutputDispatcher at 23740856 for <Subprocess at 22786488 with name test-crashmail in state RUNNING> (stdout)>
2013-03-08 09:48:30,478 INFO exited: test-crashmail (exit status 1; not expected)
2013-03-08 09:48:30,479 DEBG received SIGCLD indicating a child quit

After testing this some more, I realise that it's failing because I used redirect_stderr on the crashmail listener. I do this by habit for all of my own programs, since I don't want to have to track two log files. I didn't realise the listener protocol was based on stdout messages, and that redirect_stderr would break it. I guess there are a few ways around this:

don't support redirect_stderr for listeners (I'm assuming it's never a good idea, based on what happened here)
big red warning in the docs to not use redirect_stderr for listeners
allow processes to recover from "UNKNOWN" state once they resume printing expected messages (e.g "OKREADY")
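
Until one of these options is in place, a configuration can be checked for the problem before it is loaded. The sketch below is a hypothetical lint helper, not part of Supervisor or Superlance. It uses `RawConfigParser` so that values such as `%(here)s` are not interpolated, and flags every `[eventlistener:*]` section that turns on `redirect_stderr`.

```python
import configparser


def listeners_with_redirect_stderr(config_text):
    """Return names of [eventlistener:*] sections that enable redirect_stderr."""
    parser = configparser.RawConfigParser()
    parser.read_string(config_text)
    flagged = []
    for section in parser.sections():
        if not section.startswith("eventlistener:"):
            continue
        value = parser.get(section, "redirect_stderr", fallback="false")
        if value.strip().lower() in ("true", "1", "yes", "on"):
            flagged.append(section.split(":", 1)[1])
    return flagged
```

Program sections are ignored on purpose, since `redirect_stderr` only interferes with the stdout-based listener protocol described above.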

OSError: [Errno 12] Cannot allocate memory

When I was looking for a method to monitor processes (and kill rogue dev processes), I came across memmon from superlance.

I configured it this way:

[group:group]
programs = gunicorn,celerydb,celerycam
priority = 10

[eventlistener:memmon]
command=/.../web/env/bin/memmon -p group=200MB -m <adminemail> 
events=TICK_60
serverurl = unix:///tmp/supervisor.sock
environment=SUPERVISOR_SERVER_URL='unix:///tmp/supervisor.sock',SUPERVISOR_USERNAME=user,SUPERVISOR_PASSWORD=123

Now that works fine apparently, but what happens after a while is this:

Traceback (most recent call last):
  File "/.../web/env/bin/memmon", line 9, in <module>
    load_entry_point('superlance==0.7', 'console_scripts', 'memmon')()
  File "/.../web/env/local/lib/python2.7/site-packages/superlance/memmon.py", line 289, in main
    memmon.runforever()
  File "/.../web/env/local/lib/python2.7/site-packages/superlance/memmon.py", line 139, in runforever
    data = shell(self.pscommand % pid)
  File "/.../web/env/local/lib/python2.7/site-packages/superlance/memmon.py", line 80, in shell
    return os.popen(cmd).read()
OSError: [Errno 12] Cannot allocate memory

Isn't that something memmon should use as a reason to kill off the group? (Or do I get it fundamentally wrong?)

Send email through smtp in superlance using crashmail

Hi,

I've posted this question on Stack Overflow, but given that there aren't many questions about superlance there, I decided to repost it here.

I'm trying to set up email sending when a process changes state in supervisord by using crashmail. Having no luck with the default sendmail program, which requires quite a lot of setup, I decided to go with a small script in Python that sends email using SMTP.

This worked very well for the first state change (I did receive an email saying that the process state had changed) but stopped working afterward. I have tried changing different options in supervisord, such as buffer_size or autorestart, but it has no effect.

Here is the script I use to trigger the supervisord state changes:

import time

from datetime import datetime

if __name__ == '__main__':
    print(">>>>> STARTING ...", flush=True)
    while True:
        print("sleep now:", datetime.utcnow(), flush=True)
        time.sleep(30)
        raise Exception("meo meo")

This is the script that sends email through Gmail. This one will send the stdin.

#!/usr/bin/env python

import smtplib


def get_server():
    smtpserver = smtplib.SMTP('smtp.gmail.com:587')
    smtpserver.ehlo()
    smtpserver.starttls()
    smtpserver.login("[email protected]", "password")
    return smtpserver


if __name__ == '__main__':
    import sys

    data = sys.stdin.read()

    s = get_server()
    s.sendmail('[email protected]', ['[email protected]'], data)
    s.quit()

Here is my supervisord.conf

[eventlistener:crashmail]
command=crashmail -a -m [email protected] -s /home/ubuntu/mysendmail.py
events=PROCESS_STATE
buffer_size=102400
autorestart=true

Does anyone have any idea why?
Thanks!

Modify logs to include timestamps

Currently stderr and stdout logs are less than useless if you want to analyze them after a problem has happened because they don't include timestamps and you have no idea when a certain event took place.

Take, for instance the piece of memmon error log I included on a previous issue (#70):

Checking groups app=1610612736
RSS of app:instance1 is 1614974976
Restarting app:instance1
RSS of app:instance2 is 1297457152
RSS of app:instance3 is 1477554176
Checking groups app=1610612736
RSS of app:instance1 is 318668800
RSS of app:instance2 is 1297506304
RSS of app:instance3 is 1477554176
Checking groups app=1610612736
RSS of app:instance1 is 164720640
RSS of app:instance2 is 1297575936
RSS of app:instance3 is 1477672960
Checking groups app=1610612736
RSS of app:instance1 is 340303872
RSS of app:instance2 is 1305280512
RSS of app:instance3 is 1477713920
Checking groups app=1610612736
RSS of app:instance1 is 166830080
RSS of app:instance2 is 1318711296
RSS of app:instance3 is 1477849088
Checking groups app=1610612736
RSS of app:instance1 is 337248256
RSS of app:instance2 is 1325903872
RSS of app:instance3 is 1477685248

The httpok error log of that process gives me no indication of when the restarts happened, so I can not associate this information with the other:

Restarting selected processes ['app:instance1']
app:instance1 is in RUNNING state, restarting
app:instance1 restarted
Restarting selected processes ['app:instance1']
app:instance1 is in RUNNING state, restarting
app:instance1 restarted
Restarting selected processes ['app:instance1']
app:instance1 is in RUNNING state, restarting
app:instance1 restarted
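
A minimal sketch of what is being asked for, assuming the plugins write their diagnostics through a single helper. The name `log_line` and the timestamp format are illustrative choices, not the format Superlance uses.

```python
import sys
import time


def log_line(message, stream=None, now=None):
    """Write `message` to `stream` prefixed with a local timestamp."""
    stream = sys.stderr if stream is None else stream
    # time.localtime(None) uses the current time.
    stamp = time.strftime("%Y-%m-%d %H:%M:%S", time.localtime(now))
    stream.write("%s %s\n" % (stamp, message))
    stream.flush()
```

With such a prefix, a line like `Restarting app:instance1` from one log could be matched against the restart messages in the other by time.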

python 3 installation

Hey - thanks for maintaining this!

How do I install this under python 3? I tried pip:

Vs-Pro.local vgoklani@~ $ pip install superlance
Collecting superlance
  Using cached https://files.pythonhosted.org/packages/14/87/d2b4fe1f9e7f97360e75e125cc03b2216a0ce5092034f203febc3818b7da/superlance-1.0.0-py2.py3-none-any.whl
Collecting supervisor (from superlance)
  Using cached https://files.pythonhosted.org/packages/44/60/698e54b4a4a9b956b2d709b4b7b676119c833d811d53ee2500f1b5e96dc3/supervisor-3.3.4.tar.gz
    Complete output from command python setup.py egg_info:
    Supervisor requires Python 2.4 or later but does not work on any version of Python 3.  You are using version 3.7.0 (default, Jun 28 2018, 07:39:16)
    [Clang 4.0.1 (tags/RELEASE_401/final)].  Please install using a supported version.

I've been using supervisor with python3 and it's very stable :) but I need crashmail. Thanks!

fatalmailbatch not working getting FATAL can't find command 'fatalmailbatch'

Hello,
I have installed the superlance package with Python, but when I run supervisor on macOS I get the error
FATAL can't find command 'fatalmailbatch'. Supervisor is installed and working properly, but the fatalmailbatch event listener is not working. Please help me fix this issue.

Also, during the installation, I got this warning.

WARNING: The scripts echo_supervisord_conf, pidproxy, supervisorctl and supervisord are installed in
'/usr/local/opt/[email protected]/Frameworks/Python.framework/Versions/3.9/bin' which is not on PATH.
Consider adding this directory to PATH or, if you prefer to suppress this warning, use --no-warn-script-location.

WARNING: The scripts crashmail, crashmailbatch, crashsms, fatalmailbatch, httpok and memmon are installed in
'/usr/local/opt/[email protected]/Frameworks/Python.framework/Versions/3.9/bin' which is not on PATH.
Consider adding this directory to PATH or, if you prefer to suppress this warning, use --no-warn-script-location.
Thanks

Installing super lance on alpine based docker images

I am trying to set up a memmon event listener to monitor process memory and restart the high-memory process inside one of our containers, whose base image is Alpine. I installed the latest version of pip and then installed superlance with pip. I updated my supervisord config with the event listener. However, when I look at supervisorctl status, the memmon process is stuck in an indefinite STARTING loop. I stopped memmon from supervisorctl and started it manually from the command line inside the container. I get the error below:

Traceback (most recent call last):
  File "/usr/bin/memmon", line 8, in <module>
    sys.exit(main())
  File "/usr/lib/python2.7/site-packages/superlance/memmon.py", line 417, in main
    memmon.rpc = childutils.getRPCInterface(os.environ)
  File "/usr/lib/python2.7/site-packages/supervisor/childutils.py", line 17, in getRPCInterface
    return xmlrpclib.ServerProxy('http://127.0.0.1', getRPCTransport(env))
  File "/usr/lib/python2.7/site-packages/supervisor/childutils.py", line 11, in getRPCTransport
    return SupervisorTransport(u, p, env['SUPERVISOR_SERVER_URL'])
  File "/usr/lib/python2.7/UserDict.py", line 40, in __getitem__
    raise KeyError(key)
KeyError: 'SUPERVISOR_SERVER_URL'

Can anyone help me understand this error?

Below is the error from Supervisor logs:

Traceback (most recent call last):
  File "/usr/bin/memmon", line 11, in <module>
    sys.exit(main())
  File "/usr/lib/python2.7/site-packages/superlance/memmon.py", line 418, in main
    memmon.runforever()
  File "/usr/lib/python2.7/site-packages/superlance/memmon.py", line 152, in runforever
    infos = self.rpc.supervisor.getAllProcessInfo()
  File "/usr/lib/python2.7/xmlrpclib.py", line 1243, in __call__
    return self.__send(self.__name, args)
  File "/usr/lib/python2.7/xmlrpclib.py", line 1602, in __request
    verbose=self.__verbose
  File "/usr/lib/python2.7/site-packages/supervisor/xmlrpc.py", line 519, in request
    '' )
xmlrpclib.ProtocolError: <ProtocolError for 127.0.0.1/RPC2: 401 Unauthorized>

I also see that memmon startup breaks my Supervisor web interface on [inet_http_server]: when I run curl -u username:password http://localhost:9001 I get a 500 error.

Any help is appreciated!

*mailbatch (both crash and fatal)

parse_cmd_line_options is somehow NOT called before validate_cmd_line_options

Traceback (most recent call last):
  File "/usr/local/bin/fatalmailbatch", line 9, in <module>
    load_entry_point('superlance==0.8', 'console_scripts', 'fatalmailbatch')()
  File "/usr/local/lib/python2.7/dist-packages/superlance/fatalmailbatch.py", line 74, in main
    fatal = FatalMailBatch.create_from_cmd_line()
  File "/usr/local/lib/python2.7/dist-packages/superlance/process_state_email_monitor.py", line 77, in create_from_cmd_line
    options = cls.get_cmd_line_options()
  File "/usr/local/lib/python2.7/dist-packages/superlance/process_state_email_monitor.py", line 73, in get_cmd_line_options
    return cls.validate_cmd_line_options(cls.parse_cmd_line_options())
  File "/usr/local/lib/python2.7/dist-packages/superlance/process_state_email_monitor.py", line 61, in validate_cmd_line_options
    parser.print_help()
NameError: global name 'parser' is not defined

memmon and program with multiple processes?

I have a program that runs with multiple processes like this in my supervisord.conf

[program:firehose]
process_name=%(program_name)s_%(process_num)02d
command=php artisan doctrine:queue:work beanstalkd --queue=firehose --tries=5 --sleep=5 --delay=0 --daemon
directory=/var/app/current/
autostart=true
autorestart=true
numprocs=5

looks like this in supervisorctl status

firehose:firehose_00                                       RUNNING   pid 26265, uptime 0:13:09
firehose:firehose_01                                       RUNNING   pid 26264, uptime 0:13:09
firehose:firehose_02                                       RUNNING   pid 26267, uptime 0:13:09
firehose:firehose_03                                       RUNNING   pid 26266, uptime 0:13:09
firehose:firehose_04                                       RUNNING   pid 26263, uptime 0:13:09

I have successfully caught these using memmon -a but I can't get memmon to monitor just the program (all 5 processes). I have tried

-p firehose=100MB
-g firehose=100MB
-p firehose:firehose_00=100MB <-- example trying to monitor just one

but none of these have worked. What am I missing here?

Please look at this question

hello:
I use supervisor to monitor four projects, and superlance is used in the configuration file to monitor one of the projects and send an email when it exits unexpectedly. But now every project that exits unexpectedly sends an email. Can you help me?

Add docs to readme

It would be helpful for newcomers like me to find the link to the superlance documentation (https://superlance.readthedocs.io/en/latest/) in the readme on GitHub and PyPI :)

how to test crashsms

Hi,

I installed superlance and configured crashsms. crashsms is running:
crashsms RUNNING pid 21683, uptime 0:10:06

This is what I added in supervisord.conf:

[eventlistener:crashsms]
command=crashsms --toEmail="test@email_to_sms_gateway.com" --subject="Testing" --smtpHost="smtp.mailgun.org" --userName="[email protected]" --password="xxxxxxx" --fromEmail="[email protected]"
events=PROCESS_STATE

So far no email has been sent. How can I manually cause a PROCESS_STATE change so I can test it out?

I stopped one of my processes with: sudo supervisorctl stop worker-1000, but that didn't trigger the event.

Any guidance will be appreciated. Thanks.

memory usage of everything in process tree

Right now memmon uses ps, which only gives you the memory usage of the top-level pid in the process tree. This isn't very accurate if you use a run script or the process being monitored spawns child processes.
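
A sketch of how usage could be summed over a whole process tree, assuming `ps` is asked for pid, parent pid and RSS for every process. The invocation in the docstring and the function name are assumptions, not what memmon does.

```python
from collections import defaultdict


def tree_rss(ps_output, root_pid):
    """Sum RSS for `root_pid` and all of its descendants.

    `ps_output` is text with one 'pid ppid rss' triple per line, for example
    from `ps -e -o pid= -o ppid= -o rss=` (an assumed invocation).
    """
    rss = {}
    children = defaultdict(list)
    for line in ps_output.splitlines():
        fields = line.split()
        if len(fields) != 3:
            continue
        try:
            pid, ppid, size = (int(f) for f in fields)
        except ValueError:
            continue  # skip a header or any malformed line
        rss[pid] = size
        children[ppid].append(pid)

    # Walk the tree from the root, adding each process exactly once.
    total = 0
    stack = [root_pid]
    seen = set()
    while stack:
        pid = stack.pop()
        if pid in seen:
            continue
        seen.add(pid)
        total += rss.get(pid, 0)
        stack.extend(children.get(pid, []))
    return total
```

The result is in whatever unit `ps` reports for RSS, so a caller would still need to convert it before comparing it with a configured limit.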

Memmon: socket.error: [Errno 13] Permission denied

Memmon is unable to start; this error repeats in memmon-stderr---supervisor-IgIsou.log:

Checking programs yii-queue-worker=1073741824
Traceback (most recent call last):
  File "/usr/local/bin/memmon", line 11, in <module>
    sys.exit(main())
  File "/usr/local/lib/python2.7/dist-packages/superlance/memmon.py", line 418, in main
    memmon.runforever()
  File "/usr/local/lib/python2.7/dist-packages/superlance/memmon.py", line 152, in runforever
    infos = self.rpc.supervisor.getAllProcessInfo()
  File "/usr/lib/python2.7/xmlrpclib.py", line 1243, in __call__
    return self.__send(self.__name, args)
  File "/usr/lib/python2.7/xmlrpclib.py", line 1602, in __request
    verbose=self.__verbose
  File "/usr/lib/python2.7/dist-packages/supervisor/xmlrpc.py", line 509, in request
    self.connection.request('POST', handler, request_body, self.headers)
  File "/usr/lib/python2.7/httplib.py", line 1042, in request
    self._send_request(method, url, body, headers)
  File "/usr/lib/python2.7/httplib.py", line 1082, in _send_request
    self.endheaders(body)
  File "/usr/lib/python2.7/httplib.py", line 1038, in endheaders
    self._send_output(message_body)
  File "/usr/lib/python2.7/httplib.py", line 882, in _send_output
    self.send(msg)
  File "/usr/lib/python2.7/httplib.py", line 844, in send
    self.connect()
  File "/usr/lib/python2.7/dist-packages/supervisor/xmlrpc.py", line 530, in connect
    self.sock.connect(self.socketfile)
  File "/usr/lib/python2.7/socket.py", line 228, in meth
    return getattr(self._sock,name)(*args)
socket.error: [Errno 13] Permission denied

Supervisor version 3.3.1, Debian 9.

Allow customizing email body messages

I would like the ability to customize the body of email messages that get sent from plugins like crashmail and memmon. Could a switch be added for this to those programs?

I can submit a pull request if this is approved.

Thanks!

email address from file [Feature Request]

We have the crashmail email go to different addresses based on an on-call rota, and at the moment we have to bounce it weekly to pick up the change. It would be nice to have it read the email address from a file just before sending instead. I'm not good with Python, but something like...

def mail(self, email, subject, msg):
    # If the configured address is a readable file, read the recipient from it.
    if os.path.isfile(self.email) and os.access(self.email, os.R_OK):
        with open(self.email, 'r') as f:
            body = 'To: %s\n' % f.read().strip()
    else:
        body = 'To: %s\n' % self.email

Create a new release with Python 3 support

Hi,
While it looks like the package supports Python 3, only the git version does, none of the releases do (and there hasn't been one since 2016).

Could you please release a new version?

Crashmail event buffer overflowed,

supervisor: 3.2.0
superlance: 1.0.0

configuration
[eventlistener:crashmail]
command=/usr/local/bin/crashmail -a -m [email protected] "Server %(host_node_name)s"
events=PROCESS_STATE_EXITED

Crashmail works ok for me but syslog keeps writing this message:

ERRO pool crashmail event buffer overflowed

I tried changing different settings, such as buffer_size, but none of them worked.

Any idea?
Thanks!

Monitoring programs specified with group

Hello, I believe I've found a bug in memmon plugin.
For the program argument in a memmon call, the documentation says it is possible to specify a group name to avoid ambiguity, with the format group_name:program_name. However, the documentation in the code suggests otherwise: process_name:group_name. In the end I figured out that the first variant is the correct one.

When I tried to use the memmon plugin for my program, I noticed that the following exception had been emitted to the eventlistener log:

Checking programs app:backend-0=83886080
RSS of app:backend-0 is 79663104
Traceback (most recent call last):
  File "/home/vagrant/test-superlance/bin/memmon", line 11, in <module>
    sys.exit(main())
  File "/home/vagrant/test-superlance/lib/python2.7/site-packages/superlance/memmon.py", line 402, in main
    memmon.runforever()
  File "/home/vagrant/test-superlance/lib/python2.7/site-packages/superlance/memmon.py", line 169, in runforever
    if  rss > self.programs[name]:
KeyError: 'backend-0'

As far as I've understood, this is the line of code which throws an error because the program is being looked up without group name.

To reproduce the issue, I've written an extra test:

def test_runforever_tick_program_with_group(self):
    programs = {'foo:foo': 0 }
    groups = {}
    _any = None
    memmon = self._makeOnePopulated(programs, groups, _any)
    memmon.stdin.write('eventname:TICK len:0\n')
    memmon.stdin.seek(0)
    memmon.runforever(test=True)
    lines = memmon.stderr.getvalue().split('\n')
    self.assertEqual(len(lines), 4)
    self.assertEqual(lines[0], 'Checking programs foo:foo=0')
    self.assertEqual(lines[1], 'RSS of foo:foo is 2264064')
    self.assertEqual(lines[2], 'Restarting foo:foo')
    self.assertEqual(lines[3], '')

The fix is quite simple, I can make a pull request, if you like.
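
The kind of lookup that would avoid the `KeyError` can be sketched as follows. This is a hypothetical helper and not the proposed patch: it tries the group-qualified key first and falls back to the bare process name.

```python
def lookup_limit(programs, group, name):
    """Find the limit for a process, trying 'group:name' before bare 'name'."""
    qualified = "%s:%s" % (group, name)
    if qualified in programs:
        return programs[qualified]
    # Returns None when neither form is configured.
    return programs.get(name)
```

With the configuration from the log above, `lookup_limit({'app:backend-0': 83886080}, 'app', 'backend-0')` finds the limit even though the bare name is not a key.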

superlance crashmail doesn't work

Hello,
I tried installing superlance and running crashmail like this:

sudo apt-get install python-pip
sudo pip install superlance

Then I ran:

sudo nano /etc/supervisor/supervisord.conf

and added:

[eventlistener:crashmail]
command=/usr/local/bin/crashmail -a -m [email protected]
events=PROCESS_STATE

but I do not receive anything.

My crashmail file is:

#!/usr/bin/python
# -*- coding: utf-8 -*-
import re
import sys

from superlance.crashmail import main

if __name__ == '__main__':
    sys.argv[0] = re.sub(r'(-script\.pyw?|\.exe)?$', '', sys.argv[0])
    sys.exit(main())

Can you help me please ?

Thanks

Best regards

Ben

Exit 0 on `--help`

cmd --help should not return a non-zero exit status, for the various commands installed by superlance.

crashmail sending infinite emails on PROCESS_STATE_EXITED

Help!

I stopped all of my supervisord processes, including the one that is obviously causing this, but I am still getting crashmail emails, each with a DIFFERENT pid.

OS X: ERRO pool memmon event buffer overflowed, discarding event 73

At startup I get two messages, and after that I get these errors:

2014-11-16 23:44:50,829 ERRO pool memmon event buffer overflowed, discarding event 62
2014-11-16 23:44:55,838 ERRO pool memmon event buffer overflowed, discarding event 63
2014-11-16 23:45:00,843 ERRO pool memmon event buffer overflowed, discarding event 64
2014-11-16 23:45:05,849 ERRO pool memmon event buffer overflowed, discarding event 65
2014-11-16 23:45:10,856 ERRO pool memmon event buffer overflowed, discarding event 66
2014-11-16 23:45:15,861 ERRO pool memmon event buffer overflowed, discarding event 67
2014-11-16 23:45:20,867 ERRO pool memmon event buffer overflowed, discarding event 68
2014-11-16 23:45:25,874 ERRO pool memmon event buffer overflowed, discarding event 69
2014-11-16 23:45:30,663 ERRO pool memmon event buffer overflowed, discarding event 70
2014-11-16 23:45:35,668 ERRO pool memmon event buffer overflowed, discarding event 71
2014-11-16 23:45:40,377 ERRO pool memmon event buffer overflowed, discarding event 72
2014-11-16 23:45:45,418 ERRO pool memmon event buffer overflowed, discarding event 73

settings:

[eventlistener:memmon]
command=memmon -p events=1MB
events=TICK_5,PROCESS_STATE
redirect_stderr=True
stdout_logfile=/tmp/memmon.log

httpok does not seem to be able to restart fcgi-program

It seems that httpok cannot restart a whole group defined as an fcgi-program.

Say I have defined it like this:
[fcgi-program:test-fcgi]
...

Then "httpok -p test-fcgi:*" or "httpok -p test-fcgi" do not seem to restart the group.
But if I list all the FastCGI processes one by one:
-p test-fcgi:test-fcgi_20 -p test-fcgi:test-fcgi_21 and so on, then it seems to work as expected.

It would be nice if httpok could restart the whole group/FastCGI program at once.
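As a rough sketch of what group expansion could look like (this is not how httpok currently behaves), the hypothetical helper `expand_spec` below turns a "group:*" or bare "group" spec into individual namespecs. It assumes a list of dicts with 'group' and 'name' keys, which is the shape returned by supervisor's getAllProcessInfo(), and it assumes a bare name is meant as a group name.

```python
def expand_spec(spec, infos):
    """Expand a -p style spec into a list of 'group:name' namespecs.

    `infos` is a list of dicts with 'group' and 'name' keys, as returned by
    supervisor's getAllProcessInfo(). Hypothetical helper, for illustration.
    """
    group, _, name = spec.partition(':')
    if name in ('', '*'):
        # "group" or "group:*": select every process in that group.
        return ['%s:%s' % (i['group'], i['name'])
                for i in infos if i['group'] == group]
    # Already a fully qualified "group:name" spec.
    return [spec]


infos = [
    {'group': 'test-fcgi', 'name': 'test-fcgi_20'},
    {'group': 'test-fcgi', 'name': 'test-fcgi_21'},
    {'group': 'other', 'name': 'worker'},
]
print(expand_spec('test-fcgi:*', infos))
print(expand_spec('test-fcgi:test-fcgi_20', infos))
```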

httpok restarts slow process

I have a Plone site with 3 busy Zope instances running under Supervisord. I have the memmon and httpok plugins configured, but it seems that httpok sometimes restarts an instance that was recently restarted by memmon.

Here is a piece of my memmon stderr log:

Checking groups app=1610612736
RSS of app:instance1 is 1614974976
Restarting app:instance1           <-- restarting
RSS of app:instance2 is 1297457152
RSS of app:instance3 is 1477554176
Checking groups app=1610612736
RSS of app:instance1 is 318668800  <-- going up
RSS of app:instance2 is 1297506304
RSS of app:instance3 is 1477554176
Checking groups app=1610612736
RSS of app:instance1 is 164720640  <-- down
RSS of app:instance2 is 1297575936
RSS of app:instance3 is 1477672960
Checking groups app=1610612736
RSS of app:instance1 is 340303872  <-- going up
RSS of app:instance2 is 1305280512
RSS of app:instance3 is 1477713920
Checking groups app=1610612736
RSS of app:instance1 is 166830080  <-- down
RSS of app:instance2 is 1318711296
RSS of app:instance3 is 1477849088
Checking groups app=1610612736
RSS of app:instance1 is 337248256  <-- going up
RSS of app:instance2 is 1325903872
RSS of app:instance3 is 1477685248
Checking groups app=1610612736
RSS of app:instance1 is 432963584  <-- stabilized
RSS of app:instance2 is 1325481984
RSS of app:instance3 is 1477685248
Checking groups app=1610612736
RSS of app:instance1 is 630874112
RSS of app:instance2 is 1325424640
RSS of app:instance3 is 1477685248

As you can see, the RSS of instance1 goes from around 300MB to 150MB a couple of times before stabilizing; this looks to me like a sign that httpok is restarting the instance while it is still starting up. Both plugins are configured to run at TICK_60.

I solved this on the initial start by waiting 10 minutes before starting to use httpok; we probably need another parameter to deal with this once the process is running.

crashmail.py ignores -a and -p options

The CrashMail class sets self.programs and self.any but never reads them. As a result, crashmail.py always behaves as though -a is specified.

httpok doesn't work if timeout is < 10

This sounds unlikely but seems to be the case:

This commit: 0e6fb2d#diff-d3d6eafafd2e31ce38ebab9e08f156eaL180

changed the logic for the retry loop. retry_time is hardcoded to 10 in httpok, so if timeout is < 10,
range(self.timeout // (self.retry_time or 1) - 1 , -1, -1):
becomes
range(5 // 10 - 1 , -1, -1):
simplifies to
range(-1 , -1, -1):
which will never execute because it is [].
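
The arithmetic above can be checked directly. This small sketch wraps the quoted range expression in a hypothetical helper, `retry_steps`, with retry_time defaulting to the hardcoded 10, and evaluates it for a few timeouts:

```python
def retry_steps(timeout, retry_time=10):
    """Return the loop values produced by the range expression quoted above."""
    return list(range(timeout // (retry_time or 1) - 1, -1, -1))


# A timeout below retry_time yields an empty list, so the loop body never runs.
for timeout in (5, 10, 30):
    print(timeout, retry_steps(timeout))
```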

I attempted to write a test case but I'm not really skilled enough to dissect the http_ok test file.

Alignment of the crashmailbatch and crashmail mail options

crashmailbatch supports sending emails via SMTP options, but not via sendmail. For crashmail it is the other way around. I'd like to use crashmailbatch with sendmail, but this is currently not supported.
It would be nice if all the commands extended 'ProcessStateEmailMonitor' and 'ProcessStateEmailMonitor' supported sendmail as the default. The behavior could then be overridden with the optional --smtp-* parameters.

Let httpok send more signals [feature request]

I would like httpok to send SIGUSR1 before killing a Zope instance, to obtain a thread dump, so that I can figure out what was keeping the instance busy.

What would be a good way to specify this in configuration?
Would something like the following make sense?

diff --git a/superlance/httpok.py b/superlance/httpok.py
index 23682fc..3e7c6b1 100644
--- a/superlance/httpok.py
+++ b/superlance/httpok.py
@@ -280,6 +280,8 @@ class HTTPOk:
                         namespec, m.read()))
             write('%s is in RUNNING state, restarting' % namespec)
             try:
+                for signal in self.signals:
+                    self.rpc.supervisor.signalProcess(namespec, signal)
                 self.rpc.supervisor.stopProcess(namespec)
             except xmlrpclib.Fault as e:
                 write('Failed to stop process %s: %s' % (

Raised at https://lists.supervisord.org/pipermail/supervisor-users/2014-September/001520.html too.

(signalProcess is not yet in a supervisor release.)

New release?

Hi,

master contains a fix that would have prevented downtimes for us twice by now. Can we have a new release? I'd also be happy to do it. My pypi username is do3cc.

Add hooks for memmon to run a command before and after restarting a process

We have the following typical use case: Plone site with multiple instances running behind a web accelerator like Varnish, using memmon and httpok.

From time to time, memmon needs to restart an instance because of high memory consumption. Varnish is configured with backend health check probes that test whether the instances are available, and these probes have to be gentle on the instances because requests hitting the backend are typically slow; for instance:

probe healthcheck {
    .interval = 10s; .request = "HEAD / HTTP/1.1"; .timeout = 3s;
}

As restarting a Plone instance is time consuming (typically around 30 seconds), Varnish does not notice that the instance is down for some time and continues sending it requests; when the instance comes back up, it is flooded with pending requests from Varnish.

The Varnish backend then behaves erratically for some time until the instance stabilizes.

What I would like to have is a hook to run a command before the instance is restarted, and another after it is marked as running.

Then I would be able to configure something like this:

memmon detects high memory usage
pre-hook is run: varnishadm backend.set_health instance1 sick
instance1 is restarted
supervisor waits and marks it as running
post-hook is run: varnishadm backend.set_health instance1 auto
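
The sequence above could be sketched roughly as follows. This is only an illustration of the requested behavior, not an existing memmon feature: the function `restart_with_hooks` and its parameters are hypothetical, and the demo records the commands instead of executing varnishadm.

```python
import subprocess


def restart_with_hooks(restart, pre_hook=None, post_hook=None,
                       run=subprocess.call):
    """Run pre_hook, call restart(), then run post_hook.

    Hooks are argument lists such as ['varnishadm', ...]. `run` is
    injectable so the sequence can be exercised without real commands.
    Hypothetical helper; memmon has no such option in the source above.
    """
    if pre_hook:
        run(pre_hook)
    restart()  # expected to return once the process is marked as running
    if post_hook:
        run(post_hook)


# Demo with a recorder in place of subprocess.call and a fake restart.
calls = []
restart_with_hooks(
    restart=lambda: calls.append('restart instance1'),
    pre_hook=['varnishadm', 'backend.set_health', 'instance1', 'sick'],
    post_hook=['varnishadm', 'backend.set_health', 'instance1', 'auto'],
    run=lambda argv: calls.append(' '.join(argv)),
)
print(calls)
```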

This could be useful in other use cases also.

Superlance to send alerts to Slack channel

Hello!

I would like to extend this application to handle alerts to slack channels. Is that fine?

memmon - supervisor rpc client sometimes throws exception on start

Because the restart is not an atomic operation, around .001% of the time memmon throws an exception when trying to start a process it has just stopped. This results both in memmon crashing and in the process being left in a stopped state.

The error is often a broken pipe, but is sometimes httplib.IncompleteRead: IncompleteRead(32644 bytes read, 35829 more expected), which I believe means that
memmon.rpc = childutils.getRPCInterface(os.environ)
will need to be executed again and the start then retried.
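
A minimal sketch of that retry idea, under some assumptions: the helper `start_with_retry` and its `make_rpc` factory are hypothetical, the exception names are the Python 3 ones (`http.client` is `httplib` on Python 2), and the choice of which errors to retry is a guess based on the two failures described above.

```python
import http.client


def start_with_retry(make_rpc, namespec, attempts=3):
    """Start `namespec`, building a fresh RPC proxy for each attempt.

    make_rpc: zero-argument callable returning a supervisor RPC proxy,
    for example lambda: childutils.getRPCInterface(os.environ).
    Hypothetical helper; `attempts` must be at least 1.
    """
    last_error = None
    for _ in range(attempts):
        rpc = make_rpc()  # reconnect instead of reusing a broken connection
        try:
            return rpc.supervisor.startProcess(namespec)
        except (OSError, http.client.HTTPException) as exc:
            # Broken pipe is an OSError; IncompleteRead is an HTTPException.
            last_error = exc
    raise last_error
```

If every attempt fails, the last error is re-raised, so the caller can still decide whether to crash or log and continue.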

xmlrpclib.Fault UNKNOWN_METHOD

Using: supervisor 3.1.1, memmon latest from git

The following configuration

[inet_http_server]
# Required for memmon
port = 127.0.0.1:9001


[supervisord]
nodaemon=true


# [program:appserver]
# ...
# ...


# See http://superlance.readthedocs.org/en/latest/memmon.html Workaround for
# dealing with application server memory consumption.  Restart 'appserver'
# program (see above) when it reaches a dangerous threshold.
[eventlistener:memmon]
command=memmon -p appserver=1400MB
events=TICK_60
redirect_stderr=true
stdout_logfile=%(ENV_AS_LOGDIR)s/mmstdout
stderr_logfile=%(ENV_AS_LOGDIR)s/mmstderr

causes memmon to fail (stack trace and stdout below)

Checking programs appserver=1468006400
Traceback (most recent call last):
  File "/usr/local/bin/memmon", line 9, in <module>
    load_entry_point('superlance==0.11', 'console_scripts', 'memmon')()
  File "/usr/local/lib/python2.7/dist-packages/superlance/memmon.py", line 402, in main
    memmon.runforever()
  File "/usr/local/lib/python2.7/dist-packages/superlance/memmon.py", line 147, in runforever
    infos = self.rpc.supervisor.getAllProcessInfo()
  File "/usr/lib/python2.7/xmlrpclib.py", line 1224, in __call__
    return self.__send(self.__name, args)
  File "/usr/lib/python2.7/xmlrpclib.py", line 1578, in __request
    verbose=self.__verbose
  File "/usr/local/lib/python2.7/dist-packages/supervisor/xmlrpc.py", line 475, in request
    return u.close()
  File "/usr/lib/python2.7/xmlrpclib.py", line 793, in close
    raise Fault(**self._stack[0])
xmlrpclib.Fault: <Fault 1: 'UNKNOWN_METHOD'>