catalyst / moodle-tool_heartbeat

Moodle health checks for load balancers / nagios

Home Page: https://moodle.org/plugins/tool_heartbeat

Makefile 0.14% PHP 98.92% Perl 0.28% Mustache 0.65%
moodle-plugin nagios cron icinga icinga-plugin nagios-plugin heartbeat cli

moodle-tool_heartbeat's Introduction

A heartbeat test page for Moodle

What is this?

This plugin exposes various endpoints that can be wired to load balancers and monitoring systems to help expose when things go wrong.

NOTE: In an ideal world this plugin should be redundant, with most of its functionality built into core as a new API that lets each plugin declare its own extra health checks. See:

https://tracker.moodle.org/browse/MDL-47271

Branches

Branch            Moodle version     PHP version
master            Moodle 2.7 - 4.1   PHP 5.4.4+
MOODLE_39_STABLE  Moodle 3.9+        PHP 7.2+

The master branch retains very deep support for old Totara and Moodle versions, back to Moodle 2.7.

For any site using Moodle 3.9 or later, it is recommended to use the MOODLE_39_STABLE branch.

The MOODLE_39_STABLE branch uses the Check API exclusively, which simplifies the code massively.

Versioning

Versioning follows the Moodle versioning guidelines

Whenever a version change is required:

  • The master branch should always be 20231024xx where xx increases by 1 each time.
  • The MOODLE_39_STABLE branch should always be updated to the current date.
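
The two rules above can be sketched as a small helper. This is only an illustration: the function names are hypothetical, and the two-digit counter on the MOODLE_39_STABLE version (defaulting to 00) is an assumption based on the usual YYYYMMDDXX shape of Moodle version numbers.

```python
from datetime import date

MASTER_PREFIX = 20231024  # fixed date part used by the master branch


def next_master_version(current: int) -> int:
    """Bump the two-digit xx suffix of a 20231024xx master version by one."""
    prefix, counter = divmod(current, 100)
    if prefix != MASTER_PREFIX:
        raise ValueError(f"not a master branch version: {current}")
    if counter >= 99:
        raise ValueError("the two-digit counter is exhausted")
    return current + 1


def stable_version(today: date, counter: int = 0) -> int:
    """Build a YYYYMMDDxx version for MOODLE_39_STABLE from the given date.

    The counter argument is an assumption; the README only says "current date".
    """
    if not 0 <= counter <= 99:
        raise ValueError("counter must fit in two digits")
    return int(today.strftime("%Y%m%d")) * 100 + counter
```

For example, `next_master_version(2023102400)` gives `2023102401`, and `stable_version(date(2024, 3, 5))` gives `2024030500`.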

Front end health

This is the index.php check. It is designed to assert only that the front end is healthy, and is intended for use as a load balancer test.

For example, it might check the connection to the filesystem, but not stress too much about the health of the filesystem itself. The reason is that front end health checks that fail for the wrong reasons pull nodes out of the load balancer unnecessarily.

http://moodle.example.com/admin/tool/heartbeat/

It returns a page with either a 200 or 503 response code and, if it fails, a string explaining why.

By default it only performs a light check; in particular, it does not check the Moodle database. To do a full check, add this query param:

http://moodle.example.com/admin/tool/heartbeat/?fullcheck

This check can also be run as a CLI:

php index.php fullcheck

Example return values for heartbeat

Example for when the server is healthy.

(HTTP 200)
Server is ALIVE
sitedata OK

Example for when the server is in command line maintenance mode.

(HTTP 200)
Server is in MAINTENANCE
sitedata OK

Example for when the server is not healthy.

(HTTP 503)
Server is DOWN
Failed: database error
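
A monitoring client could turn these three example responses into a state along the following lines. This is a minimal sketch based only on the sample output above; the function name and the returned labels are hypothetical, and a real response may contain other lines.

```python
def classify_heartbeat(status_code: int, body: str) -> str:
    """Map a heartbeat response to 'alive', 'maintenance' or 'down'."""
    stripped = body.strip()
    first_line = stripped.splitlines()[0] if stripped else ""
    # Maintenance mode still answers with HTTP 200, so check it first.
    if status_code == 200 and "MAINTENANCE" in first_line:
        return "maintenance"
    if status_code == 200 and "ALIVE" in first_line:
        return "alive"
    # Anything else (503, unexpected body, empty body) is treated as down.
    return "down"
```

Treating every unrecognised response as down is a deliberately conservative choice for this sketch.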

Application health

Named croncheck.php for compatibility with older versions of this plugin, this page executes all status check API checks, and shows any that return non-ok results.

It is a Nagios-compliant checker to see if cron or any individual tasks are failing, with configurable thresholds.

This script can be either run from the web:

http://moodle.example.com/admin/tool/heartbeat/croncheck.php

Or can be run as a CLI in which case it will return in the format expected by Nagios:

sudo -u www-data php /var/www/moodle/admin/tool/heartbeat/croncheck.php
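
If you want to call the CLI from your own tooling rather than from Nagios itself, a wrapper might look like the sketch below. It assumes the script follows the standard Nagios plugin convention of exit codes 0 to 3 with a status message on the first line of output; the function name is hypothetical.

```python
import subprocess

# Standard Nagios plugin exit codes.
NAGIOS_STATES = {0: "OK", 1: "WARNING", 2: "CRITICAL", 3: "UNKNOWN"}


def run_nagios_check(command: list[str], timeout: float = 30.0) -> tuple[str, str]:
    """Run a check command and return (state, first line of its output)."""
    try:
        result = subprocess.run(
            command, capture_output=True, text=True, timeout=timeout
        )
    except (OSError, subprocess.TimeoutExpired) as exc:
        # The check could not be started or did not finish in time.
        return "UNKNOWN", f"check did not complete: {exc}"
    state = NAGIOS_STATES.get(result.returncode, "UNKNOWN")
    lines = result.stdout.strip().splitlines()
    return state, lines[0] if lines else ""
```

It could then be called as `run_nagios_check(["php", "/var/www/moodle/admin/tool/heartbeat/croncheck.php"])`, run as the web server user.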

Failed login detection

The loginchecker script is a Nagios-compliant checker that monitors the number of failed login attempts on a Moodle site as a security intrusion detection mechanism, with configurable thresholds.

This script can be either run from the web:

http://moodle.example.com/admin/tool/heartbeat/loginchecker.php

Or can be run as a CLI in which case it will return in the format expected by Nagios:

sudo -u www-data php /var/www/moodle/admin/tool/heartbeat/loginchecker.php

The various thresholds can be configured with query params or CLI args. For details, see:

php loginchecker.php -h

Installation

Best to always use the latest version from this git repo:

https://github.com/catalyst/moodle-tool_heartbeat

Or via the Moodle plugin directory (which may be out of date):

https://moodle.org/plugins/view/tool_heartbeat

Configuration

http://moodle.local/admin/settings.php?section=tool_heartbeat

  • Set a fake warning state of 'error' or 'warn'
  • By default in a new install this is set to 'error'. This is done intentionally so that you know your monitoring is wired up correctly end to end. You should see your monitoring raise an alert which tells you that it is a test and links to the admin setting to turn it into normal monitoring mode.
  • Optionally lock down the endpoints by IP

Testing

When you first set up this plugin and have wired it end to end with Nagios / Icinga or another monitoring tool, you want the peace of mind of knowing that it is all working correctly. There is a setting which allows you to send a fake warning so you can confirm your pager will go off. This setting is set to 'error' by default, by design.

http://moodle.local/admin/settings.php?section=tool_heartbeat

moodle-tool_heartbeat's People

Contributors

azrek, bradpasley, brendanheywood, bwalkerl, daledavies, golenkovm, hdagheda, jaypha, jwalits, keevan, matthewhilton, mattrice, ned300889, nhoobin, olive007, patkira, pauldamiani, peterburnett, rhell4, roperto, sarahjcotton, sebastianberm, tuanngocnguyen

moodle-tool_heartbeat's Issues

Use of undefined constant CLI_SCRIPT

PHP Notice: Use of undefined constant CLI_SCRIPT - assumed 'CLI_SCRIPT' in tool/heartbeat/index.php on line 47

if (!defined(CLI_SCRIPT)) should probably be if (!defined('CLI_SCRIPT'))

Version 2019031900

Execution fails if the script croncheck.php is not launched from the folder

When executing the script croncheck.php from any folder other than the one where the plugin is saved, the execution fails:

PHP Warning:  require(../../../config.php): failed to open stream: No such file or directory in /opt/moodle/admin/tool/heartbeat/croncheck.php on line 42
PHP Fatal error:  require(): Failed opening required '../../../config.php' (include_path='.:/usr/share/pear:/usr/share/php') in /opt/moodle/admin/tool/heartbeat/croncheck.php on line 42

File permissions causing unit test to fail

The following unit test was failing in Totara due to file permissions:

  1. totara_core_totara_testcase::test_file_bitmask
    admin/tool/heartbeat/index.php is not correctly bitmasked, it is using 755
    admin/tool/heartbeat/iplock.php is not correctly bitmasked, it is using 755
    admin/tool/heartbeat/croncheck.php is not correctly bitmasked, it is using 755

/var/www/clients/catalyst/totara/totara/core/tests/totara_test.php:63
/var/www/clients/catalyst/totara/lib/phpunit/classes/base_testcase.php:600
/var/www/clients/catalyst/totara/lib/phpunit/classes/advanced_testcase.php:68

To re-run:
vendor/bin/phpunit totara_core_totara_testcase totara/core/tests/totara_test.php

Fix:
chmod 644 admin/tool/heartbeat/index.php admin/tool/heartbeat/iplock.php admin/tool/heartbeat/croncheck.php
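
The same fix can be applied to a whole plugin directory with a short script. This is a sketch only: the function name is hypothetical, and it assumes that any PHP file carrying an execute bit should be reset to 0644, as in the fix above.

```python
import os
import stat
from pathlib import Path


def fix_executable_bits(plugin_dir: str, suffix: str = ".php") -> list[str]:
    """Reset files that carry any execute bit to 0644 and return their paths."""
    changed = []
    for path in sorted(Path(plugin_dir).rglob(f"*{suffix}")):
        if not path.is_file():
            continue
        mode = stat.S_IMODE(path.stat().st_mode)
        if mode & 0o111:  # any of the user/group/other execute bits
            os.chmod(path, 0o644)
            changed.append(str(path))
    return changed
```

Calling `fix_executable_bits("admin/tool/heartbeat")` from the Moodle root would return the list of files it changed.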

Configurable check around when specific tasks were run

Certain clients want something like a mini SLA around how long a specific task takes to run, and at what point in time it runs:

  • for a given task, we want to alert if it has taken longer than X to run
  • for a given task, we want to alert if it has not started within X minutes of when it was supposed to (eg contention with other tasks)
  • we want to alert if a task hasn't run at all in X
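
The three rules can be sketched as follows. This is not how the plugin implements them; the data structure, field names and alert labels are all hypothetical, and times are plain epoch seconds.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TaskStatus:
    due_at: float                  # when the task was scheduled to run
    started_at: Optional[float]    # start of the run for that slot, if any
    finished_at: Optional[float]   # end of that run, if it has finished
    last_run_at: Optional[float]   # end of the most recent completed run


def task_alerts(task: TaskStatus, now: float, max_runtime: float,
                max_delay: float, max_age: float) -> list[str]:
    """Return the names of the rules that the task currently violates."""
    alerts = []
    # Rule 1: the run took (or is taking) longer than max_runtime.
    if task.started_at is not None:
        end = task.finished_at if task.finished_at is not None else now
        if end - task.started_at > max_runtime:
            alerts.append("runtime")
    # Rule 2: the run started (or still has not started) too long after it was due.
    start = task.started_at if task.started_at is not None else now
    if start - task.due_at > max_delay:
        alerts.append("late")
    # Rule 3: the task has not completed a run within max_age.
    if task.last_run_at is None or now - task.last_run_at > max_age:
        alerts.append("stale")
    return alerts
```

Keeping each rule independent means a single task can raise more than one alert, which makes it easier to see whether the cause is a slow task or contention with other tasks.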

Session handler checking for memcached uses memcache functions.

When checking the status of the session handler, it is hard coded to use '\core\session\memcached'.

$sessionhandler = (property_exists($CFG, 'session_handler_class') && $CFG->session_handler_class == '\core\session\memcached');

During the actual check, it uses a function 'memcache_connect()' which belongs to the php module 'memcache'.

memcache_connect($memcache[0], $memcache[1], 3);

If the server is not using the memcache extension a fatal error will be thrown.

Additionally, Moodle's error handling changed with PHP 7 (https://docs.moodle.org/dev/Moodle_and_PHP7#Exception_and_Throwable), so when the previous code hits this problem the page returns an HTTP 500 status code.

Warning and notices

/report/status/index.php

Notice: Undefined offset: 1 in /var/www/site/admin/tool/heartbeat/classes/check/tasklatencycheck.php on line 57

Notice: Undefined offset: 2 in /var/www/site/admin/tool/heartbeat/classes/check/tasklatencycheck.php on line 57

Notice: Undefined offset: 3 in /var/www/sitetool/heartbeat/classes/check/tasklatencycheck.php on line 57
Invalid get_string() identifier: 'taskconfigbad' or component 'tool_heartbeat'. Perhaps you are missing $string['taskconfigbad'] = ''; in /var/www/site/admin/tool/heartbeat/lang/en/tool_heartbeat.php?

    line 353 of /lib/classes/string_manager_standard.php: call to debugging()
    line 7413 of /lib/moodlelib.php: call to core_string_manager_standard->get_string()
    line 69 of /admin/tool/heartbeat/classes/check/tasklatencycheck.php: call to get_string()
    line 111 of /lib/classes/check/table.php: call to tool_heartbeat\check\tasklatencycheck->get_result()
    line 44 of /report/status/index.php: call to core\check\table->render()

Invalid get_string() identifier: 'checktasklatencycheck' or component 'tool_heartbeat'. Perhaps you are missing $string['checktasklatencycheck'] = ''; in /var/www/site/admin/tool/heartbeat/lang/en/tool_heartbeat.php?

    line 353 of /lib/classes/string_manager_standard.php: call to debugging()
    line 7413 of /lib/moodlelib.php: call to core_string_manager_standard->get_string()
    line 98 of /lib/classes/check/check.php: call to get_string()
    line 119 of /lib/classes/check/table.php: call to core\check\check->get_name()
    line 44 of /report/status/index.php: call to core\check\table->render()

syntax error, unexpected end of file, expecting variable

Hi, was testing this out and got this:

3.11 php74

Parse error: syntax error, unexpected end of file, expecting variable (T_VARIABLE) or ${ (T_DOLLAR_OPEN_CURLY_BRACES) or {$ (T_CURLY_OPEN) in .. /admin/tool/heartbeat/errors/compile.php on line 39
Errors parsing .. /admin/tool/heartbeat/errors/compile.php

Make a little standalone progress bar page test

We often have issues with buffers somewhere: php, gzip, apache, varnish and nginx can all conspire to hold onto content that should be flushed with an explicit flush. So make a super simple page which makes a progress bar, and behind it is just a count to 5 with a sleep every second. This will at least demonstrate the issue in its simplest case. We could also go the extra step and try to diagnose which layer it is, by either exposing some of the config it can see, or maybe some other dark magic.

<script type="text/javascript">
//<![CDATA[
updateProgressBar("pbar_588aa21705eac", 97.13, "", "2.41 secs");

//]]>
</script>
<script type="text/javascript">
//<![CDATA[
updateProgressBar("pbar_588aa21705eac", 97.15, "", "2.39 secs");

//]]>
</script>

Moodle calls flush(), which only flushes its main buffer.

Places to check:

ob_implicit_flush(true);

http://httpd.apache.org/docs/current/mod/mod_deflate.html#deflatebuffersize

Make heartbeat periodically ping the error log

We want the heartbeat to make a small amount of noise in the error logs to help test that the access, cron and error logs are always working. Something like once every 30 mins.

Because this is a cron, we probably need it to curl another endpoint which does the error_log.

Doesn't work for Totara 2.6

!!! Plugin "tool_heartbeat" (2016111001) could not be installed. It requires a newer version of Moodle (currently you are using 2013111811, you need 2014050800). !!!
11:28:31.670 !!
11:28:31.671 Error code: pluginrequirementsnotmet !!
11:28:31.671 !! Stack trace: * line 461 of /lib/upgradelib.php: upgrade_requires_exception thrown
11:28:31.671 * line 1608 of /lib/upgradelib.php: call to upgrade_plugins()
11:28:31.671 * line 492 of /lib/installlib.php: call to upgrade_noncore()
11:28:31.671 * line 409 of /lib/phpunit/classes/util.php: call to install_cli_database()
11:28:31.671 * line 153 of /admin/tool/phpunit/cli/util.php: call to phpunit_util::install_site()
11:28:31.671 !!
11:28:31.672

Allow ignoring the "forcelogin" config option

We are currently using Moodle for a project with a requirement that everyone must log in before they can access the site. This causes our heartbeat to fail, because the plugin uses the "require_login()" method, which attempts to redirect to the login page, and the AWS health checks mark a redirect as a failure.

It would be excellent if we could have a configuration option to turn off the require_login check, to support more use cases.

303 Error

I'm using this plugin as a health check for my load balancer. However, my load balancer is only getting 303 errors and getting redirected to the URL for the Moodle site I'm using. How can I stop that from happening?

"IP Blocking Configuration" not working as expected

The setting "IP Blocking Configuration" says that it is a list of "safe IP addresses for the heartbeat to respond to"... however, this setting is only used in iplock.php, and that file is not included nor the function validate_ip_against_config() used anywhere else.

How does the "IP Blocking Configuration" setting restrict which IPs can reach endpoints like index.php, adminloginchecker.php, compresscheck.php, buffercheck.php, notbuffered.php, progress.php, sessionone.php, sessiontwo.php, and upload.php?

It seems that all these endpoints can be called by anyone, even non-logged-in users.

Make a test check page which we can use to confirm compression and buffering is on

This is a sibling check to the progress check

https://github.com/catalyst/moodle-tool_heartbeat/blob/master/progress.php

The problem this is designed to check is where people may have misconfigured their environment to make the buffering problem go away, but in doing so have turned it off across the board.

This script should be extremely simple, it should just bootstrap moodle as normal, and then echo out something of a fixed set size. It should also throw in some small sleep commands. Then the monitoring tool which calls it should validate that:

  1. the size of data coming over the wire is smaller than the fixed known size (ie gzip is working)
  2. the TTFB is slow

Both the size of the expected data, and the TTFB min, should be able to be passed in as query params, but both should have good defaults
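
On the monitoring side, the two validations could be sketched like this. The function name, parameter names and the sample payload are hypothetical, and the gzip call only stands in for what a compressing server would send; a real check would take the byte count and TTFB from the HTTP client.

```python
import gzip


def compresscheck_ok(wire_bytes: int, expected_bytes: int,
                     ttfb_seconds: float, min_ttfb_seconds: float) -> bool:
    """Pass only if the body was compressed and the first byte was held back."""
    compressed = wire_bytes < expected_bytes       # gzip is working
    buffered = ttfb_seconds >= min_ttfb_seconds    # output was not flushed early
    return compressed and buffered


# Illustration with a local payload instead of a real HTTP response.
payload = b"heartbeat " * 1000          # fixed, known size: 10000 bytes
on_the_wire = gzip.compress(payload)    # what a gzip-enabled server would send
```

With these values, `compresscheck_ok(len(on_the_wire), len(payload), 2.5, 2.0)` passes, while an uncompressed body or a fast first byte fails.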

Call to undefined function memcache_connect()

PHP Fatal error:  Call to undefined function memcache_connect() in .../admin/tool/heartbeat/index.php on line 81

Strange, as my test server uses a working memcached cache store. Digging into /cache/stores/memcached/lib.php, it looks like none of the cache_memcached code uses memcache_connect() though.

Make an aggregated curl + buffer + compression endpoint check

This will be a new url which can be pinged to test the whole stack is working.

It will:

  • expose a new url /admin/tool/heartbeat/buffercheck.php
  • it will try to curl back to itself to load the progress.php page
    • if it cannot curl at all it will fail (use the raw php curl, not the moodle curl api). Also test this against a self-signed cert and make sure it still works, ie ignore ssl cert issues: this is not what we are testing here and I don't want false negatives due to it
    • if the TTFB is longer than a second it will fail
    • if the whole page load is shorter than a few seconds then it will fail
  • it will then curl the compresscheck.php and test the opposite, ie
    • that the TTFB is long
    • test that the bytes over the wire is smaller than the data bytes
  • this is a dirty hack: we want to make sure that all envs are actually using the buffering when they need to. So have a list of critical files (/backup/backup.php, /backup/restore.php) and check that each of them contains NO_OUTPUT_BUFFERING in the first 40 lines somewhere. See this commit moodle/moodle@ef8ceb2fc25. We don't need to check them all, because they will be backported as a batch or not
  • test that X-Accel-Buffering is in lib/setuplib.php:1397
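
Two of the checks in this list can be sketched as plain functions. This is an illustration rather than the proposed PHP endpoint: the function names are hypothetical, and the 3 second default for "a few seconds" is an assumption.

```python
def progress_timing_ok(ttfb: float, total: float,
                       max_ttfb: float = 1.0, min_total: float = 3.0) -> bool:
    """progress.php should start streaming quickly but take a while to finish."""
    return ttfb <= max_ttfb and total >= min_total


def declares_no_output_buffering(path: str, max_lines: int = 40) -> bool:
    """Return True if NO_OUTPUT_BUFFERING appears in the first max_lines lines."""
    with open(path, encoding="utf-8", errors="replace") as handle:
        for number, line in enumerate(handle, start=1):
            if number > max_lines:
                break
            if "NO_OUTPUT_BUFFERING" in line:
                return True
    return False
```

The file scan is a plain substring match, so it would also accept the constant appearing in a comment; that matches the "dirty hack" spirit of the proposal but is worth knowing.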

Test https and directory slash redirects

All of this ONLY applies on https

Do some low level curls and assert:

  • /my on http redirects to /my on https, OR /my/ on https
  • /my/ on http redirects to /my/ on https OR to the /login on https, but NOT /login on http
  • /dontexist on http redirects to /dontexist on https
  • /pluginfile/xxxxx/.js should not redirect to end in a slash

If the site is not https (why???) then assert these instead:

  • /my -> /my/
  • /dontexist -> 404
  • /pluginfile/xxxxx/.js should not redirect to end in a slash
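
The assertions above could be expressed as two small predicates over a request URL and the Location header it produced. This is a sketch under assumptions: the function names are hypothetical, the Location value is assumed to be an absolute URL, and fetching the header without following the redirect is left to the caller.

```python
from urllib.parse import urlsplit


def https_redirect_ok(request_url: str, location: str,
                      allowed_paths: set[str]) -> bool:
    """Check that a redirect target is https, same host, and an allowed path."""
    source, target = urlsplit(request_url), urlsplit(location)
    return (target.scheme == "https"
            and target.hostname == source.hostname
            and target.path in allowed_paths)


def adds_trailing_slash(request_url: str, location: str) -> bool:
    """True if the redirect only appended a slash to the requested path."""
    return urlsplit(location).path == urlsplit(request_url).path + "/"
```

For example, a redirect from http://moodle.example.com/my/ is acceptable when `https_redirect_ok(url, location, {"/my/", "/login"})` holds, and a /pluginfile/ request fails the test if `adds_trailing_slash` returns True.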

Test support for 4.1 -> Minor fixes

Fixed a few errors plus moodle-ci codechecker and phpdoc warnings.

Tested against:
• docker-dev local-integration (Ubuntu 22.04.1 LTS)
• mysql 8.0.31
• postgres (14.0 (Debian 14.0-1.pgdg110+1))
• php 8.0.25
