catalyst / moodle-tool_lockstats

Moodle cron / task API lock statistics admin tool

Home Page: https://moodle.org/plugins/tool_lockstats

PHP 99.90% CSS 0.10%
moodle moodle-plugin locks report

moodle-tool_lockstats's Introduction

tool_lockstats

A lock statistics admin tool, specifically tailored to report on cron task timings.

This tool exposes which tasks are currently running and where, and also shows a detailed history of how long each task has taken in the past.

Warning

Since Moodle 3.10, core keeps an internal record of which tasks are running, along with their host, PID and much the same metadata that this plugin records. The reporting in core is not quite the same, but for simple visibility of which tasks are currently running this plugin is mostly redundant. See also:

https://tracker.moodle.org/browse/MDL-67211

As the reports in core improve, this plugin will become fully redundant, so it is best to consider it deprecated, although there is no clear timeline for when support will end.

Branches

Moodle version        Branch
Moodle 2.7 to 3.8     master
Moodle 3.9            MOODLE_39_STABLE
Moodle 3.10 - 3.11    MOODLE_310_STABLE
Totara 9 - 12         master
Totara 13             MOODLE_310_STABLE

How it works

The plugin implements a proxy lock factory which adds instrumentation around the real lock factory. It logs details about each cron task when a lock is obtained and released. The following data is recorded:

  • Task name
  • Duration
  • Hostname
  • Time gained
  • Time released
  • PID
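The proxy idea can be sketched roughly as follows. This is a simplified illustration, not the plugin's actual code: the interface `simple_lock_factory`, the classes `memory_lock_factory` and `logging_proxy_factory`, and the layout of the log entries are all hypothetical, and the real plugin wraps Moodle's own lock factory API rather than this stand-in.

```php
<?php
// Hypothetical, simplified stand-in for a lock factory (not Moodle's real interface).
interface simple_lock_factory {
    public function get_lock(string $key): bool;
    public function release_lock(string $key): bool;
}

// Trivial in-memory "real" factory, used only for this illustration.
class memory_lock_factory implements simple_lock_factory {
    private $held = [];

    public function get_lock(string $key): bool {
        if (isset($this->held[$key])) {
            return false; // Already locked.
        }
        $this->held[$key] = true;
        return true;
    }

    public function release_lock(string $key): bool {
        if (!isset($this->held[$key])) {
            return false;
        }
        unset($this->held[$key]);
        return true;
    }
}

// Proxy that delegates to the real factory and records metadata around each call.
class logging_proxy_factory implements simple_lock_factory {
    private $real;
    private $open = [];
    public $log = [];

    public function __construct(simple_lock_factory $real) {
        $this->real = $real;
    }

    public function get_lock(string $key): bool {
        $ok = $this->real->get_lock($key);
        if ($ok) {
            // Remember who gained the lock, where and when.
            $this->open[$key] = [
                'task' => $key,
                'host' => gethostname(),
                'pid' => getmypid(),
                'gained' => time(),
            ];
        }
        return $ok;
    }

    public function release_lock(string $key): bool {
        $ok = $this->real->release_lock($key);
        if ($ok && isset($this->open[$key])) {
            // Close the entry and work out how long the lock was held.
            $entry = $this->open[$key];
            $entry['released'] = time();
            $entry['duration'] = $entry['released'] - $entry['gained'];
            $this->log[] = $entry;
            unset($this->open[$key]);
        }
        return $ok;
    }
}
```

Because the proxy implements the same interface as the factory it wraps, calling code does not need to know the instrumentation is there.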

Most cron tasks are quick and finish in seconds. These are typically not the tasks whose history you are interested in, so this plugin compresses the history of quick tasks. You still get overall stats for all tasks and detailed stats for slower, bigger tasks, without bloating the database with too much data. Old stats can also be removed after a set time period.

Installation

Install the plugin the same way as any standard Moodle plugin, either via the Moodle plugin directory:

https://moodle.org/plugins/tool_lockstats

https://docs.moodle.org/en/Installing_plugins

Or you can use git to clone it into your source:

git clone git@github.com:catalyst/moodle-tool_lockstats.git admin/tool/lockstats

Configuration

This is an example of using the Postgres lock factory. Add this to your config.php:

$CFG->lock_factory = "\\tool_lockstats\\proxy_lock_factory";
$CFG->proxied_lock_factory = "auto";

// If you want to be explicit you can do this instead:
$CFG->proxied_lock_factory = "\\core\\lock\\postgres_lock_factory";

// To allow unit tests to pass.
$CFG->phpunit_lock_factory = "\\tool_lockstats\\proxy_lock_factory";
$CFG->phpunit_proxied_lock_factory = "\\core\\lock\\postgres_lock_factory";

Using the UI you can configure additional settings at:

Site administration > Plugins > Admin tools > Lock statistics

The values you can configure are:

  • Blacklist (Default: core_cron)

This allows you to prevent logging the history for specific tasks.

  • History threshold (Default: 60)

If the task exceeds this value in seconds then a new history entry will be logged.

  • Cleanup history (Default: 30)

A task exists that will clean up history entries that exceed this value in days.

  • Debug (Default: No)

Provides additional debugging messages in the cron.log for when the locks are obtained and released.

Usage

You can view the currently locked tasks, lock history and details via the UI at:

Site administration > Server > Lock statistics

The list of current locks is also exposed via a cli script:

$ php admin/tool/lockstats/cli/list_locks.php 
    PID HOST       TYPE    TIME     KEY                  NAME                                    
  10806 zebrafish  adhoc   00:00:06 adhoc_65943          \tool_testtasks\task\timed_adhoc_task   
  10810 zebrafish  adhoc   00:00:05 adhoc_65945          \tool_testtasks\task\timed_adhoc_task   
  10808 zebrafish  adhoc   00:00:05 adhoc_65944          \tool_testtasks\task\timed_adhoc_task   

Found 3 lock(s)

And you can watch this for a dynamic list of processes:

watch -n 1 php admin/tool/lockstats/cli/list_locks.php

moodle-tool_lockstats's People

Contributors

azrek, brendanheywood, dkleto, dmitriim, kristian-94, mattwhelan-catalyst, mudrd8mz, nhoobin, pauldamiani, peterburnett, scottverbeek, tessa-fabry, tsmilan

moodle-tool_lockstats's Issues

Too many redirects when change a page for Slowest tasks this week

Using the latest version of the plugin in Moodle 3.5
Webserver Nginx behind a load balancer.
When I click through to the next page of the Slowest tasks this week report, I get redirected to /admin/tool/lockstats?page=1/////////////////// and it dies with ERR_TOO_MANY_REDIRECTS.

Have timeline view which shows the last couple hours

Should show effectively the same kind of timeline as in #11

but it should show all the tasks that have been running. The main purpose of this is to easily see if there are latency gaps due to too many heavy tasks which would benefit from additional cron processes.

Make the various columns larger

ERROR: value too long for type character varying(255)
STATEMENT: INSERT INTO mdl_tool_lockstats_locks (resourcekey,gained,host,pid,classname,customdata,latency) VALUES($1,$2,$3,$4,$5,$6,$7) RETURNING id

This is on an older version but the issue remains. These fields have been renamed but are still 255.

Please change all 5 columns from 255 to 1024

https://github.com/catalyst/moodle-tool_lockstats/blob/master/db/install.xml#L10-L12

https://github.com/catalyst/moodle-tool_lockstats/blob/master/db/install.xml#L31-L32

More details about Ad-hoc tasks

At the moment some ad-hoc tasks can be found in the Slowest tasks report, but there is no easy way to find out any details about the task.

Would be good to see at least a task class.

Tasks:

  • in current running tasks show the class and lang string for each ad hoc task

Warning when function result as a reference

Notice: Only variables should be passed by reference in /var/www/site/admin/tool/lockstats/classes/table/adhoc_tasks.php on line 128

This line throws the warning: $link = ucwords(str_replace("_", " ", end(explode("\\", $class->classname))));
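The notice appears because `end()` takes its argument by reference, so it needs a real variable rather than the direct result of `explode()`. One possible fix is to assign the exploded parts to a variable first. The helper name `short_task_label` below is hypothetical and only illustrates the pattern.

```php
<?php
// Hypothetical helper: turn a namespaced class name into a readable label.
function short_task_label(string $classname): string {
    // end() takes its argument by reference, so give it a variable.
    $parts = explode('\\', $classname);
    $last = end($parts);
    return ucwords(str_replace('_', ' ', $last));
}

echo short_task_label('\\tool_testtasks\\task\\timed_adhoc_task'), "\n"; // Timed Adhoc Task
```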

History compression option

Ideally we want to keep history for ages so we can track long term trends and performance stuff. We currently have an option to delete history after a set time, but a better option would be to compress history after some time.

For instance if the threshold is 300 seconds, and we have a task which is constantly taking 400 seconds and runs every minute then history will be jam packed with those records. But long term all we really want is to see how the average run time trends at peak semesters and also changes with optimizations.

So what I want is a second threshold, which might default to a month; for anything older than that, all records for a single day get compressed into a single record with an updated duration and count.
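A rough sketch of what such day-level compression could look like, working on plain arrays rather than the real history table. The function name `compress_history_by_day` and the record layout (`task`, `gained`, `duration`, `lockcount`) are assumptions made for illustration, and days are taken as UTC calendar days.

```php
<?php
// Hypothetical record layout:
// ['task' => string, 'gained' => int (unix time), 'duration' => int, 'lockcount' => int].
function compress_history_by_day(array $records, int $cutoff): array {
    $compressed = [];
    $recent = [];
    foreach ($records as $record) {
        if ($record['gained'] >= $cutoff) {
            $recent[] = $record; // Newer than the cutoff: keep full detail.
            continue;
        }
        // Group older records by task and UTC calendar day.
        $day = intdiv($record['gained'], 86400) * 86400;
        $key = $record['task'] . '|' . $day;
        if (!isset($compressed[$key])) {
            $compressed[$key] = [
                'task' => $record['task'],
                'gained' => $day,
                'duration' => 0,
                'lockcount' => 0,
            ];
        }
        // Keep totals, so an average can still be derived later.
        $compressed[$key]['duration'] += $record['duration'];
        $compressed[$key]['lockcount'] += $record['lockcount'];
    }
    return array_merge(array_values($compressed), $recent);
}
```

Storing the summed duration together with a count means the average run time per day stays recoverable as duration divided by lockcount.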

Improve the list of front end matrix display

Two minor tweaks:

  • when displaying the name trim it down to the useful bits. ie compare moodle-fe02.cqu.edu.au to the wwwroot moodle.cqu.edu.au, then find the common suffix and remove it so that we are left with just moodle-fe02. This is only display logic and doesn't change the stored data
  • store the list of front ends as a config string. Make it editable via the config page, but also edit it on the fly every time you view the report page to make sure that all front ends are in the list. Make sure they are sorted alphabetically.
  • In an autoscale environment possibly the list of cron boxes will churn a lot and have random names so the matrix could end up bloated with many columns that are defunct. Not sure best way to handle and happy to leave it until we need to fix it, with both of the items above done we will at least have a workaround to reset the list via the config page.
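The display-name trimming in the first item could be sketched like this. The function name `trim_common_suffix` is hypothetical, and the sketch compares whole dot-separated labels rather than raw characters, which is an assumption about what "common suffix" should mean here.

```php
<?php
// Hypothetical helper: strip the domain suffix a host shares with the wwwroot host.
function trim_common_suffix(string $host, string $wwwhost): string {
    $hostparts = explode('.', $host);
    $wwwparts = explode('.', $wwwhost);
    // Drop matching labels from the right, always keeping at least one label of the host.
    while (count($hostparts) > 1 && count($wwwparts) > 0
            && end($hostparts) === end($wwwparts)) {
        array_pop($hostparts);
        array_pop($wwwparts);
    }
    return implode('.', $hostparts);
}

echo trim_common_suffix('moodle-fe02.cqu.edu.au', 'moodle.cqu.edu.au'), "\n"; // moodle-fe02
```

Since this only affects display, the stored host value can stay untouched, as the issue asks.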

No need to use $CFG->admin in moodle_url

I noticed you have

$url = new moodle_url("/$CFG->admin/tool/lockstats");

Please note there is no need to use $CFG->admin here as the moodle_url class has always had inbuilt support for it.

Potential bug: Comparisons of text column conditions are not allowed when deleting a lock record

https://github.com/catalyst/moodle-tool_lockstats/blob/master/db/upgrade.php#L186 We have changed this DB field 'customdata' to text.
However, during a log_unlock function, this can cause an error when trying to delete records since text isn't allowed in the where clause:


15:31:38 [error] 212#212: *401 FastCGI sent in stderr: "PHP message: Default exception handler: Comparisons of text column conditions are not allowed. Please use sql_compare_text() in your query. 

Debug: #012Error code: textconditionsnotallowed#012* 
line 691 of /lib/dml/moodle_database.php: dml_exception thrown#012* 
line 1937 of /lib/dml/moodle_database.php: call to moodle_database->where_clause()#012* 
line 336 of /admin/tool/lockstats/classes/proxy_lock_factory.php: call to moodle_database->delete_records()#012* 
line 191 of /admin/tool/lockstats/classes/proxy_lock_factory.php: call to tool_lockstats\proxy_lock_factory->log_unlock()#012* 
line 102 of /lib/classes/lock/lock.php: call to tool_lockstats\proxy_lock_factory->release_lock()#012* 
line 189 of /theme/styles.php: call to core\lock\lock->release()" 

This area of the plugin needs to be rewritten to handle deletion when the field is a text type, or we need to change the DB field to a larger varchar type instead.

https://github.com/catalyst/moodle-tool_lockstats/blob/master/classes/proxy_lock_factory.php#L336

Warning for duplicate records of same class

Moodle function get_records_sql throws warning when duplicated records are returned

PHP message: PHP Notice: Did you remember to make the first column something unique in your call to get_records? Duplicate value '\tool_testtasks\task\timed_adhoc_task' found in column 'classname'.

  • line 885 of /lib/dml/pgsql_native_moodle_database.php: call to debugging()
  • line 136 of /admin/tool/lockstats/classes/table/adhoc_tasks.php: call to pgsql_native_moodle_database->get_records_sql()
  • line 41 of /admin/tool/lockstats/index.php: call to tool_lockstats\table\adhoc_tasks->__construct()
in /var/www/moodle/lib/weblib.php on line 3234

Auto detect fast running locks

The vast majority of locks are gained and then released fairly quickly. These are not particularly interesting but we still want to know about them. But we also don't want a massive db table full of noise.

So I'm thinking about some sort of detection where, when the lock is released and its duration has been less than 1 minute, instead of creating a new lock record we just collect some stats and update an existing record.

So let's say the forum task runs every minute and most of the time does nothing, let's say from midnight until now at 9am. Then some forum stuff happens at 9:01 and it grinds for 20 minutes.

So for every cron tick overnight it will update the last updated lock stat record with the most recent timestamp, and also update the 'totalduration' and a new field 'lockcount'. If the lock has been granted and then released 60 * 9 times, and every time it had an average duration of 2 seconds before release, then the totalduration will be 1080, the lockcount will be 540. Then the meaty lock happens and we will end up with a new record which has a duration of 1200 (60 * 20) and a lockcount of 1.

Now when we come to show a report, we can still give meaningful stats about every task that has run (ie average run time), and easily highlight the particular runs which were lumpy.
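The forum example above can be replayed with a small sketch. The function `record_release`, the `quick` flag and the row layout are hypothetical, and the rows are assumed to belong to a single task; the field names `totalduration` and `lockcount` are taken from the description.

```php
<?php
// Hypothetical rows for ONE task:
// ['quick' => bool, 'totalduration' => int, 'lockcount' => int, 'lastupdated' => int].
function record_release(array $rows, int $duration, int $now, int $threshold = 60): array {
    $last = count($rows) - 1;
    if ($duration < $threshold && $last >= 0 && $rows[$last]['quick']) {
        // Fold this quick run into the most recent quick row.
        $rows[$last]['totalduration'] += $duration;
        $rows[$last]['lockcount']++;
        $rows[$last]['lastupdated'] = $now;
        return $rows;
    }
    // Slow runs, and the first quick run, get a row of their own.
    $rows[] = [
        'quick' => $duration < $threshold,
        'totalduration' => $duration,
        'lockcount' => 1,
        'lastupdated' => $now,
    ];
    return $rows;
}

// 540 quick runs of 2 seconds each, then one 20 minute run.
$rows = [];
for ($i = 0; $i < 540; $i++) {
    $rows = record_release($rows, 2, $i * 60);
}
$rows = record_release($rows, 1200, 540 * 60);
echo count($rows), "\n";              // 2
echo $rows[0]['totalduration'], "\n"; // 1080
echo $rows[0]['lockcount'], "\n";     // 540
```

The average run time for the quick row is then totalduration divided by lockcount, here 2 seconds.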

historyreset.php requires tool_crawler

Clicking "Reset the lock statistics history" dies with an error because of the line

require_once($CFG->dirroot .'/admin/tool/crawler/lib.php');

I assume it is a forgotten relic from another plugin as I can't see it being used.

Make sql easier to debug with some literal params

UPDATE mdl_tool_lockstats_locks SET task = $1,gained = $2,released = $3,host = $4,pid = $5 WHERE id=$6

Very occasionally we want to see what exactly is locked. I am assuming the lock api call is inside a transaction, so the update is locked until the transaction finishes. This is where we should probably use a second connection (#35). But just getting visibility would help, so I'm thinking just the 'task' param gets removed and turned into a string literal in the sql statement. We may need to do this in 2-3 places.

Class 'table_sql' not found

When visiting admin/tool/lockstats/ I get

Fatal error: Class 'table_sql' not found in .../admin/tool/lockstats/classes/table/history.php on line 45

That class is not covered by the auto-loading mechanism. I can confirm that adding

require_once($CFG->libdir.'/tablelib.php');

to the history.php file fixes the issue. I assume that details.php may need the same fix.

Missing index.

SELECT * FROM mdl_tool_lockstats_locks WHERE task = $1 is showing up in our query logs as running without an index.

create index tmpindex on mdl_tool_lockstats_locks(task); is enough, but this should be added to the schema and upgrade scripts.

Possible refresh bug

Pretty sure this is a red herring refresh bug. Need to think about the cleanest way to get this cleaned up after a refresh. It would be nice if each plugin could be more involved in the data wash process and have a 'reset' or something

Running out of memory - proxy_lock_factory.php on line 171

Hey Chaps! :)

We are running out of memory on a site with around 250,000 users.

Is there a way to resolve this without increasing the memory?

Execute scheduled task: Check Completion (mod_tincanlaunch\task\check_completion)
... started 10:58:59. Current memory use 26.1MB.
Checking module id 1.

Fatal error: Allowed memory size of 1073741824 bytes exhausted (tried to allocate 20480 bytes) in 
.../lib/dml/pgsql_native_moodle_database.php on line 797

Fatal error: Allowed memory size of 1073741824 bytes exhausted (tried to allocate 20480 bytes) in 
.../admin/tool/lockstats/classes/proxy_lock_factory.php on line 171

Line 171 is part of commit 09c49f6

public function release_lock(lock $proxylock) {
    $task = $proxylock->get_key();

    $this->openlocks[$proxylock->get_key()];

    $lock = array_pop($this->openlocks[$proxylock->get_key()]);

Cheers, Russ

Warn in reports if the proxy lock is not configured correctly

As a part of testing, I added

diff --git a/admin/tool/messageinbound/classes/task/pickup_task.php b/admin/tool/messageinbound/classes/task/pickup_task.php
index 1342b6ff9a..e7abf6eba0 100644
--- a/admin/tool/messageinbound/classes/task/pickup_task.php
+++ b/admin/tool/messageinbound/classes/task/pickup_task.php
@@ -48,6 +48,7 @@ class pickup_task extends \core\task\scheduled_task {
      * Execute the main Inbound Message pickup task.
      */
     public function execute() {
+        sleep(350);
         $manager = new \tool_messageinbound\manager();
         return $manager->pickup_messages();
     }

When I executed cron.php, I can see it is sleeping there as expected

Execute scheduled task: Incoming email pickup (tool_messageinbound\task\pickup_task)
... started 12:32:33. Current memory use 38.5MB.

yet nothing is displayed on the page.

Optionally use a second db connection when persisting inside an existing transaction

When updating the lock stats table inside a transaction which is then rolled back, we end up with the stats being rolled back too. In almost all cases we are not inside a transaction, but we should detect when we are and open a second db connection.

Tasks in order:

  • make a central db helper with a pointer to $DB
  • rewrite all the sql to use this (except the upgrade script)
  • for any db call involving the _locks table (not history), call a new helper method like 'check_transaction'. If we are inside a transaction then clone the existing $DB config, open a new connection and update our pointer
  • investigate and test any interaction with the read slave plugin

Table of longest running tasks

The recent tasks table is great, but it's also repetitive, which can mask other long running tasks. The duplicated tasks all link to the same page with the complete history, so it's a little redundant. What we really want is to know which tasks take the longest to run, in order. Also, showing 134 pages isn't particularly useful, and it obscures really long running tasks because they don't run very frequently: a task which is just over the threshold and runs often swamps the table.

I'm in two minds about whether this should be a new table, or whether one table can serve both the longest running and the most recent running tasks. Leaning towards merging into one table.

eg showing all tasks by the average run time:

select task,count(duration) count, floor(sum(duration) / count(duration)) avg from mdl_tool_lockstats_history group by task  order by avg desc;

I'd be more inclined to filter this to just the history of the last week so that as improvements are made you get a faster change to the new average runtime. We could still filter it to any tasks whose average runtime is > the configured threshold (eg 300 seconds)
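The same aggregation with the one-week window and the threshold filter applied could be sketched in plain PHP as below. The function name `slowest_tasks` and the row layout (`task`, `gained`, `duration`) are assumptions for illustration; in the plugin this would more likely be done in SQL.

```php
<?php
// Hypothetical row layout: ['task' => string, 'gained' => int, 'duration' => int].
function slowest_tasks(array $history, int $since, int $threshold): array {
    $totals = [];
    foreach ($history as $row) {
        if ($row['gained'] < $since) {
            continue; // Older than the reporting window.
        }
        $task = $row['task'];
        if (!isset($totals[$task])) {
            $totals[$task] = ['count' => 0, 'sum' => 0];
        }
        $totals[$task]['count']++;
        $totals[$task]['sum'] += $row['duration'];
    }
    $report = [];
    foreach ($totals as $task => $t) {
        // Same as floor(sum / count) for non-negative durations.
        $avg = intdiv($t['sum'], $t['count']);
        if ($avg > $threshold) {
            $report[$task] = ['count' => $t['count'], 'avg' => $avg];
        }
    }
    // Slowest average first.
    uasort($report, function (array $a, array $b): int {
        return $b['avg'] <=> $a['avg'];
    });
    return $report;
}
```

Calling it with `$since` set to one week before the current time and `$threshold` set to the configured history threshold gives one row per task instead of one row per run.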

Show new summary table of all ad hoc tasks

Put this table above the scheduled tasks table

These will all be aggregate stats, grouped by each ad hoc class type

Columns:

  • Name / class
  • Component
  • Queued up
  • Processed
  • Latency

Display performance of lockstats table

This is called 100+ times inside a loop:

$history = $this->task_has_history($key);

Each call queries the history table. We should do this once, group by task and statically cache it.
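One way to get there is to load the set of tasks that have history once and answer every later lookup from memory. The class `task_history_cache` and its loader callback are hypothetical; the loader stands in for a single grouped query against the history table.

```php
<?php
// Hypothetical cache: the loader stands in for ONE query returning distinct task names.
class task_history_cache {
    private $loader;
    private $index = null;

    public function __construct(callable $loader) {
        $this->loader = $loader;
    }

    public function task_has_history(string $task): bool {
        if ($this->index === null) {
            // First call only: fetch all task names and index them for O(1) lookups.
            $this->index = array_flip(($this->loader)());
        }
        return isset($this->index[$task]);
    }
}

// Usage: the loader runs once, however many tasks are checked in the loop.
$calls = 0;
$cache = new task_history_cache(function () use (&$calls) {
    $calls++;
    return ['forum_task', 'quiz_task'];
});
foreach (['forum_task', 'quiz_task', 'missing_task'] as $key) {
    $cache->task_has_history($key);
}
echo $calls, "\n"; // 1
```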

Clean up potentially frozen locks from the gui

Sometimes a lock will get frozen, typically when a process has been manually killed, so the db lock has been released but the lockstats code never ran to close the metadata for it.

Add a manual gui button on the detail page, which enables you to clean up a lock using a specific resource key. The way this will work is that given a resource key it will try to first grab that lock with no timeout. If it can, then we instantly release the lock and the existing code should finish up all the metadata. If we cannot grab the lock then that means that the lock is still actually in use somewhere so we shouldn't touch it.
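The proposed logic is small enough to sketch. The function `cleanup_frozen_lock` and the two callbacks are hypothetical: `$trylock` is assumed to attempt the lock with no timeout and return whether it succeeded, and `$release` is assumed to trigger the normal release path that finishes up the metadata.

```php
<?php
// Hypothetical sketch; neither callback is a real Moodle function.
// $trylock: tries to take the lock without waiting, returns true on success.
// $release: releases the lock, which lets the usual bookkeeping close the metadata.
function cleanup_frozen_lock(string $resourcekey, callable $trylock, callable $release): bool {
    if (!$trylock($resourcekey)) {
        // The lock is genuinely held somewhere, so do not touch it.
        return false;
    }
    // We obtained the lock, so the previous holder is gone; release immediately.
    $release($resourcekey);
    return true;
}
```

The return value tells the UI whether the stale entry was cleaned up or whether the lock is still in use.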

Optimise get_recent_history SQL

On big instances we have noticed that SQL from get_recent_history method takes a long time to run + puts some load on DB CPU.

We need to optimise this SQL

Record and display ad hoc task latency

We want a nice clean metric which is the delay between when an ad hoc task was supposed to run, vs the time it actually started / completed.

  • make a new 'latency' column in the _locks table
  • when an ad hoc task lock is gained populate the latency field with the duration between when it was queued vs when the lock was gained

I think gained latency is the more useful number here rather than released latency. We are less interested specifically in the relative performance of each type of task and more interested in the performance of the overall system.

  • in the lock detail page add a new column showing the latency

history:

  • in the lock history, add a new column 'latency'

  • when compressing history update that column. Like duration, this will be the total latency and the average will be determined by dividing by the lockcount on the fly

  • Show a single average ad hoc task latency for the last day and for the last week on the main index page

Enhancement: Email notification

It would be great if this plugin could be configured to send us an email when a task's run time goes over a definable threshold.

  • gerald@rru

Matrix of cron boxes vs pid vs tasks

Probably the best way of visualising what is running right now, is a table with a row for every task, like the normal scheduled task table, and then a column for every front end. The list of front ends can be autodetected as tasks get run the first time, or as front ends scale up.

In the matrix we'd then highlight each cell to show which fe a task is running on, along with the pid and how long it has been running.

We'd probably want a task which runs every day or so and detects which fe records are no longer being used and reap them.

Task Fe1 Fe2 Fe3
foo 1 hour
forum 2 min
blah
quiz 1 min
blah 10 secs
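Building such a matrix from a flat list of current locks might look like the sketch below. The function `build_lock_matrix` and the lock row layout (`task`, `host`, `pid`, `running`) are assumptions for illustration; front ends are detected from the locks themselves and sorted alphabetically.

```php
<?php
// Hypothetical lock rows: ['task' => string, 'host' => string, 'pid' => int, 'running' => int seconds].
function build_lock_matrix(array $tasks, array $locks): array {
    // Columns: every front end seen in the current locks, sorted alphabetically.
    $hosts = array_values(array_unique(array_column($locks, 'host')));
    sort($hosts);

    // Rows: one per known task, with an empty cell for every front end.
    $matrix = [];
    foreach ($tasks as $task) {
        $matrix[$task] = array_fill_keys($hosts, null);
    }
    foreach ($locks as $lock) {
        if (!isset($matrix[$lock['task']])) {
            // A lock for a task missing from the list still gets a full row.
            $matrix[$lock['task']] = array_fill_keys($hosts, null);
        }
        $matrix[$lock['task']][$lock['host']] = [
            'pid' => $lock['pid'],
            'running' => $lock['running'],
        ];
    }
    return $matrix;
}
```

Cells left as null are tasks that are not currently running on that front end.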

Remove dependency on posix_getpid() and detect if installed

I am having trouble getting this set up on a site. It works well on a test / staging clone and only fails on production, so I assume it is something local. Still, maybe you can provide some hint / suggestion on what might be going wrong.

Once I put following lines to config.php:

$CFG->lock_factory = '\tool_lockstats\proxy_lock_factory';
$CFG->proxied_lock_factory = '\core\lock\file_lock_factory';

then executing cron.php fails with an error:

[15343 | 14/07/2017 02:58:09] Server Time: Fri, 14 Jul 2017 02:58:09 +0800
[15343 | 14/07/2017 02:58:09]
[15343 | 14/07/2017 02:58:09]
[15343 | 14/07/2017 02:58:09] tool_lockstats [lock released]: core_cron
[15343 | 14/07/2017 02:58:09] Default exception handler: Coding error detected, it must be fixed by a programmer: A lock was created but not released at:
[15343 | 14/07/2017 02:58:09] [dirroot]/lib/classes/task/manager.php on line 468
[15343 | 14/07/2017 02:58:09]
[15343 | 14/07/2017 02:58:09] Code should look like:
[15343 | 14/07/2017 02:58:09]
[15343 | 14/07/2017 02:58:09] $factory = \core\lock\lock_config::get_lock_factory('type');
[15343 | 14/07/2017 02:58:09] $lock = $factory->get_lock(core_cron);
[15343 | 14/07/2017 02:58:09] $lock->release();  // Locks must ALWAYS be released like this.
[15343 | 14/07/2017 02:58:09]
[15343 | 14/07/2017 02:58:09] Debug:
[15343 | 14/07/2017 02:58:09] Error code: codingerror
[15343 | 14/07/2017 02:58:09] * line 117 of /lib/classes/lock/lock.php: coding_exception thrown
[15343 | 14/07/2017 02:58:09] * line 148 of /admin/tool/lockstats/classes/proxy_lock_factory.php: call to core\lock\lock->__destruct()
[15343 | 14/07/2017 02:58:09] * line 468 of /lib/classes/task/manager.php: call to tool_lockstats\proxy_lock_factory->get_lock()
[15343 | 14/07/2017 02:58:09] * line 66 of /lib/cronlib.php: call to core\task\manager::get_next_scheduled_task()
[15343 | 14/07/2017 02:58:09] * line 61 of /admin/cli/cron.php: call to cron_run()
[15343 | 14/07/2017 02:58:09]
[15343 | 14/07/2017 02:58:09] !!! Coding error detected, it must be fixed by a programmer: A lock was created but not released at:
[15343 | 14/07/2017 02:58:09] [dirroot]/lib/classes/task/manager.php on line 468
[15343 | 14/07/2017 02:58:09]
[15343 | 14/07/2017 02:58:09] Code should look like:
[15343 | 14/07/2017 02:58:09]
[15343 | 14/07/2017 02:58:09] $factory = \core\lock\lock_config::get_lock_factory('type');
[15343 | 14/07/2017 02:58:09] $lock = $factory->get_lock(core_cron);
[15343 | 14/07/2017 02:58:09] $lock->release();  // Locks must ALWAYS be released like this.
[15343 | 14/07/2017 02:58:09]
[15343 | 14/07/2017 02:58:09] !!!

Same error happens when proxying the DB row locking factory (this is a MySQL site).
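As the issue title suggests, one option is to check whether `posix_getpid()` is available before calling it and fall back to PHP's built-in `getmypid()` otherwise. This is only a sketch of that detection; the helper name `current_pid` is hypothetical, and the report above does not confirm that a missing POSIX extension was the cause of this particular failure.

```php
<?php
// Hypothetical helper: use the POSIX extension when it is loaded, else fall back.
function current_pid(): int {
    if (function_exists('posix_getpid')) {
        return posix_getpid();
    }
    // getmypid() is built in, but can return false on error.
    $pid = getmypid();
    return $pid === false ? 0 : $pid;
}
```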
