bucardo / check_postgres
Nagios check_postgres plugin for checking status of PostgreSQL databases
Home Page: http://bucardo.org/wiki/Check_postgres
License: Other
Bucardo - a table-based replication system

DESCRIPTION:
------------
This is version 5.6.0 of Bucardo.

COPYRIGHT:
----------
Copyright (c) 2005-2023 Greg Sabino Mullane <[email protected]>

REQUIREMENTS:
-------------
build, test, and install Perl 5 (at least 5.8.3)
build, test, and install PostgreSQL (at least 8.2)
build, test, and install the DBI module (at least 1.51)
build, test, and install the DBD::Pg module (at least 2.0.0)
build, test, and install the DBIx::Safe module (at least 1.2.4)

You must have at least one database that has PL/pgSQL and PL/Perl installed.
Target databases may need PL/pgSQL.

INSTALLATION:
-------------
To install this module type the following:

   perl Makefile.PL
   make
   make test (but see below first)
   make install

EXAMPLES:
---------
See the test suite in the t/ subdirectory for some examples.

WEBSITE:
--------
Please visit https://bucardo.org for complete documentation.

DEVELOPMENT:
------------
To follow or participate in the development of Bucardo, use:

   git clone [email protected]:bucardo/bucardo.git

GETTING HELP:
-------------
For general questions and troubleshooting, please use the
[email protected] mailing list. GitHub issues which are
support-oriented will be closed and referred to the mailing list anyway,
so help save time for everyone by posting there directly.
Post, subscribe, and see previous archives here:
https://bucardo.org/mailman/listinfo/bucardo-general
ID: 49
Version: unspecified
Date: 2010-09-06 10:25 EDT
Author: Martin von Oertzen ([email protected])
check_postgres_txn_idle 2.15.0 (from today) results in
status UNKNOWN if there are no idle transactions at all.
The --same_schema action picks up whitespace differences. I can't paste the output here legibly, so I'm adding an image:
Note the newline between "u.first_name" and "u.last_name" in the second database.
It might be useful to note that the first database is v. 8.4.17 and the second database is v. 9.2.7
ID: 91
Version: unspecified
Date: 2011-11-23 09:07 EST
Author: Peter Eisentraut ([email protected])
Database test1:
create schema foo;
create table foo.bar1(a int);
create table foo.bar2(a int);
Database test2:
create schema foo;
create table foo.bar1(a int);
Now I would like to exclude schema "foo" from comparison:
$ check_postgres_same_schema -db test1,test2 --filter='noschema=foo'
POSTGRES_SAME_SCHEMA CRITICAL: (databases:test1,test2) Databases were
different. Items not matched: 1 | time=1.85s
DB 1: port=5432 host=<none> dbname=test1 user=postgres
DB 1: PG version: 8.4.9
DB 1: Total objects: 31
DB 2: port=5432 host=<none> dbname=test2 user=postgres
DB 2: PG version: 8.4.9
DB 2: Total objects: 29
Table "foo.bar2" does not exist on all databases:
Exists on: 1
Missing on: 2
Or just the table:
$ check_postgres_same_schema -db test1,test2 --filter='notable=bar'
POSTGRES_SAME_SCHEMA CRITICAL: (databases:test1,test2) Databases were
different. Items not matched: 1 | time=1.88s
DB 1: port=5432 host=<none> dbname=test1 user=postgres
DB 1: PG version: 8.4.9
DB 1: Total objects: 31
DB 2: port=5432 host=<none> dbname=test2 user=postgres
DB 2: PG version: 8.4.9
DB 2: Total objects: 29
Table "foo.bar2" does not exist on all databases:
Exists on: 1
Missing on: 2
This "radical" solution works:
$ check_postgres_same_schema -db test1,test2 --filter='notables'
POSTGRES_SAME_SCHEMA OK: (databases:test1,test2) All databases have identical
items | time=1.70s
But this doesn't:
$ check_postgres_same_schema -db test1,test2 --filter='noschemas'
POSTGRES_SAME_SCHEMA CRITICAL: (databases:test1,test2) Databases were
different. Items not matched: 1 | time=1.75s
DB 1: port=5432 host=<none> dbname=test1 user=postgres
DB 1: PG version: 8.4.9
DB 1: Total objects: 27
DB 2: port=5432 host=<none> dbname=test2 user=postgres
DB 2: PG version: 8.4.9
DB 2: Total objects: 25
Table "foo.bar2" does not exist on all databases:
Exists on: 1
Missing on: 2
It's somewhat unclear what the "schema" filtering option does anyway. In older
releases I was able to compare just the public schema by using
--exclude='^(?!public)'. I was hoping that noschema=regex would provide that,
but then 'noschemas' by itself would make little sense. This should also be
clarified.
ID: 74
Version: unspecified
Date: 2011-04-19 08:23 EDT
Author: Peter Eisentraut ([email protected])
The --exclude option when used with the same_schema check does not apply when
comparing schemas, and a few other things such as user and languages, because
$opt{exclude} isn't looked at there, even though the documentation makes no
such distinction. You can exclude these things separately, using noschema=foo
etc., but it would be convenient, say, to exclude an entire schema and contents
using --exclude='^foo'.
ID: 82
Version: unspecified
Date: 2011-08-01 12:25 EDT
Author: [email protected]
PostgreSQL version 8.3.14
Neither check_postgres_table_size nor check_postgres_relation_size takes
TOASTed data into account.
SELECT relname, reltoastrelid, relpages FROM pg_class WHERE
relname='my_bigtable';
relname | reltoastrelid | relpages
------------------+---------------+----------
my_bigtable | 51687163 | 8351228
(1 row)
SELECT relname, reltoastrelid, relpages FROM pg_class WHERE oid = 51687163;
relname | reltoastrelid | relpages
-------------------+---------------+----------
pg_toast_51687160 | 0 | 33528180
There are a lot more pages in TOAST'd space.
SELECT pg_size_pretty(pg_total_relation_size('my_bigtable'));
pg_size_pretty
----------------
333 GB
SELECT pg_size_pretty(pg_relation_size('my_bigtable'));
pg_size_pretty
----------------
64 GB
(1 row)
==========================
check_postgres_relation_size --warning='4 GB' --critical='4.5 GB'
--include=my_bigtable --dbname=my_db
Password for user postgres:
POSTGRES_RELATION_SIZE CRITICAL: DB "my_db" largest relation is table
"public.my_bigtable": 64 GB | time=8.27 public.my_bigtable=69034262528
When custom_query returns multiple rows, their performance data string is duplicated in subsequent rows. For example:
./check_postgres.pl --action=custom_query --dbname=postgres --dbuser=peisentraut --warning=100 --query="select 30 as result, 'foo' as data union select 20, 'bar' union select 10, 'baz' order by 1 desc"
POSTGRES_CUSTOM_QUERY OK: DB "postgres" 30 * 20 * 10 | time=0.08s data=foo;100 time=0.08s data=foo;100 data=bar;100 time=0.08s data=foo;100 data=bar;100; data=baz;100
Correct would be
POSTGRES_CUSTOM_QUERY OK: DB "postgres" 30 * 20 * 10 | time=0.08s data=foo;100 time=0.08s data=bar;100 time=0.08s data=baz;100
The fix appears to be to change line 4100
$db->{perf} .= sprintf ' %s=%s;%s;%s',
to
$db->{perf} = sprintf ' %s=%s;%s;%s',
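The effect of that one-character change can be sketched outside Perl; this Python toy (not the real check_postgres code, names mine) reproduces the append-versus-assign difference:

```python
def perf_strings_buggy(rows):
    # mirrors the '.=' bug: every row appends to one shared buffer
    perf = ''
    out = []
    for name, value in rows:
        perf += f' {name}={value}'
        out.append(perf.strip())
    return out

def perf_strings_fixed(rows):
    # mirrors the '=' fix: each row gets a fresh string
    return [f'{name}={value}' for name, value in rows]

rows = [('foo', 100), ('bar', 100), ('baz', 100)]
print(perf_strings_buggy(rows))  # ['foo=100', 'foo=100 bar=100', 'foo=100 bar=100 baz=100']
print(perf_strings_fixed(rows))  # ['foo=100', 'bar=100', 'baz=100']
```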
Our production environment uses both EnterpriseDB and PostgreSQL. check_postgres.pl doesn't work very well with EnterpriseDB, and here is a first-cut fix to make things work. My Perl is not that great, but this is the patch we have been using in production.
diff --git a/check_postgres.pl b/check_postgres.pl
index 32bd338..f9e55e0 100755
--- a/check_postgres.pl
+++ b/check_postgres.pl
@@ -1024,9 +1024,9 @@ if (! defined $PSQL or ! length $PSQL) {
}
-x $PSQL or ndie msg('opt-psql-noexec', $PSQL);
$res = qx{$PSQL --version};
-$res =~ /^psql (PostgreSQL) (\d+.\d+)(\S_)/ or ndie msg('opt-psql-nover');
-our $psql_version = $1;
-our $psql_revision = $2;
+$res =~ /((?:^edb-)?psql) ((PostgreSQL)|EnterpriseDB) (\d+.\d+)(\S_)/ or ndie msg('opt-psql-nover');
+our $psql_version = $3;
+our $psql_revision = $4;
$psql_revision =~ s/\D//g;
$VERBOSE >= 2 and warn qq{psql=$PSQL version=$psql_version\n};
@@ -1940,10 +1940,10 @@ sub run_command {
if ($db->{error}) {
ndie $db->{error};
}
-if ($db->{slurp} !~ /PostgreSQL (\d+.\d+)/) {
+if ($db->{slurp} !~ /(PostgreSQL|EnterpriseDB) (\d+.\d+)/) {
ndie msg('die-badversion', $db->{slurp});
}
-$db->{version} = $1;
+$db->{version} = $2;
$db->{ok} = 0;
delete $arg->{versiononly};
## Remove this from the returned hash
@@ -3040,7 +3040,7 @@ sub check_connection {
$db = $info->{db}[0];
-my $ver = ($db->{slurp}[0]{v} =~ /PostgreSQL (\d+.\d+\S+)/o) ? $1 : '';
+my $ver = ($db->{slurp}[0]{v} =~ /(PostgreSQL|EnterpriseDB) (\d+.\d+\S+)/o) ? $2 : '';
$MRTG and do_mrtg({one => $ver ? 1 : 0});
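The intent of the patched regex can be illustrated with a small test. The pattern below is a cleaned-up transliteration of the one in the diff (the pasted diff lost some escaping), and the banner strings are invented for illustration; real server banners may differ:

```python
import re

# Cleaned-up version of the version-banner pattern from the patch above;
# the banner strings are illustrative, not captured from real servers.
pat = re.compile(r'((?:edb-)?psql) \((PostgreSQL|EnterpriseDB)\) (\d+\.\d+)')

for banner in ('psql (PostgreSQL) 9.3.5', 'edb-psql (EnterpriseDB) 9.2.1'):
    m = pat.search(banner)
    print(m.group(2), m.group(3))
```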
ID: 81
Version: unspecified
Date: 2011-07-28 10:39 EDT
Author: Peter Eisentraut ([email protected])
When using the --PSQL option, the path check is too restrictive, for example:
./check_postgres.pl --PSQL=/usr/lib/postgresql/8.4/bin/psql --action=connection
--db=test
ERROR: Invalid psql argument: must be full path to a file named psql
The code is fairly simple-minded about this:
$PSQL =~ m{^/[\w\d\/]*psql$}
I would just simplify this to something like
$PSQL =~ m{^/.*/psql$}
(or remove it altogether). Consider typical paths on Windows (bug 36).
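The difference between the two patterns is easy to demonstrate (Python used here just to exercise the regexes):

```python
import re

strict = re.compile(r'^/[\w\d/]*psql$')  # the current check, transliterated
relaxed = re.compile(r'^/.*/psql$')      # the suggested replacement

path = '/usr/lib/postgresql/8.4/bin/psql'
print(bool(strict.match(path)))   # False: the '.' in "8.4" is outside [\w\d/]
print(bool(relaxed.match(path)))  # True
```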
If I have different constraints for two tables but they have the same name, the --same_schema action will mix them up between the two databases (especially if one database is v8.4.x and the other is v9.2.x):
EXAMPLE 1:
Constraint "public.min_password_length_check":
"conkey" is different:
Database 1: {2}
Database 2: {6}
"consrc" is different:
Database 1: (length((join_password)::text) >= 4)
Database 2: (length((enrollment_password)::text) >= 4)
"tname" is different:
Database 1: table_1
Database 2: table_2
-- table_1 definition:
Table "public.table_1"
Column | Type | Modifiers
---------------------+-----------------------+------------------------
id | integer | not null
join_password | character varying(12) | not null
lock_prefs | boolean | not null default false
lock_dates | boolean | not null default false
lock_info | boolean | not null default false
allow_sec_assign | boolean | not null default true
lock_s_view_reports | boolean | not null default false
Indexes:
"table_1_pkey" PRIMARY KEY, btree (id)
Check constraints:
"min_password_length_check" CHECK (length(join_password::text) >= 4)
Foreign-key constraints:
-- table_2 definition:
Table "public.table_2"
Column | Type | Modifiers
-------------------------+--------------------------+--------------------------------------------------------------
id | integer | not null default nextval(('table_2_id_seq'::text)::regclass)
class_type | smallint | not null
title | character varying(100) | not null
class_number | character varying(50) |
description | character varying(1000) |
enrollment_password | character varying(12) | not null
state_flag | smallint | not null default 10
date_lastmodified | timestamp with time zone | not null default now()
date_setup | timestamp with time zone | not null default ('now'::text)::date
date_start | timestamp with time zone | not null
date_end | timestamp with time zone | not null
term_length | interval | not null default '5 years'::interval
remoteaddr | inet | not null
class_homepage_name | character varying(50) |
class_homepage_url | character varying(200) |
max_file_size | integer | not null default 20971520
max_paper_length | integer | not null default 1000000
grading_scale_slot | smallint | not null default 0
scale_owner | integer |
products_enabled | integer | not null default 1535
s_view_reports | boolean | not null default false
s_submit_topics | boolean | not null default true
account | integer | not null
user | integer |
drop_lowest_grade | boolean | not null default false
source | smallint | not null default 0
s_view_user_email | boolean | not null default true
max_portfolio_file_size | integer |
native_locked | boolean | not null default false
Indexes:
"table_2_pkey" PRIMARY KEY, btree (id)
"table_2_account_idx" btree (account)
"table_2_user_idx" btree (user)
Check constraints:
"min_password_length_check" CHECK (length(enrollment_password::text) >= 4)
Foreign-key constraints:
EXAMPLE 2:
Constraint "public.$1":
"confdeltype" is different:
Database 1: a
Database 2:
"conffeqop" is different:
Database 1: {96}
Database 2:
"confkey" is different:
Database 1: {1}
Database 2:
"confmatchtype" is different:
Database 1: u
Database 2:
"confupdtype" is different:
Database 1: a
Database 2:
"conkey" is different:
Database 1: {2}
Database 2: {2,3}
"conpfeqop" is different:
Database 1: {96}
Database 2:
"conppeqop" is different:
Database 1: {96}
Database 2:
"consrc" is different:
Database 1:
Database 2: (start_date <= end_date)
"contype" is different:
Database 1: f
Database 2: c
"tname" is different:
Database 1: table_3
Database 2: table_4
--table_3 definition:
Table "public.table_3"
Column | Type | Modifiers
--------------------+--------------------------+--------------------------------------------------------
id | integer | not null default nextval('table_3_id_seq'::regclass)
source | integer | not null
reader | integer | not null
grading_group | integer |
grade | smallint |
score | smallint |
read_comment | text |
read_type | integer | not null
date_submitted | timestamp with time zone |
duration | interval | default '00:00:00'::interval
delete_flag | boolean | not null default false
outlying | boolean | not null default false
needs_arbiter | boolean | not null default false
summary | text |
last_saved | timestamp with time zone | default now()
date_created | timestamp with time zone | default now()
pm_review_set | integer | not null default (-1)
last_gm_version | character varying(10) | not null default 'abc2'::character varying
user_view_first | timestamp with time zone |
user_view_last | timestamp with time zone |
user_view_count | integer |
updated_via_ios | boolean | default false
Indexes:
Foreign-key constraints:
"$1" FOREIGN KEY (source) REFERENCES table_x(id)
"$2" FOREIGN KEY (reader) REFERENCES table_y(id)
"$4" FOREIGN KEY (read_type) REFERENCES table_3_type(id)
"table_3_other_table_fkey" FOREIGN KEY (other_table) REFERENCES table_z(id)
--table_4 definition:
Table "public.table_4"
Column | Type | Modifiers
---------------+-----------------------------+-------------------------------------------------------------
id | integer | not null default nextval('table_4_id_seq'::regclass)
start_date | timestamp without time zone | not null default ('now'::text)::date
end_date | timestamp without time zone | not null
priority | smallint | not null
account_types | smallint | not null
platform | smallint | not null
content | text | not null
max_views | smallint | not null default 1
type | integer | not null default 1
header | text |
link_url | text |
Indexes:
Check constraints:
"$1" CHECK (start_date <= end_date)
Foreign-key constraints:
"$2" FOREIGN KEY (priority) REFERENCES table_4_priority(id)
"$3" FOREIGN KEY (account_types) REFERENCES table_4_group(id)
"$4" FOREIGN KEY (platform) REFERENCES table_4_platform(id)
"type_fkey" FOREIGN KEY (type) REFERENCES table_4_type(id) ON DELETE CASCADE
Referenced by:
ID: 75
Version: unspecified
Date: 2011-04-19 08:27 EDT
Author: Peter Eisentraut ([email protected])
Although the documentation is technically correct on this, it would be really
helpful to make it crystal clear that in the same_schema action the --exclude
option works on completely different logic than in most other actions, which
follow the scheme explained in the section "BASIC FILTERING".
I would change this paragraph
"You may exclude all objects of a certain name by using the "exclude" option.
It takes a Perl regular expression as its argument."
to
"You may exclude all objects of a certain name by using the "exclude" option.
It takes a Perl regular expression as its argument. The option can be repeated
to specify multiple patterns to exclude. (Note that the --exclude option for
this action does not follow the logic explained in the "BASIC FILTERING"
section.)"
The alternative would be to eliminate this distinction, but that might break
too many things for users.
ID: 96
Version: Unspecified
Date: 2012-01-03 06:44 EST
Author: Peter Eisentraut ([email protected])
The documentation for the pgbouncer_backends check says:
"Note that the user you are connecting as must be a superuser for this to work
properly."
PgBouncer doesn't really have the concept of a superuser. It has admin_users
and stats_users. I think the permission required for this check is
stats_users. Please correct that in the docs.
We're running check_postgres.pl like this:
check_postgres.pl -u nagios --action=last_vacuum --exclude=~^pg --db=csi
The output is:
No matching tables found due to exclusion/inclusion options
But that can't be right. The query that ends up running is (reformatted):
SELECT current_database(), nspname, relname,
CASE WHEN v IS NULL THEN -1 ELSE round(extract(epoch FROM now()-v)) END,
CASE WHEN v IS NULL THEN '?' ELSE TO_CHAR(v, 'HH24:MI FMMonth DD, YYYY') END
FROM (
SELECT nspname, relname, GREATEST(
pg_stat_get_last_vacuum_time(c.oid),
pg_stat_get_last_autovacuum_time(c.oid)
) AS v
FROM pg_class c, pg_namespace n
WHERE relkind = 'r'
AND n.oid = c.relnamespace
AND n.nspname <> 'information_schema'
ORDER BY 3
) AS foo;
When I run that manually, I get rows like:
csi | pg_catalog | pg_authid | 390524 | 23:16 March 20, 2010
csi | public | foo | -1 | ?
csi | pg_catalog | pg_auth_members | -1 | ?
So we do have a table, "foo", that is not excluded. However, it's never been
vacuumed, so the rounded time column is -1. The error should not be that no
matching tables were found because of the exclusion: a table is found and not
excluded, but has never been vacuumed. So it should probably say something like
"no unvacuumed tables found" instead.
I think the reason it works that way is this bit of code in
check_last_vacuum_analyze():
SLURP: while ($db->{slurp} =~ /(\S+)\s+\| (\S+)\s+\| (\S+)\s+\|\s+(\-?\d+) \| (.+)\s*$/gm) {
my ($dbname,$schema,$name,$time,$ptime) = ($1,$2,$3,$4,$5);
$maxtime = -3 if $maxtime == -1;
if (skip_item($name, $schema)) {
$maxtime = -2 if $maxtime < 1;
next SLURP;
}
So looking at the three rows returned above, it looks like:
- row 1: $maxtime set to -2
- row 2: $maxtime is -1 and so gets set to -3
- row 3: $maxtime set to -2
Since the last row fetched set $maxtime to -2, this code then gets triggered:
if ($maxtime == -2) {
add_unknown msg('no-match-table');
}
But that's wrong. I think what needs to happen is that it needs to know that unexcluded rows were returned (the second row in this example) but were never vacuumed. Not sure how you'd go about that using $maxtime as a flag; maybe you need some other flag? Maybe something like this?
--- a/check_postgres.pl
+++ b/check_postgres.pl
@@ -3469,6 +3469,7 @@ sub check_last_vacuum_analyze {
my ($minrel,$maxrel) = ('?','?'); ## no critic
my $mintime = 0; ## used for MRTG only
my $count = 0;
+ my $unskipped;
SLURP: while ($db->{slurp} =~ /(\S+)\s+\| (\S+)\s+\| (\S+)\s+\|\s+(\-?\d+) \| (.+)\s*$/gm) {
my ($dbname,$schema,$name,$time,$ptime) = ($1,$2,$3,$4,$5);
$maxtime = -3 if $maxtime == -1;
@@ -3476,6 +3477,7 @@ sub check_last_vacuum_analyze {
$maxtime = -2 if $maxtime < 1;
next SLURP;
}
+ $unskipped ||= 1;
$db->{perf} .= " $dbname.$schema.$name=${time}s;$warning;$critical" if $time >= 0;
if ($time > $maxtime) {
$maxtime = $time;
@@ -3497,7 +3499,7 @@ sub check_last_vacuum_analyze {
}
if ($maxtime == -2) {
- add_unknown msg('no-match-table');
+ add_unknown msg($unskipped ? 'no-vacuumed-table' : 'no-match-table');
}
elsif ($maxtime < 0) {
add_unknown $type eq 'vacuum' ? msg('vac-nomatch-v') : msg('vac-nomatch-a');
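A hand translation of the patched loop into Python (a loose sketch, not the real check_postgres code; names and simplifications mine) shows the intended behavior on the three rows above:

```python
def vacuum_check(rows, is_excluded):
    """Sketch of the patched $maxtime/$unskipped logic above."""
    maxtime = -1
    unskipped = False
    for name, time in rows:
        if maxtime == -1:
            maxtime = -3
        if is_excluded(name):
            if maxtime < 1:
                maxtime = -2
            continue
        unskipped = True          # at least one row survived the filter
        if time > maxtime:
            maxtime = time
    if maxtime == -2:
        # the patch: distinguish "nothing matched" from "matched but never vacuumed"
        return 'no-vacuumed-table' if unskipped else 'no-match-table'
    return maxtime

# the three rows from the example: pg_* excluded, "foo" never vacuumed
rows = [('pg_authid', 390524), ('foo', -1), ('pg_auth_members', -1)]
print(vacuum_check(rows, lambda n: n.startswith('pg')))  # 'no-vacuumed-table'
```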
Thanks.
David
The full error message for query_time:
"Use of uninitialized value $ENV{"HOME"} in concatenation (.) or string at /usr/lib64/nagios/plugins/check_postgres.pl line 868."
This message occurs on Icinga's front-end screen, but the check returns the proper output on the command line, including when run as user icinga.
Here's my config:
check_postgres.pl -H
Using Perl v5.10.1, CentOS release 6.3 (Final), Icinga 1.8.4, and the latest check_postgres.pl (10454 lines (8340 sloc), 382.487 kb).
We're using check_postgres to email alerts out to interested parties on a select number of actions, one of which is query_time to help identify long running queries.
I keep getting asked if there is a way to see what the query is. It's great having the notification, but the recipients don't have access to the live servers, which means the DBAs then need to log in and pull the information out.
An optional switch (disabled by default) to output the SQL the alert is referring to would be useful, if it's at all possible.
Thanks
ID: 53
Version: unspecified
Date: 2010-11-03 12:42 EDT
Author: Peter Eisentraut ([email protected])
I run
perl Makefile.PL
make
make install
and the resulting installation is
/usr/local/share/perl/5.10.1/check_postgres.pl
/usr/local/man/man3/check_postgres.3pm
/usr/local/bin/check_postgres.pl
I think this is a bit odd. Why is check_postgres.pl installed in two
locations?
I tried tweaking this a little bit. Adding
PM => {},
to %opts prevents the installation under /usr/local/share. Then removing the
line
MAN1PODS => {},
installs the man page, but it's then called check_postgres.pl.1p, whereas the
documentation says to use man check_postgres.
I think the sort of installation layout I'd expect is approximately
/usr/local/bin/check_postgres
/usr/local/man/man1/check_postgres.1p
There is some room for variation, but the current behavior is strange.
With version 2.20.1, the docs say to do:
check_postgres.pl --action=replicate_row --host=master --host2=slave1,slave2
But when I tried that I got "ERROR: No slaves found". I did get it to work with:
check_postgres.pl --action=replicate_row --host=master,slave1,slave2
though.
It's certainly conceivable that I'm doing something wrong, but it seems like a bug. Please let me know if you need any other info from me.
Thanks.
check_postgres works well. If I receive an error or warning, I know that something is wrong.
But I can't see what is going on.
It would be great to see the pg_stat_activity output if there is something wrong.
http://www.postgresql.org/docs/9.2/static/monitoring-stats.html#PG-STAT-ACTIVITY-VIEW
Our applications set the "application_name" which makes the output even more useful, but I guess a lot of applications don't do it.
In the 'disk_space', 'txn_idle' and 'txn_time' checks (at least), check_postgres misidentifies the server version when presented with a version like:
PostgreSQL 9.4beta1_bdr0601
It looks like an issue within sub run_command, within the version-specific statement code.
I initially thought it was testing the client-side psql version, but while it does separately do that, that isn't the cause of this problem.
bloat has this issue in older releases, but works with the current release.
check_postgres tries to determine the server's version in a number of places,
in order to craft queries that cope with different catalogs on different
server versions. It does so by fetching and parsing server_version in
pg_settings. This is simply wrong - it should be using server_version_num.
server_version_num was added in 8.2, so there's no reason to bother with
backward compatibility.
commit 04912899e792094ed00766b99b6c604cadf9edf7 refs/tags/REL8_2_BETA1
Author: Bruce Momjian <[email protected]>
Date: Sat Sep 2 13:12:50 2006 +0000
Add new variable "server_version_num", which is almost the same as
"server_version" but uses the handy PG_VERSION_NUM which allows apps to
do things like if ($version >= 80200) without having to parse apart the
value of server_version themselves.
Greg Sabino Mullane [email protected]
This is related to the prior investigation I did on issue #70 .
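The commit's point can be sketched quickly: comparing the integer server_version_num sidesteps string parsing entirely (the parser below is a toy, and the values are illustrative):

```python
import re

def version_from_string(s):
    # fragile: mimics parsing the textual server_version setting
    m = re.match(r'(\d+)\.(\d+)', s)
    if not m:
        raise ValueError(f'unparseable version: {s}')
    return int(m.group(1)) * 10000 + int(m.group(2)) * 100

# '9.4beta1_bdr0601' (seen above) happens to parse here, but any parser that
# expects a clean 'X.Y' or 'X.Y.Z' string can misfire on vendor suffixes.
print(version_from_string('9.4beta1_bdr0601'))  # 90400

# robust: SELECT current_setting('server_version_num') returns a plain integer
server_version_num = 90400  # illustrative value
print(server_version_num >= 80200)  # e.g. "is this 8.2 or later?" -> True
```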
hitratio (and maybe others) joins pg_user to pg_database. If the database is owned by a role (newer versions of PG), the hit ratio is not returned:
SELECT
round(100.*sd.blks_hit/(sd.blks_read+sd.blks_hit), 2) AS dhitratio,
d.datname,
u.usename
FROM pg_stat_database sd
JOIN pg_database d ON (d.oid=sd.datid)
JOIN pg_user u ON (u.usesysid=d.datdba)
WHERE sd.blks_read+sd.blks_hit<>0;
should be:
SELECT
round(100.*sd.blks_hit/(sd.blks_read+sd.blks_hit), 2) AS dhitratio,
d.datname,
u.rolname as usename
FROM pg_stat_database sd
JOIN pg_database d ON (d.oid=sd.datid)
JOIN pg_roles u ON (u.oid=d.datdba)
WHERE sd.blks_read+sd.blks_hit<>0;
OS: SL 2.6.32-358.18.1.el6.x86_64
plugins]# ./check_postgres.pl --action=connection --db=chimera -H localhost -p 5432 -u postgres
Cannot find Time::HiRes, needed if 'showtime' is true at ./check_postgres.pl line 1267.
This was solved by installing:
yum install perl-Time-HiRes.x86_64
ID: 22
Version: unspecified
Date: 2009-12-03 12:58 EST
Author: Greg Sabino Mullane ([email protected])
Use pg_stattuple to get the bloat information, either as a new action or a flag
to the current one.
Since we switched to regularly doing a VACUUM FULL, the last_vacuum check reports errors, because pg_stat_user_tables has stale values.
If there is no other way to find out the date of the last full vacuum, that should be mentioned in the check_postgres documentation.
Hi,
I have some errors using custom_query.
With Nagios output:
Use of uninitialized value $data in numeric ge (>=) at ./check_postgres_custom_query line 3121.
Use of uninitialized value $data in numeric ge (>=) at ./check_postgres_custom_query line 3130.
Use of uninitialized value $data in string at ./check_postgres_custom_query line 3139.
With MRTG and simple output:
Use of uninitialized value $data in string at ./check_postgres_custom_query line 3139.
Action custom_query failed: Unknown error
I have been dealing with a problem in a new installation I was doing today. The problem came up when I saw in Icinga that txn_time and txn_idle were flapping, for no reason I could think of.
Checking the code, I found that the problem came here:
## Return unknown if stats_command_string / track_activities is off
if ($cq =~ /disabled/o or $cq =~ /<command string not enabled>/) {
add_unknown msg('psa-disabled');
return;
}
What would happen if the query contained the word 'disabled' somewhere, for example in a selected column of a table?
I think this is not the best way to check whether a postgres setting is set accordingly.
Removing the offending code (as I'm sure track_activities is on) made everything work as expected.
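The false-positive risk is easy to reproduce with a toy version of that check (function name mine, not the real code):

```python
def looks_disabled(current_query):
    # Toy version of the substring test quoted above: it fires on *any*
    # occurrence of 'disabled' in the query text, not just the real marker.
    return ('disabled' in current_query
            or '<command string not enabled>' in current_query)

# A perfectly normal query that merely mentions the word:
query = "SELECT * FROM features WHERE status = 'disabled'"
print(looks_disabled(query))  # True - a false positive
```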
I've used an old alternate monitoring plugin called "check_pg_waiting_queries.pl". It's poorly written, so that when it's run against PostgreSQL 9.1, it always reports "success" when in fact the query contains a syntax error, so it should fail.
I presume that check_postgres.pl provides an appropriate upgrade path, or that I don't actually need this monitor at all, but I'm not quite clear. I see that check_postgres.pl provides a check for locks, and I know that locks are closely related to "waiting queries". If the "locks" check is an appropriate replacement for a "waiting queries" check, it would be helpful if a sentence could be added to those docs to clarify it.
ID: 57
Version: 2.14.3
Date: 2010-12-01 14:19 EST
Author: [email protected]
When using negative numbers to check free backends, the checks on lines 2541
and 2556 in the check_backends procedure should test for less-than-or-equal,
not greater-than-or-equal, for warning and critical status.
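The sign issue can be sketched like this (a toy model, assuming negative thresholds mean "alert when this few connections remain"; not the actual check_backends code):

```python
def backends_status(current, max_connections, warning, critical):
    """Toy threshold logic for a backends check (names mine)."""
    if critical < 0:  # negative: thresholds count *free* backends
        free = max_connections - current
        if free <= -critical:      # must be <=, not >=
            return 'CRITICAL'
        if free <= -warning:
            return 'WARNING'
        return 'OK'
    if current >= critical:        # positive: thresholds count *used* backends
        return 'CRITICAL'
    if current >= warning:
        return 'WARNING'
    return 'OK'

print(backends_status(97, 100, -10, -5))  # CRITICAL: only 3 connections left
print(backends_status(85, 100, -10, -5))  # OK: 15 connections left
```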
ID: 21
Version: unspecified
Date: 2009-12-03 12:57 EST
Author: Greg Sabino Mullane ([email protected])
The current bloat calculation is very rough and fails to account for many
things. Make it more accurate.
ID: 84
Version: unspecified
Date: 2011-10-18 08:12 EDT
Author: tom ([email protected])
In the wiki page, you said:
It is required that one of the columns be named "result" and is the item that
will be checked against your warning and critical values. The second column is
for the performance data and any name can be used: this will be the 'value'
inside the performance data section.
That's wrong: it is required that the second column be named "data".
The second issue is that the message including the performance data is shown
in the Linux console, but the Nagios page can't get it; it can only read the
message before the pipe symbol.
If I delete the line print '| '; or change the pipe symbol to another
character, for example 'kk', in the dumpresult() method of the
check_postgres.pl source code, it shows the complete message.
console:
POSTGRES_CUSTOM_QUERY CRITICAL: DB "demodb" (host:postgresql.demo.dev) 0 |
time=2.12 Check for records added in the last 4 hours to the demo table,the
result is:0
Nagios page (missing the messages after the pipe symbol):
POSTGRES_CUSTOM_QUERY CRITICAL: DB "demodb" (host:postgresql.demo.dev) 0
[.... check_postgres]# perl Makefile.PL
Configuring check_postgres 2.21.0
Checking if your kit is complete...
Warning: the following files are missing in your kit:
MYMETA.yml
Please inform the author.
Writing Makefile for check_postgres
"Please inform the author" - which I have done by opening this issue.
Thanks.
ID: 95
Version: unspecified
Date: 2011-12-22 09:54 EST
Author: Peter Eisentraut ([email protected])
check_postgres version 2.18.0
I cannot get hot_standby_delay to work at all. Something like this ought to do
something:
$ check_postgres_hot_standby_delay --dbhost=localhost --dbhost2=localhost
--dbport=5435 --dbport=5435 --dbuser=postgres -w 30 -c 100
Use of uninitialized value $slave in numeric eq (==) at
/usr/bin/check_postgres_hot_standby_delay line 4581.
POSTGRES_HOT_STANDBY_DELAY UNKNOWN: DB "postgres" (host:localhost) (port=5435)
Invalid query returned: receive <PIPE> \n replay <PIPE> \n | time=0.09s
An actual example in the documentation would be nice, in case this is not a
real bug but just a misunderstanding.
I created a unique index concurrently.
After it was built, I dropped the old index.
Nagios reports:
[2011-02-16 16:48:05] SERVICE ALERT:
pg1;PG-bloat;
CRITICAL;HARD;10;CHECK_NRPE:
Socket timeout after 30 seconds.
[2011-02-16 16:52:45] SERVICE ALERT:
pg1;PG-bloat;UNKNOWN;HARD;10;ERROR:
ERROR: relation with OID 344939906 does not exist
Maybe it should not be an error if the OID does not exist.
Hi, I've just noticed a bug.
./check_postgres.pl --dbservice="theService_in_pg_service.conf" --dbuser=theUser --dbpass=heyhey --action=connection
Use of uninitialized value in printf at ./check_postgres.pl line 1748.
Use of uninitialized value in printf at ./check_postgres.pl line 1748.
Password for user theUser:
So it seems that the script doesn't retrieve the password given as a parameter when using --dbservice.
ID: 94
Version: unspecified
Date: 2011-12-09 19:56 EST
Author: [email protected]
Great program!
I have installed the RHEL 6 RPM version (2.18.0) from the PostgreSQL
repository (repo: pgdg91), and I am having a problem: unless I execute the
program from the pg_hba directory, it's unable to find the correct paths:
/usr/bin/check_postgres.pl -H localhost --dbuser=postgres --action disk_space
ERROR: Invalid result from command "/bin/df -kP "../../pg_logs/instance1"
2>&1": /bin/df: `../../pg_logs/instance1': No such file or directory
/bin/df: no file systems processed
However, if I print out "$i{S}{data_directory}", the full data path is valid;
in my case "/opt/postgresql/data/instance1".
I am not sure what the right way of doing this is, but my current workaround
was to apply the following:
###############################################################################
*** /usr/bin/check_postgres.pl 2010-09-12 21:58:34.953908816 -0700
--- /usr/bin/check_postgres.pl.chg 2010-09-12 22:00:24.125192752 -0700
***************
*** 4206,4211 ****
--- 4206,4212 ----
add_unknown msg('diskspace-nodata');
next;
}
+ chdir($i{S}{data_directory});
my ($datadir,$logdir) =
($i{S}{data_directory},$i{S}{log_directory}||'');
if (!exists $dir{$datadir}) {
###########################################################################
Please let me know if this is something that can be updated. Thank you!
I have an error when running the check_postgres_hot_standby_delay function. I issue the following command:
./check_postgres_hot_standby_delay --host=ipprimary --dbuser=user --dbpass=pass --host2=ipsec --dbuser2=user --dbpass2=pass --warning=50 --critical=1024
When I execute this command the response is the following.
Use of uninitialized value $slave in numeric eq (==) at ./check_postgres_hot_standby_delay line 4750.
POSTGRES_HOT_STANDBY_DELAY UNKNOWN: DB "postgres" (host:ipprimary) Invalid query returned: receive \n replay \n | time=0.23s
I hope that someone can make sense of it.
Would it be possible to add a filter on application_name so that txn_idle is not checked for certain applications (for example, long pg_dump runs)?
If so, which would be the best approach: add an APPNAMEWHERECLAUSE, or just add specific filters in check_txn_time and pass them on to check_txn_idle?
I think the second would do, as I don't find anywhere else this would be needed.
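For illustration only (APPNAMEWHERECLAUSE is the hypothetical option name from this request, not an existing flag), the extra SQL fragment such a filter might generate can be sketched like this:

```shell
# Hypothetical: turn a comma-separated exclude list into a SQL NOT IN clause
exclude='pg_dump,pg_restore'
quoted=$(printf '%s' "$exclude" | sed "s/[^,][^,]*/'&'/g")
clause="AND application_name NOT IN ($quoted)"
echo "$clause"
```

The generated clause would then be appended to the WHERE clause of the idle-transaction query.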
On a machine with multiple versions of postgres installed you get some version mismatches because check_postgres uses the version of psql not the version of the server that it connects to.
We have version 8.4 running, but we also have version 9.3 installed but not turned on.
psql on its own tells the state of affairs.
% psql
psql (9.3.5, server 8.4.22)
Type "help" for help.
...
Asking psql what version it is gets the version of psql, not of the server:
% psql --version
psql (PostgreSQL) 9.3.5
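The two version numbers can be pulled apart from the strings psql itself prints. A sketch, with the version strings hard-coded from the session above; a real check would instead run SHOW server_version over the actual connection:

```shell
# Sample strings copied from the psql session above
client_line='psql (PostgreSQL) 9.3.5'
banner='psql (9.3.5, server 8.4.22)'
client_ver=$(printf '%s\n' "$client_line" | awk '{print $3}')
server_ver=$(printf '%s\n' "$banner" | sed 's/.*server \([0-9.]*\)).*/\1/')
echo "client=$client_ver server=$server_ver"
```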
ID: 115
Version: 2.19.0
Date: 2012-10-30 03:21 EDT
Author: [email protected]
I have multiple databases with the same schema, including the same sequences. I want
to check their sequences with --action=sequence.
check_postgres.pl -H 10.1.30.5 --action=sequence -db=staging,production
--critical=95%
ERROR: ERROR: relation »public.customer_addresses_id_seq« does not exist
LINE 7: FROM public.customer_addresses_id_seq) foo
When I am doing this with single commands like ...
check_postgres.pl -H 10.1.30.5 --action=sequence -db=staging --critical=95%
check_postgres.pl -H 10.1.30.5 --action=sequence -db=production --critical=95%
... everything goes well.
Seems to be a bug.
ID: 98
Version: 2.19.0
Date: 2012-01-19 05:46 EST
Author: Martin von Oertzen ([email protected])
one part of the --action=sequence
needs over 3 minutes with postgres 9.1.2:
SELECT nspname, seq.relname, typname
FROM pg_attrdef
JOIN pg_attribute ON (attrelid, attnum) = (adrelid, adnum)
JOIN pg_type on pg_type.oid = atttypid
JOIN pg_class rel ON rel.oid = attrelid
JOIN pg_class seq ON seq.relname = regexp_replace(adsrc,
$re$^nextval\('(.+?)'::regclass\)$$re$, $$\1$$)
AND seq.relnamespace = rel.relnamespace
JOIN pg_namespace nsp ON nsp.oid = seq.relnamespace
WHERE adsrc ~ 'nextval' AND seq.relkind = 'S' AND typname IN ('int2', 'int4',
'int8')
On another computer I use Postgres 8.3.16:
$ check_postgres_sequence --db=postgres --perflimit=1
POSTGRES_SEQUENCE OK: DB "postgres" public.db_clients_id_seq=0% (calls
left=2147483539) | time=0.05s public.db_clients_id_seq=0%;85%;95%
$ check_postgres_sequence --db=mydb --perflimit=1
Can't use an undefined value as an ARRAY reference at check_postgres_sequence
line 7118.
$ check_postgres_sequence --db=postgres,mydb --perflimit=1
ERROR: ERROR: relation "public.db_clients_id_seq" does not exist
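For reference, the 0% figure in the OK line above follows from the "calls left" value; assuming an int4 sequence ceiling, the arithmetic is roughly:

```shell
maxint=2147483647      # int4 ceiling, assumed sequence maxvalue
calls_left=2147483539  # from the OK output above
pct=$(awk -v m="$maxint" -v r="$calls_left" 'BEGIN { printf "%d", (m - r) * 100 / m }')
echo "${pct}% used"
```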
When connecting explicitly by Unix domain socket (rather than over TCP), the output is a bit confusing. To do this, psql and other tools need you to specify the directory that contains the Unix socket (in my case, that's /tmp) in the "-h" option. For example, you might get something like this:
OK: DB "drupal" (host:/tmp) longest txn: 0s
Could this be altered to say something like: "OK: DB "drupal" (host:local) longest txn: 0s"? In terms of perl, it's something like this:
$hostname = 'local' if $hostname =~ m{^/};
Thanks!
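A shell sketch of the same substitution (any host value starting with a slash is a socket directory, so display it as "local"):

```shell
hostname=/tmp   # socket directory passed via -h
case $hostname in
  /*) display_host=local ;;
  *)  display_host=$hostname ;;
esac
echo "DB \"drupal\" (host:$display_host)"
```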
Hi! If I understand the script correctly, the slony_status check only checks one slave of a cluster, the one returned first from the query, and that ordering is random. E.g., here is a scenario where one slave is behind:
SELECT
ROUND(EXTRACT(epoch FROM st_lag_time)) AS lagtime,
st_origin,
st_received,
current_database() AS cd,
COALESCE(n1.no_comment, '') AS com1,
COALESCE(n2.no_comment, '') AS com2
FROM _regdnscluster.sl_status
JOIN _regdnscluster.sl_node n1 ON (n1.no_id=st_origin)
JOIN _regdnscluster.sl_node n2 ON (n2.no_id=st_received);
lagtime | st_origin | st_received | cd | com1 | com2
---------+-----------+-------------+--------+-------------+------------------
67 | 1 | 3 | regdns | Master Node | regdev-tst2 node
1792 | 1 | 2 | regdns | Master Node | regdev-tst1 node
(2 rows)
I would expect that the script reports ERROR as one of the nodes is behind, but it reports:
./check_postgres.pl --action=slony_status --schema=_regdnscluster --dbname=regdns --warning=300 --critical=600
POSTGRES_SLONY_STATUS OK: DB "regdns" schema:_regdnscluster Slony lag time: 68 (68 seconds) | time=0.08s 'regdns._regdnscluster Node 1(Master Node) -> Node 3(regdev-tst2 node)'=68;300;600
In my opinion, it should either check all slaves, or at least the slave with the highest lag. Here is a proposed fix (ORDER BY lagtime DESC):
--- check_postgres.pl.orig 2013-12-09 09:49:57.000000000 +0000
+++ check_postgres.pl 2013-12-09 09:50:40.000000000 +0000
@@ -7418,7 +7418,8 @@
COALESCE(n2.no_comment, '') AS com2
FROM SCHEMA.sl_status
JOIN SCHEMA.sl_node n1 ON (n1.no_id=st_origin)
-JOIN SCHEMA.sl_node n2 ON (n2.no_id=st_received)};
+JOIN SCHEMA.sl_node n2 ON (n2.no_id=st_received)
+ORDER BY lagtime DESC};
my $maxlagtime = -1;
regards
Klaus
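With the ORDER BY in place, the first row carries the worst lag, which is what the check should report. The effect on the two lag values from the example above, sketched in shell:

```shell
# lagtime values from the two sl_status rows above
max_lag=$(printf '67\n1792\n' | sort -rn | head -n1)
echo "worst lag: ${max_lag}s"
```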
ID: 54
Version: unspecified
Date: 2010-11-19 05:11 EST
Author: Aleksey Tsalolikhin ([email protected])
Hi. First of all, thanks for a great and most useful tool!
Secondly, we've just discovered that --same-schema check misses indexes.
We have some indexes that don't involve primary key constraints, and --same-schema check fails to report differences between tables if database A has a table with 1 index, and database B has that same table with 2 indexes. The 2nd index does not involve primary key constraints.
Thanks again for a great tool!
Yours truly,
Aleksey
Hi Guys,
A long time ago we had an issue with replicate_row not quoting table names and this was fixed (currently line 6120) with a simple
$table = qq{"$table"}
However, I've just had an issue where I want to check two tables with the same name in different schemas, and I end up getting my table name quoted as "schema.table". I couldn't see any option to pass a schema parameter, and ended up sidestepping the issue by passing the table name as schema"."tablename. Unless I've missed something, wouldn't it be better to replace the above line with something that also quotes around the full stop, like:
$table =~ s/([^.]+)/"$1"/g;
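That substitution quotes each dot-separated part separately. A quick shell equivalent using sed, with a made-up table name:

```shell
table='myschema.mytable'   # made-up schema-qualified name
quoted=$(printf '%s' "$table" | sed 's/[^.][^.]*/"&"/g')
echo "$quoted"
```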
Hi,
On a server I have a large number of databases.
I would love it if the multiline Nagios plugin output for checks like DB size showed only the databases that actually exceed the threshold.
The way it is now, it's unreadable.
The full error message is:
"Use of uninitialized value $SQL2 in concatenation (.) or string at ./check_postgres.pl line 7555."
The simple fix for it is the addition of
$SQL2 = $SQL;
in (new) line 7546.
ID: 107
Version: 2.19.0
Date: 2012-07-10 07:19 EDT
Author: Peter Eisentraut ([email protected])
Example: check that there are not fewer than 10 clients per database:
check_postgres --action=pgb_pool_cl_active -w 10 --reverse
POSTGRES_PGB_POOL_CL_ACTIVE WARNING: DB "pgbouncer" otherdb=53 | time=0.05s
Come to think of it, I don't know whether this output makes sense. With
2.16.0, it says
POSTGRES_PGB_POOL_CL_ACTIVE WARNING: pgbouncer=1 | time=0.05
which is closer to the problem. Is it really useful to check the special
"pgbouncer" database in these checks? It's easy to exclude them using
--exclude=pgbouncer, of course.
ID: 72
Version: unspecified
Date: 2011-03-29 03:34 EDT
Author: Andy Lester ([email protected])
We had an app that turned out to be very dependent on tuples being clustered in
a certain order. We had an 80M-row table with two columns, keyword and id.
The table happened to be predominantly in keyword order, physically.
Correlation from pg_stats on this table was around 0.90. Throughout the day,
searches would read thousands or tens of thousands of tuples in keyword order.
Life was good.
This weekend, we rebuilt this table, but rebuilt it in ID order. When we
rolled the table out, our performance tanked. Reading thousands of tuples in
keyword order required thousands of seeks throughout the table rows. It
crushed performance. Turns out correlation on the keyword column went down to
about 0.03. Re-clustering the table fixed our performance problem.
And, it's not just this one table. We have about 15 of these tables. As they
get updated, we want to make sure that the correlation on the keyword column in
all these tables never gets below, say, 0.90, and check_postgres seems like the
ideal tool to monitor this.
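Such a check reduces to comparing a correlation value from pg_stats against a threshold. A sketch with the numbers from this report (0.03 observed, 0.90 wanted):

```shell
corr=0.03        # pg_stats correlation after the rebuild
threshold=0.90   # minimum we want to enforce
status=$(awk -v c="$corr" -v t="$threshold" 'BEGIN { if (c < t) print "CRITICAL"; else print "OK" }')
echo "$status: correlation $corr (threshold $threshold)"
```

A real check would read the correlation from pg_stats for the relevant column rather than hard-coding it.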
The documentation claims for query_time:
The values for the --warning and --critical options are amounts of time, and default to '2 minutes' and '5 minutes' respectively.
Actually, there doesn't appear to be any default. If you leave the options off, you get
ERROR: Invalid argument for 'critical' options: must be an integer, time or integer for time
It appears that this might possibly have been broken in commit 1ceb887.
(Personally, I like having no default better. But of course the documentation or the code should be corrected either way.)
ID: 97
Version: unspecified
Date: 2012-01-06 20:59 EST
Author: Ryan Kelly ([email protected])
check_postgres --command=sequence
errors with:
ERROR: ERROR: cannot access temporary tables of other sessions
This is when run as the 'postgres' user.
uname -a
Linux prodsql 2.6.38-11-virtual #50-Ubuntu SMP Mon Sep 12 21:51:23 UTC 2011
x86_64 x86_64 x86_64 GNU/Linux
psql -c 'select version();'
PostgreSQL 9.0.5 on x86_64-pc-linux-gnu, compiled by GCC gcc-4.5.real
(Ubuntu/Linaro 4.5.2-8ubuntu4) 4.5.2, 64-bit
check_postgres --version
check_postgres version 2.18.0
When doing a --reverse check with custom_query and using both warning and critical values, the result is always OK.
This is caused by the sub validate_range(), which returns no values back to custom_query.
I fixed it with the following change (around line 2187):
if (length $warning and length $critical and $warning > $critical) { {
    # Original:
    #return if $opt{reverse};
    # Option 1, following checks won't get executed:
    #return ($warning,$critical) if $opt{reverse};
    # Option 2, break out of the if statement (needs another { }):
    last if $opt{reverse};
    ndie msg('range-warnbig');
} }
As I'm not a programmer, could you please review it and, if OK, include it in the next release?
Cheers
Tobias
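The intended behaviour can be stated compactly: with --reverse, a warning above critical is legitimate and validation should not reject it. A sketch of that rule (variable names are illustrative, not check_postgres internals):

```shell
warning=10
critical=5
reverse=1   # --reverse was given
if [ "$warning" -gt "$critical" ] && [ "$reverse" -ne 1 ]; then
  result='rejected: warning must not exceed critical'
else
  result='accepted'
fi
echo "$result"
```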