GithubHelp home page GithubHelp logo

bahtiyur / librenms-example-alerts Goto Github PK

View Code? Open in Web Editor NEW

This project forked from zimmertr/librenms-example-alerts

0.0 0.0 0.0 947 KB

Collection of my custom LibreNMS alerts & templates

License: GNU General Public License v3.0

librenms-example-alerts's Introduction

LibreNMS Example Alerts

LibreNMS is not very plentiful with their example alerts. It took me a significant amount of time to come up with the following alerts so I thought I would create a repository to retain them in the event that I need to rebuild my LibreNMS server in the future. Or, of course, so others could see my examples as well.

Included alerts

This table holds all of my example LibreNMS alerts.

# Name Rule Severity Extra
1 Host has been down for 5 minutes. %macros.device_down = "1" critical Max: 1 Delay: 300 Interval: 1800
2 CPU usage > 90% for 5 minutes. %processors.processor_usage >= "90" warning Max: 1 Delay: 300 Interval: 900
3 Memory usage > 90% for 5 minutes. %mempools.mempool_perc >= "90" warning Max: 1 Delay: 300 Interval: 900
4 Root filesystem has less than 5% free space available. %storage.storage_descr = "/" && %storage.storage_perc >= "95" critical Max: 1 Delay: 60 Interval: 900
5 Disk usage is abnormally high. %storage.storage_perc > %storage.storage_perc_warn && %devices.type = "server" && %storage.storage_descr !~ "/boot" = "" critical Max: 1 Delay: 60 Interval: 900
6 AUTHENTICATION FAILURE!!!!!!! %syslog.msg ~ "@authentication failure@" = %syslog.timestamp >= %macros.past_5m critical Max: -1 Delay: 300 Interval: 300
7 Network usage > 80% for 5 minutes. %macros.port_usage_perc >= "80" && %port.port_descr_type != "client" && %ports.ifType != "softwareLoopback" ok Max: -1 Delay: 300 Interval: 300
8 CPU Latency > 500ms for 1 minute. %device_perf.avg >= "1000" critical Max: 1 Delay: 60 Interval: 900
9 Network usage > 2MBps for 5 minutes. %ports.ifHighSpeed >= "2" ok Max: 1 Delay: 300 Interval: 900
10 Device discovered within the last 60 minutes %eventlog.type = "discovery" && %eventlog.message ~ "@autodiscovered@" && %eventlog.datetime >= %macros.past_60m ok Max: 1 Delay: 0 Interval: 300
11 Host has been rebooted. %devices.uptime < "300" && %macros.device = "1" warning Max: 1 Delay: 300 Interval: 300
12 Poller is taking longer than expected. %pollers.time_taken >= "250" critical Max: -1 Delay: 300 Interval: 300
13 A network port has gone down. %ports.ifOperStatus = "down" && %ports.ifOperStatus_prev = "up" && %macros.device_up = "1" warning Max: -1 Delay: 300 Interval: 300
14 Interface errors rate is abnormally high. %ports.ifOutErrors_rate >= "100" critical Max: -1 Delay: 300 Interval: 300
15 Host has entered a Warning state. %services.service_status = "1" warning Max: -1 Delay: 300 Interval: 300
16 Host has entered a Critical state. %services.service_status = "2" critical Max: -1 Delay: 300 Interval: 300
17 FQDN does not include the domain name. %devices.sysName !~ "@sol.milkyway" warning Max: -1 Delay: 300 Interval: 300
18 Hostname includes the domain name. %devices.hostname ~ "@.sol.milkyway" warning Max: -1 Delay: 300 Interval: 300
19 IP Address is within the DHCP scope. %devices.ip ~ "192.168.1.2@" warning Max: -1 Delay: 300 Interval: 300
20 Host has not been polled within the last day. %devices.last_polled >= "86400" critical Max: -1 Delay: 300 Interval: 300
21 Host is using a slow network adapter. %ports.port_descr_speed > "1000" ok Max: -1 Delay: 300 Interval: 300
22 CPU ready time > 10% for 5 minutes. %processes.cputime >= "10" critical Max: -1 Delay: 300 Interval: 900
23 Host has less than 1GB of allocated memory. %vminfo.vmwVmMemSize <= "1024" warning Max: -1 Delay: 300 Interval: 300
24 Host has more than 3 VCPUs. %vminfo.vmwVmCpus >= "3" warning Max: -1 Delay: 300 Interval: 300

Alert Template

LibreNMS allows you to customize the alert message that is sent to your transport endpoint. The following is my default message. I did not bother with additional templates.

Alert Title: LibreNMS (%hostname) - NEW ALERT
Recovery Title: LibreNMS (%hostname) - CANCELLATION
Alert Body:

{if %state == 0}Duration: %elapsed{else}Severity: %severity{/if}

{if %name}%name{else}%rule{/if}
---------------------------
Timestamp: %timestamp
Uptime: %uptime_long

TODO

At this time several of the above rules are either not working or flawed. I have not done any troubleshooting, but simply disabled the rules in favor of resolving the issue at a later time.

 - Host has more than 3 VCPUs: This rule does not work effectively as it spams for physical devices > 3 CPU cores as well. (My hypervisor, workstation, & laptop)  
 - Host has less than 1GB of allocated memory: This rule is ineffective as well as it mis-diagnoses physical devices. (Perhaps they return null for vminfo which triggers a false positive?)  
 - FQDN does not include the domain name: For some reason this rule is also not functional. (Every device that is monitored returns a failure for this check during each interval cycle.)  
 - Host has been down for 5 minutes: Times are not always accurate. Also delivers false positives for reboots occasionally. For example, rebooted my laptop (~45 seconds) and received an alert saying it had been down for 5 minutes.

Alert Example

The following is a screenshot of my phone displaying several push notifications using the above rules and templates in conjunction with Pusbullet as my transport endpoint.

librenms-example-alerts's People

Contributors

zimmertr avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.