valeriansaliou / vigil

🚦 Microservices Status Page. Monitors a distributed infrastructure and sends alerts (Slack, SMS, etc.).

Home Page: https://crates.io/crates/vigil-server

License: Mozilla Public License 2.0

Rust 86.96% CSS 8.02% JavaScript 1.80% Dockerfile 0.40% Shell 2.82%
microservices infrastructure monitor slack status statuspage servers monitoring infrastructure-services vigil

vigil's Issues

Error trying to do manual reporting

I get this error:

(INFO) - POST /reporter/quivr_web/api application/x-www-form-urlencoded:
(ERROR) - No matching routes for POST /reporter/quivr_web/api application/x-www-form-urlencoded.

When I try running this command:

wget --post-file=./api_request localhost:8080/reporter/quivr_web/api/

where the api_request file contains:

 {
  "replica": "www.quivr.be",
  "interval": 30,
  "load": {
    "cpu": 0.30,
    "ram": 0.80
  }
}

The probe "quivr_web" and its node "api" are both defined in the Vigil config.

[probe]

[[probe.service]]

id = "quivr_web"
label = "Quivr website"

[[probe.service.node]]

id = "api"
label = "quivr api"
mode = "push"

What am I doing wrong/how can I fix this?

Do I need the RabbitMQ plugin to use manual reporting?
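
The log above shows the request arriving as application/x-www-form-urlencoded, which is what wget sends by default with --post-file, so a JSON content type is a likely missing piece. Below is a minimal Python sketch of a manual report, assuming the reporter API as described in the Vigil README (JSON body, Basic authentication with the configured reporter_token as the password). The helper name build_report_request, the host, and the token value are illustrative assumptions, not part of Vigil.

```python
import base64
import json
import urllib.request


def build_report_request(base_url, probe_id, node_id, reporter_token, payload):
    """Build (but do not send) a POST request for Vigil's reporter endpoint.

    Assumption: the endpoint is /reporter/<probe_id>/<node_id>/ and expects
    a JSON body plus Basic auth with an empty username.
    """
    url = f"{base_url.rstrip('/')}/reporter/{probe_id}/{node_id}/"
    credentials = base64.b64encode(f":{reporter_token}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            # The route in the log did not match a form-encoded body.
            "Content-Type": "application/json; charset=utf-8",
            "Authorization": f"Basic {credentials}",
        },
        method="POST",
    )


# Payload copied from the api_request file in the issue.
EXAMPLE_PAYLOAD = {
    "replica": "www.quivr.be",
    "interval": 30,
    "load": {"cpu": 0.30, "ram": 0.80},
}
```

The request can then be sent with urllib.request.urlopen(build_report_request("http://localhost:8080", "quivr_web", "api", "<reporter_token>", EXAMPLE_PAYLOAD)).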

Install errors

I get this error when I try to install:

[root@status ~]# cargo install vigil-server
    Updating registry `https://github.com/rust-lang/crates.io-index`
  Installing vigil-server v1.1.2
   Compiling serde v1.0.27
error: the optimizations s or z are only accepted on the nightly compiler

error: failed to compile `vigil-server v1.1.2`, intermediate artifacts can be found at `/tmp/cargo-install.P2pvoDcsgJnE`

Caused by:
  Could not compile `serde`.

To learn more, run the command again with --verbose.
[root@status ~]#

Customize request URL

fn proceed_replica_probe_http(url: &str, body_match: &Option<Regex>) -> bool {
    let url_bang = format!("{}?{}", url, time::now().to_timespec().sec);

Nice project. I have a couple of backend services that reject extraneous query parameters, so I currently can't monitor them with Vigil. I understand that you want to cache-bust the requests, but this implementation also breaks replica URLs that already contain query parameters.
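
One way to reconcile cache busting with both concerns is to make it optional and to join with "&" when a query string already exists. The sketch below is in Python rather than Vigil's Rust and is not Vigil's implementation; the with_cache_buster helper and its enabled flag are hypothetical names standing in for the per-node opt-out this issue asks for.

```python
import time
from urllib.parse import urlsplit, urlunsplit


def with_cache_buster(url, enabled=True, timestamp=None):
    """Return url with a Unix timestamp appended to its query string.

    `enabled` is a hypothetical opt-out switch; when False the URL is
    returned untouched for backends that reject unknown parameters.
    """
    if not enabled:
        return url
    stamp = str(int(time.time()) if timestamp is None else timestamp)
    parts = urlsplit(url)
    # Use "&" when the URL already carries a query string, "?" otherwise
    # (urlunsplit inserts the "?" for us).
    query = f"{parts.query}&{stamp}" if parts.query else stamp
    return urlunsplit(parts._replace(query=query))
```

With enabled=False the replica URL is probed exactly as configured.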

Docker location

I just tried to install Vigil with Docker, and I cannot find the location where Vigil was installed.

RSS Feed

As a user, I usually use a status page's RSS feed to monitor third-party servers from Slack!

Custom probes (shell scripting)

Add a custom probe type, where a Vigil user can program a check script in Lua shell.

The script can be used to do custom network check, like POST data to an HTTP API and compare response body with a reference template. It can also be used to build a sequence of checks that must all pass for the node to be healthy.

The Lua interpreter is lightweight and can be embedded in Vigil with a low overhead on the total Vigil binary size.
(One alternative is to embed V8 and allow for JS scripts/workers, but it’s super heavy to embed; where Lua would be good-enough for custom probes)

Reason for change: the Lua runtime does not provide any HTTP library by default, and even if V8 does, it adds a ~35MB compiled overhead to the Vigil binary, which is unacceptable.

Auto refresh of the status page.

I moved to Vigil from Monitoror, another amazing monitoring wall, because I wanted to keep track of a lot of servers not only visually but with alerts too, as those VMs have acceptance tests baked into them and exposed through an HTTP endpoint. Everything seems great with Vigil, except that its status page doesn't refresh automatically the way Monitoror's does. Could auto-refresh be added here as well? Thanks!

Error installing from Docker

I have followed the README.md instructions:

sudo docker pull valeriansaliou/vigil:v1.9.0

The path to config.cfg is valid, but this error comes when executing the run command:

sudo docker run -p 8080:8080 -v /.../config.cfg -e RUST_BACKTRACE=1 valeriansaliou/vigil:v1.9.0

thread 'main' panicked at 'cannot find config file: Os { code: 2, kind: NotFound, message: "No such file or directory" }', src/libcore/result.rs:997:5
stack backtrace:
   0: std::sys::unix::backtrace::tracing::imp::unwind_backtrace
             at src/libstd/sys/unix/backtrace/tracing/gcc_s.rs:39
   1: std::panicking::default_hook::{{closure}}
             at src/libstd/sys_common/backtrace.rs:70
             at src/libstd/sys_common/backtrace.rs:58
             at src/libstd/panicking.rs:200
   2: std::panicking::rust_panic_with_hook
             at src/libstd/panicking.rs:215
             at src/libstd/panicking.rs:478
   3: std::panicking::continue_panic_fmt
             at src/libstd/panicking.rs:385
   4: rust_begin_unwind
             at src/libstd/panicking.rs:312
   5: core::panicking::panic_fmt
             at src/libcore/panicking.rs:85
   6: core::result::unwrap_failed
   7: std::sync::once::Once::call_once::{{closure}}
   8: std::sync::once::Once::call_inner
             at src/libstd/sync/once.rs:387
   9: <vigil::APP_CONF as core::ops::deref::Deref>::deref
  10: vigil::main
  11: std::rt::lang_start::{{closure}}
  12: main
  13: __libc_start_main
  14: _start

Any idea what can be wrong?

Disk usage reporting

Disk space usage seems to be a really basic metric to probe.

Some services often fall victim to a full disk, and that risk could be mitigated by letting Vigil report a node as unhealthy above some percentage of disk usage.

Probably worth handling that?
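
As a rough illustration of the check being requested, the Python sketch below measures disk usage on the reporting host and compares it with a threshold. The disk_sick_above name is hypothetical, modelled on the existing push_system_cpu_sick_above and push_system_ram_sick_above settings; Vigil has no such option in the configuration shown in these issues.

```python
import shutil


def disk_usage_ratio(path="/"):
    """Return the used fraction (0.0 to 1.0) of the filesystem holding path."""
    usage = shutil.disk_usage(path)
    if usage.total == 0:
        # Some virtual filesystems report a zero size; treat them as empty.
        return 0.0
    return usage.used / usage.total


def is_disk_sick(ratio, disk_sick_above=0.90):
    """Hypothetical threshold check: sick when usage exceeds the limit."""
    return ratio > disk_sick_above
```

A reporter could send disk_usage_ratio("/") alongside the CPU and RAM load, leaving the threshold decision to the server.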

Support Typetalk?

Hi, how about adding a notifier for Typetalk?
If the suggestion is acceptable, I'd love to send a PR. 🚀

IPv6 ICMP probing support

Due to the replacement of fastping-rs with ping, ICMPv6 support might have been dropped on some platforms.

A fork of the unmaintained ping library, and a full rework into a cleaner library would be much needed, adding support for ICMP IPv6 on all platforms.

Vigil fails on compile when using Dockerfile

Hello @valeriansaliou ,

I'm trying to deploy vigil using Dockerfile copied from this repo but whatever I do it fails on openssl, for example:

Compiling aho-corasick v0.7.6
error: failed to run custom build command for `openssl v0.9.24`
process didn't exit successfully: `/app/target/release/build/openssl-e0c9d0620fbd0538/build-script-build` (exit code: 101)
--- stderr
thread 'main' panicked at 'Unable to detect OpenSSL version', /usr/local/cargo/registry/src/github.com-1ecc6299db9ec823/openssl-0.9.24/build.rs:16:14
note: Run with `RUST_BACKTRACE=1` environment variable to display a backtrace.

warning: build failed, waiting for other jobs to finish...
error: build failed

The Dockerfile, Cargo.toml and Cargo.lock have been copied from this repo.

Do you have any idea how to fix that?

Thank you in advance.

Best regards,
Denis

Using Docker build, prober not receiving correct response

Running Vigil using

docker run --rm -p 6080:8080 -v $(pwd)/vigil.cfg:/etc/vigil.cfg valeriansaliou/vigil:v1.9.0

gives the following result:

(DEBUG) - will probe replica: HTTPS("https://status.crisp.chat/robots.txt") with retry count: 2
(DEBUG) - prober poll will fire for http target: https://status.crisp.chat/robots.txt?1560398131
(DEBUG) - resolving host="status.crisp.chat"
(DEBUG) - connecting to 178.62.224.146:443
(DEBUG) - adding I/O source: 4194304
(DEBUG) - scheduling Write for: 0
(DEBUG) - connected to Some(V4(178.62.224.146:443))
(DEBUG) - scheduling Read for: 0
(DEBUG) - scheduling Read for: 0
(DEBUG) - dropping I/O source: 0
(DEBUG) - prober poll result was not received for url: https://status.crisp.chat/robots.txt?1560398131

This is my vigil.cfg:

# Vigil
# Microservices Status Page
# Configuration file
# Example: https://github.com/valeriansaliou/vigil/blob/master/config.cfg


[server]

log_level = "debug"
inet = "0.0.0.0:8080"
workers = 4
reporter_token = "REPLACE_THIS_WITH_A_SECRET_KEY"

[assets]

path = "./res/assets/"

[branding]

page_title = "Crisp Status"
page_url = "https://status.crisp.chat/"
company_name = "Crisp IM SARL"
icon_color = "#3C82E7"
icon_url = "https://valeriansaliou.github.io/vigil/images/crisp-icon.png"
logo_color = "#3C82E7"
logo_url = "https://valeriansaliou.github.io/vigil/images/crisp-logo.svg"
website_url = "https://crisp.chat/"
support_url = "mailto:[email protected]"
custom_html = ""

[metrics]

poll_interval = 30
poll_retry = 2

poll_http_status_healthy_above = 200
poll_http_status_healthy_below = 400

poll_delay_dead = 20
poll_delay_sick = 10

push_delay_dead = 20

push_system_cpu_sick_above = 0.90
push_system_ram_sick_above = 0.90

[plugins]

[plugins.rabbitmq]

api_url = "http://127.0.0.1:15672"
auth_username = "rabbitmq-administrator"
auth_password = "RABBITMQ_ADMIN_PASSWORD"
virtualhost = "crisp"

queue_ready_healthy_below = 500
queue_nack_healthy_below = 100
queue_loaded_retry_delay = 500

[notify]

reminder_interval = 300

[notify.email]

from = "[email protected]"
to = "[email protected]"

smtp_host = "localhost"
smtp_port = 587
smtp_username = "user-access"
smtp_password = "user-password"
smtp_encrypt = false

[notify.twilio]

to = [
  "+336xxxxxxx",
  "+337xxxxxxx"
]

service_sid = "service-sid"
account_sid = "account-sid"
auth_token = "auth-token"

reminders_only = true

[notify.slack]

hook_url = "https://hooks.slack.com/services/xxxx"

[notify.xmpp]

from = "[email protected]"
to = "[email protected]"

xmpp_password = "xmpp-password"

[probe]

[[probe.service]]

id = "web"
label = "Web nodes"

[[probe.service.node]]

id = "status"
label = "Access to status page"
mode = "poll"
replicas = ["https://status.crisp.chat/robots.txt"]
http_body_healthy_match = "User-agent:.*"

Timezone support

While I do try to stick to UTC as a timezone, I think Vigil would do a better job if it had the possibility of changing timezones.

As this isn't in the documentation, I assumed there's no possibility to change timezones.

A few ideas here:

  • Timezone should change depending on the visitor (not sure if there's a standard way to do this, but perhaps https://stackoverflow.com/questions/6939685/get-client-time-zone-from-browser would help).
  • There should be a way to set up a different timezone for notifications (if the administrator lives in Beijing, they probably don't care about UTC notifications).
  • Lastly, if timezone cannot be detected, it should resort to UTC (as opposed to resorting to the administrator's timezone).
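
The fallback rule in the last bullet can be sketched in Python as follows. This is only an illustration of the proposed behaviour, not Vigil code; resolve_timezone and localize are hypothetical helper names, and the timezone name would come from the visitor's browser or from a notification setting.

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo, ZoneInfoNotFoundError


def resolve_timezone(name):
    """Return the named timezone, or UTC when it is missing or unknown."""
    if not name:
        return timezone.utc
    try:
        return ZoneInfo(name)
    except (ZoneInfoNotFoundError, ValueError, OSError):
        # Undetectable or invalid timezone: resort to UTC, as proposed.
        return timezone.utc


def localize(utc_moment, tz_name):
    """Convert an aware UTC datetime to the requested timezone."""
    return utc_moment.astimezone(resolve_timezone(tz_name))


# Example value: a notification timestamp stored in UTC.
EXAMPLE_MOMENT = datetime(2019, 6, 27, 22, 41, tzinfo=timezone.utc)
```

On a system with the tz database installed, localize(EXAMPLE_MOMENT, "Asia/Shanghai") shifts the displayed time by eight hours, while an unknown name leaves it in UTC.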

Auto-Refresh when general status changes

Fetch & update the page DOM via JS w/o performing a full page reload; automatically when the status changes.

Poll the current status every 30s from Vigil (get the status code only), and request a full page refresh if the status changed relative to current status.

Update year in copyright notice

I'm running the latest version of Vigil in Docker and the copyright notice at the bottom of my status page says 2018 instead of 2019. Is this something you could fix in the next version?

Execute actions based on health changes

What do you think about having an option to execute actions based on health changes? It could be just as simple as executing a power shell script (let's say, to restart a service).

I guess for now there are ways to do this using Slack integrations, but it would be nice to have it directly in Vigil.
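
To make the idea concrete, here is a minimal Python sketch of a hook runner that executes a user-supplied command only when a node's status changes. It is not an existing Vigil feature; run_health_hook and the HOOK_* environment variable names are hypothetical, and the healthy/sick/dead status names are taken from the configuration keys quoted elsewhere in these issues.

```python
import os
import subprocess
import sys


def run_health_hook(command, node_id, old_status, new_status, timeout=30):
    """Run command when the status changed; return None when it did not.

    The command receives the transition through environment variables
    (hypothetical names), so a restart script can decide what to do.
    """
    if old_status == new_status:
        return None
    env = dict(
        os.environ,
        HOOK_NODE_ID=node_id,
        HOOK_OLD_STATUS=old_status,
        HOOK_NEW_STATUS=new_status,
    )
    return subprocess.run(
        command, env=env, capture_output=True, text=True,
        timeout=timeout, check=False,
    )


# Stand-in for a real restart script: echoes the new status.
EXAMPLE_COMMAND = [sys.executable, "-c", "import os; print(os.environ['HOOK_NEW_STATUS'])"]
```

Passing the command as a list avoids shell interpretation of the node identifier.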

RabbitMQ connection

Hi,

I tried to use the RabbitMQ plugin, but the probe.service.node does not show any replica:

[[probe.service.node]]

id = "ai.ocr"
label = "OCR Queued Messages"
mode = "push"
rabbitmq_queue = "ocr_process_dev"

Thx

cargo configuration breaks non-nightly builds

Using cargo 0.25.0 with both rustc 1.25.0 and rustc 1.24.1, the documented install instructions fail, in whatever deep dependency is tried first, because:

error: the optimizations s or z are only accepted on the nightly compiler

My Rust knowledge is almost non-existent, but as far as I can tell, this is caused by the Vigil project's Cargo.toml setting opt-level = "s" in [profile.release].

Service Discovery

Would be great to have support for Service Discovery (DNS SRV records) such as used by consul, Eureka, etc.

replicas (type: array[string], allowed: TCP or HTTP URLs, default: empty): Node replica URLs to be probed (only used if mode is poll)

At present I can only list the containerized services that I have exposed on the load balancer, which is only a small subset of the services that are running, as the rest talk to each other directly. Also, because they are proxied, Vigil sees only one replica even though there might be half a dozen instances running, so a count of 1 is not ideal.

Show replica name in tooltip when hovering replica item (opt-in option)

First of all, thanks so much for vigil. I have tried so many alternatives and vigil is the only one that works consistently and works best.

I do have a feature request, however. It would be nice to be able to mark or show somehow which replica is which on the web page. It doesn't have to give away sensitive info; it could be a label or anything really.

If this is by design or already possible, let me know and just ignore me :D I think it may be useful in automated workflows to know what got added or removed, or when you don't want an alert but are curious which replica has unusual latency (I'm also not sure why some replicas show latency and others show none in that popup).

Caddy reverse proxy

I'm not very experienced with Caddy, because until now I've always simply used Nginx, and I'm running into difficulties setting up Caddy with Vigil.

To check that it wasn't Caddy's fault, I have also tried to reverse proxy other webpages, and everything seemed to work perfectly fine.

And to be perfectly clear, serving Vigil over port 8080 (HTTP only) also works perfectly fine, both dialing the IP as well as using the hostname (although most web browsers refuse to connect because HSTS is enforced, I checked this using cURL which doesn't observe HSTS).

I use the following settings in vigil.cfg to serve on port 8080:

inet = "0.0.0.0:8080"

Initially, my Caddyfile looked like this:

status.hostname.com

reverse_proxy / {
	to http://localhost:8080
}

(I replaced my actual domain with hostname.com).

According to Caddy's documentation, it passes on all original headers unmodified to the upstream server. Since I suspected there may be an issue with the Host header, I modified the Caddyfile like so:

status.hostname.com

reverse_proxy / {
	to http://localhost:8080
	header_up Host localhost
	header_up -Upgrade-Insecure-Requests
	header_up -Pragma
	header_up -Cache-Control
}

However, it unfortunately still does not work, and I'm not sure at this point whether or not this is my fault, Caddy's, or Vigil's.

The relevant fragment of logs from Caddy looks like this:

(DEBUG) - Incoming stream
(DEBUG) - Request Line: Get AbsolutePath("/status/text/") Http11
(DEBUG) - Headers { Host: aa.bb.cc.dd:8080
, Connection: keep-alive
, User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.163 Safari/537.36
, DNT: 1
, Accept: */*
, Referer: http://aa.bb.cc.dd:8080/
, Accept-Encoding: gzip, deflate
, Accept-Language: en-GB,en-US;q=0.9,en;q=0.8
, }
(INFO) - GET /status/text/:
(INFO) - Matched: GET /status/text (status_text)
(INFO) - Outcome: Success
(DEBUG) - writing head: Http11 Ok
(DEBUG) - headers [
Headers { Content-Type: text/plain; charset=utf-8
, Server: Rocket
, Content-Length: 7
, Date: Mon, 06 Apr 2020 17:57:41 GMT
, }]
(DEBUG) - write 7 bytes
(INFO) - Response succeeded.
(DEBUG) - keep_alive = true for xx.yy.zz.ww:58157
(DEBUG) - ioerror in keepalive loop = Custom { kind: UnexpectedEof, error: "end of stream before headers finished" }
(DEBUG) - keep_alive loop ending for xx.yy.zz.ww:58157

Deploying in ZEIT Now V2

I'm not able to deploy vigil on ZEIT Now. The now.json is:

{ "version": 2, "name": "vigil", "builds": [ { "src": "Cargo.toml", "use": "@now/rust", "config": { "rust": "nightly" } } ] }

and fails with code 101:
https://pastebin.com/yZMrBJxE

Any suggestions? Thanks!

DNS probe

Feature request: add the ability to check the availability and validity of responses from a DNS server.
Send a predefined query to a specified endpoint and validate that the response contains the expected record.

In my use case, that would be an SOA query for dn42 sent to a recursive resolver, with the expectation of seeing one SOA record in the response.

Issue with simple https endpoint

Hello!

The endpoint below yields a 200 status code, but I still get an error in Vigil.

[[probe.service.node]]
id = "redsmin-api"
label = "API"
mode = "poll"
replicas = ["https://api.redsmin.com"]

Any idea on what is going wrong? :)

Vigil reporting incorrect outage

I'm trying this out locally to replace our limited home-spun solution.

I have it working fine for a system which returns a JSON body, but another one which simply returns "ok" is not behaving.

The config that fails is:

[[probe.service.node]]
id = "block.veoo.com"
label = "block.veoo.com access"
mode = "poll"
replicas = ["https://block.veoo.com/health_check"]

A GET request to this URL returns:

HTTP/1.1 200 OK
Content-Type: text/plain; charset=utf-8
Content-Length: 2
Date: Tue, 29 Jan 2019 23:21:46 GMT
Connection: keep-alive

ok

So I'm not sure why it's reporting as failed. Any ideas?

Getting errors when trying to build

Hi, I am getting two errors when trying to build with cargo. I am using Rust nightly-2018-12-14, as instructed.

The errors are:


- imports can only refer to extern crate names passed with `--extern` on stable channel (see issue #53130)
- use of unstable library feature 'duration_as_u128' (see issue #50202)

Am I doing something wrong? (I have never used cargo so I don't really know how to troubleshoot, but if you need any more info, let me know)

Persistent states across restarts

Hi 👋

Great project!

It would be great if the last state of a probe/replica were stored somewhere persistent, so that Vigil doesn't re-send an obsolete notification (one that was already sent when the service actually went down) on the next start.
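
A minimal sketch of the requested behaviour, in Python and under stated assumptions: states are kept in a JSON file, and a notification is considered due only when the stored state differs from the new one. The file layout, the replica identifier format, and the load_states and record_status helpers are all hypothetical; nothing here reflects how Vigil stores state.

```python
import json
import tempfile
from pathlib import Path


def load_states(path):
    """Read the persisted replica states, or start empty if none exist."""
    try:
        return json.loads(Path(path).read_text(encoding="utf-8"))
    except (FileNotFoundError, json.JSONDecodeError):
        return {}


def record_status(path, replica_id, status):
    """Persist status and return True when a notification is due.

    A notification is due when the status changed since the last stored
    value, except for a first sighting of a healthy replica.
    """
    states = load_states(path)
    previous = states.get(replica_id)
    changed = previous != status
    if changed:
        states[replica_id] = status
        Path(path).write_text(json.dumps(states), encoding="utf-8")
    return changed and not (previous is None and status == "healthy")


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as directory:
        state_file = Path(directory) / "states.json"
        print(record_status(state_file, "web:status:0", "dead"))  # outage first seen
        print(record_status(state_file, "web:status:0", "dead"))  # same state after a restart
```

Because the state is re-read from disk on every call, a process restart between the two calls does not change the outcome.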

Installation Failed with Regex-Syntax

Hi there,

Operating System: Linux 4.9.0-8-amd64 #1 SMP Debian 4.9.144-3.1 (2019-02-19) x86_64 GNU/Linux
Rustc Version: rustc 1.36.0-nightly (37ff5d388 2019-05-22)
Cargo Version: cargo 1.36.0-nightly (c4fcfb725 2019-05-15)

I'm trying to install Vigil using cargo install vigil-server, but the installation process fails with the following error:

   Compiling regex-syntax v0.6.6
error: failed to compile `vigil-server v1.9.0`, intermediate artifacts can be found at `/tmp/cargo-installWlu03T`

Caused by:
  Could not compile `regex-syntax`.

Caused by:
  process didn't exit successfully: `rustc --crate-name regex_syntax /root/.cargo/registry/src/github.com-1ecc6299db9ec823/regex-syntax-0.6.6/src/lib.rs --color always --crate-type lib --emit=dep-info,metadata,link -C opt-level=s -C metadata=ccda77ecd63c6781 -C extra-filename=-ccda77ecd63c6781 --out-dir /tmp/cargo-installWlu03T/release/deps -L dependency=/tmp/cargo-installWlu03T/release/deps --extern ucd_util=/tmp/cargo-installWlu03T/release/deps/libucd_util-f229009622233513.rlib --cap-lints allow` (signal: 9, SIGKILL: kill)

Many thanks!

Kind regards,
Casper.

Prometheus metrics

The project looks really nice, but my main alerting and monitoring solution is Prometheus. I was thinking about introducing Vigil as a monitoring component because blackbox_exporter (Prometheus's own exporter for monitoring HTTP/TCP services) is hard to set up, needs a lot of relabelling, etc. It would be great if Vigil scraped and collected information about which services are down and exposed it in the Prometheus metrics format.

So I'm just curious whether you have something like that in mind?
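
For reference, the Prometheus text exposition format being asked for could look like the output of the Python sketch below. The metric name vigil_replica_up, the label names, and both helper functions are hypothetical; Vigil does not expose such an endpoint in anything quoted here.

```python
def escape_label(value):
    """Escape a label value for the Prometheus text format."""
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def render_metrics(replica_status):
    """Render {(service, node, replica): status} as a gauge per replica.

    The gauge is 1 when the status is "healthy" and 0 otherwise.
    """
    lines = [
        "# HELP vigil_replica_up 1 if the replica is healthy, 0 otherwise.",
        "# TYPE vigil_replica_up gauge",
    ]
    for (service, node, replica), status in sorted(replica_status.items()):
        labels = (
            f'service="{escape_label(service)}",'
            f'node="{escape_label(node)}",'
            f'replica="{escape_label(replica)}"'
        )
        value = 1 if status == "healthy" else 0
        lines.append(f"vigil_replica_up{{{labels}}} {value}")
    return "\n".join(lines) + "\n"
```

Serving this text on an HTTP path would let Prometheus scrape it like any other target.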

Show node CPU load, RAM used or latency on status page

When a node is shown as under high load (orange color), we don't show why it was reported as being loaded or slow.

If the node is a poll node, show in the hover tooltip:

  1. TCP or HTTP latency (milliseconds)

If the node is a push node, show in the hover tooltip:

  1. CPU load (percent)
  2. RAM used (percent)
  3. RabbitMQ queued + rejected packets (count) [if RabbitMQ plugin is active]

Add ability to check probe output for arbitrary string value

One of the greatest features I see implemented by comparable services is the ability to check the response to a probe sent to an HTTP endpoint for arbitrary strings, instead of just HTTP response codes. I would love to have this feature implemented in Vigil.
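
The combined check can be sketched in Python as a status-range test followed by an optional body pattern. The parameter names echo the poll_http_status_healthy_above, poll_http_status_healthy_below and http_body_healthy_match settings that appear in configs elsewhere in these issues, but the exact semantics are assumptions here: the lower bound is treated as inclusive, the upper bound as exclusive, and the pattern may match anywhere in the body.

```python
import re


def is_http_healthy(status, body, healthy_above=200, healthy_below=400, body_match=None):
    """Return True when the status is in range and the body matches.

    Assumed semantics: healthy_above <= status < healthy_below, and
    body_match (a regular expression) is searched for anywhere in body.
    """
    if not (healthy_above <= status < healthy_below):
        return False
    if body_match is None:
        return True
    return re.search(body_match, body) is not None
```

For example, is_http_healthy(200, "maintenance", body_match="User-agent:.*") reports the node as unhealthy even though the status code is 200.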

Install failed - Seems to be issue with lettre_email package.

I'm using the rust nightly:

rustc 1.39.0-nightly (521d78407 2019-08-25)

I get the following error during install (using cargo install vigil-server):

error[E0283]: type annotations required: cannot resolve `std::string::String: std::convert::AsRef<_>`
   --> /home/webadmin/.cargo/registry/src/github.com-1ecc6299db9ec823/lettre_email-0.8.3/src/lib.rs:387:70
    |
387 |         self.add_header(("Content-Type", format!("{}", content_type).as_ref()));
    |                                                                      ^^^^^^

error: aborting due to previous error

For more information about this error, try `rustc --explain E0283`.
error: Could not compile `lettre_email`.
warning: build failed, waiting for other jobs to finish...

Problem starting with Docker

Hello,
I have a small problem starting Vigil with Docker.

I have this error: thread 'vigil-responder' panicked at 'Cannot assign requested address (os error 99)', /usr/local/cargo/registry/src/github.com-1ecc6299db9ec823/rocket-0.4.2/src/error.rs:192:17

Thank you

Vigil causes error messages to show up in Caddy's logs

I'm currently hosting my sites behind the reverse proxy Caddy and I'm using Vigil to poll the status of these sites as HTTP targets. Now, the problem is that whenever Vigil polls a site, the following error message shows up in Caddy's logs:

2018/12/19 23:14:51 Unsolicited response received on idle HTTP channel starting with "0\r\n\r\n"; err=<nil>

The error message appears to be harmless in the sense that both Vigil and Caddy work fine in spite of it, but I still think that it's something that should be looked into because something seems to be wrong with the HTTP request that Vigil sends out when it polls a site.

I found some information about this error message on this Golang issue.

Disabled IPv6 on system, now getting a panic in vigil-prober

I disabled IPv6 on my server, and after rebooting, Vigil refuses to start and gives the following error:

thread 'vigil-prober' panicked at 'failed to create icmp pinger: "Address family not supported by protocol (os error 97)"', src/libcore/result.rs:1192:5
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace.

My vigil config file:

# Vigil
# Microservices Status Page
# Configuration file
# Example: https://github.com/valeriansaliou/vigil/blob/master/config.cfg


[server]

log_level = "info"
inet = "0.0.0.0:8080"
workers = 4
reporter_token = "REPLACE_THIS_WITH_A_SECRET_KEY"

[assets]

path = "./res/assets/"

[branding]
...
[metrics]

poll_interval = 120
poll_retry = 2

poll_http_status_healthy_above = 200
poll_http_status_healthy_below = 400

poll_delay_dead = 30
poll_delay_sick = 10

push_delay_dead = 20

push_system_cpu_sick_above = 0.90
push_system_ram_sick_above = 0.90

[plugins]
...

[probe]

[[probe.service]]

id = "network"
label = "Network"

[[probe.service.node]]

id = "router"
label = "Router"
mode = "poll"

replicas = [
  "icmp://router.mydomain.com"
]

As far as I can tell there is no way in the configuration to specify only using IPv4 ICMP polling, so I am stuck at this point. Any help would be appreciated.
