zalando / zmon

Real-time monitoring of critical metrics & KPIs via elegant dashboards, Grafana3 visualizations & more

Home Page: https://demo.zmon.io/

License: Other

Shell 94.51% Dockerfile 0.97% TSQL 4.52%
monitoring metrics

zmon's Introduction

ZMON source code on GitHub is no longer in active development. Zalando will no longer actively review issues or merge pull-requests.

ZMON is still being used at Zalando and serves us well for many purposes. We are now deeper into our observability journey and understand better that we need other telemetry sources and tools to elevate our understanding of the systems we operate. We support the OpenTelemetry initiative and recommend that others starting their journey begin there.

If members of the community are interested in continuing to develop ZMON, consider forking it. Please review the licence before you do.

ZMON

ZMON is Zalando's open-source platform monitoring tool, used in production since early 2014. It supports our many engineering teams in observing their services and metrics on various layers, from low-level system metrics to teams' business KPIs.

Demo

Head over to demo.zmon.io to take a quick peek into the UI including Grafana3 (login first).

Introduction

To get familiar with the ideas behind ZMON and how things work, you can take a quick dive in: Intro

Talks / Blog

Take a look at the slides from our talk at the DevOps Ireland meetup for background information on ZMON.

First post about ZMON: Monitoring the platform

Features

  • Define checks as data sources executed on self-defined entities
  • Define alerts on checks and entities, with thresholds, as it suits you and your team's needs
  • Define custom dashboards with widgets and alert filters based on teams and tags
  • Check commands and alert conditions are arbitrary Python expressions, giving you a lot of power
  • All metric/check data is stored as time series in KairosDB for later use
  • Grafana3 is included, enabling you to build rich data driven dashboards
  • Powerful REST API to integrate nicely into other tools: e.g. cmdb/deploy tools
  • Entity service to store entities of any kind describing your environment
  • Trial run in the UI to develop your checks/alerts with quick feedback
  • Auto discovery of AWS services using ZMON's aws agent and entity service, great for AWS deployments
  • Authentication via OAuth 2 e.g. GitHub
  • Frontend, including Grafana 3, requires full authentication, so no VPN is needed; one-time tokens are available for office TV displays
  • Command line client for easy automation and interaction with the REST API
  • ZMON data service allows you to connect DCs/Regions via HTTP for federated monitoring
  • Supports SQL for PostgreSQL incl. sharded deployments, MySQL, Redis, Scalyr, ...
  • Supports desktop and mobile notifications via Firebase Cloud Messaging
  • More on connectivity here: Check commands
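To illustrate the idea that check commands and alert conditions are Python expressions, here is a minimal sketch. It is not ZMON's worker code: the function names `evaluate_check` and `evaluate_alert`, the `value` variable, the `connections` helper, and the sample entity are all assumptions made for illustration, and plain `eval` with a reduced namespace is not a safe sandbox.

```python
# Illustrative sketch only; not ZMON's actual evaluation code.

def evaluate_check(command, entity, functions):
    """Evaluate a check command (a Python expression) for one entity."""
    namespace = {"__builtins__": {}, "entity": entity, **functions}
    return eval(command, namespace)


def evaluate_alert(condition, value):
    """Evaluate an alert condition with the check result bound to `value`."""
    namespace = {"__builtins__": {}, "value": value}
    return bool(eval(condition, namespace))


# Hypothetical entity and helper function exposed to the check command.
entity = {"id": "db-1", "type": "database", "connections": 180}
functions = {"connections": lambda e: e["connections"]}

result = evaluate_check("connections(entity)", entity, functions)
alert_active = evaluate_alert("value > 150", result)
print(result, alert_active)
```

The check produces a value per entity, and the alert condition is a second expression evaluated on that value, which is why one check can feed several alerts with different thresholds.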

Local demo and single host deployment

We suggest using Docker Compose to deploy ZMON locally or on a single host.

More here: compose

Docker Compose is also the most convenient way to set up a development environment.

Where Docker Compose is not an option, continue with the manual deployment below (or fall back to the obsolete Vagrant box).

Manual Deployment

Your best starting point is the documentation: Component overview

Requirements

ZMON relies on a few great open source products to run, which you will need to operate.

  • Redis
  • PostgreSQL
  • Cassandra + KairosDB

This seems to be a lot, but we provide both a Vagrant box and the deployment scripts for our demo host, lowering the bar to get started :)

Components

Frontend / Controller: UI and REST API

Scheduler: Schedules check/alert execution

Worker: Executes check/alert commands and data acquisition

Optional components

Data service: Used for distributed monitoring where sites don't share network connectivity other than the Internet.

Metric cache: Fast special-purpose cache for REST API metric data for ZMON's REST metrics/cloud UI
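The division of labour between scheduler and worker can be sketched roughly as follows. This is a simplified, in-memory illustration and not ZMON code: the `deque` stands in for a shared queue, and the task fields, `schedule`, and `work` are hypothetical names.

```python
from collections import deque

# In-memory stand-in for the queue shared by scheduler and worker.
queue = deque()


def schedule(checks, entities):
    """Scheduler role: enqueue one task per check and matching entity."""
    for check in checks:
        for entity in entities:
            if entity["type"] == check["entity_type"]:
                queue.append({
                    "check_id": check["id"],
                    "command": check["command"],
                    "entity": entity,
                })


def work():
    """Worker role: drain the queue and run each check command."""
    results = []
    while queue:
        task = queue.popleft()
        value = task["command"](task["entity"])
        results.append((task["check_id"], task["entity"]["id"], value))
    return results


# Hypothetical check (a callable here, for brevity) and entities.
checks = [{"id": 1, "entity_type": "host", "command": lambda e: len(e["id"])}]
entities = [{"id": "web-01", "type": "host"}, {"id": "pg-01", "type": "database"}]

schedule(checks, entities)
results = work()
print(results)
```

Only the entity whose type matches the check is scheduled, so the database entity produces no task.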

Vagrant Box (deprecated)

Install a recent Vagrant version (at least 1.7.4) and simply do:

$ vagrant up

Please note that the provisioning process will take some time (~15min) while it downloads the Docker images.

Frontend

https://localhost:8443/

Login with your own GitHub credentials (OAuth redirect).

Grafana

https://localhost:8443/grafana/

You will be able to create/save dashboards.

KairosDB

KairosDB frontend, e.g. for manually querying metrics:

http://localhost:38083/

Issues

  • If single containers do not start up, ssh into the Vagrant box and run the start.sh script again manually, or use the start-services.sh script to restart single components. The latter takes parameters like controller or worker.

Install the Command Line Interface

Use pip to install the zmon executable from PyPI.

$ pip3 install --upgrade zmon-cli

Use the ZMON CLI to push/create/update entities (hosts, databases, etc.), check definitions and create optional alerts (also possible via UI).

$ zmon entities push examples/entities/local-postgresql.yaml

$ zmon entities push examples/entities/local-scheduler-instance.json

Push your first check definition:

$ zmon check-definitions update examples/check-definitions/zmon-scheduler-rates.yaml

Modify the alert definition to point to the right check id before doing:

$ zmon alert-definitions update examples/alert-definitions/scheduler-rate-too-low.yaml

Build Environment

If you want to compile everything from source, you can do so with our separate "build-env" Vagrant box:

$ cd build-env
$ vagrant up

Thanks

Docker images/scripts used in slightly modified versions are:

  • abh1nav/cassandra:latest
  • wangdrew/kairosdb
  • official Redis and PostgreSQL

Thanks to the original authors!

License

Copyright 2013-2016 Zalando SE

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

zmon's People

Contributors

bocytko, hjacobs, jan-m, jkandasa, mohabusama, otrosien, pitr, roskenet

zmon's Issues

Cannot start Cassandra container

Getting this error after fixing issue #1:

==> default: Error response from daemon: Cannot start container 1e848ddcb52e07cf9f0c8d8e05b48047ffeef65d070e010f5d655dc0dae81e34: Error getting container 1e848ddcb52e07cf9f0c8d8e05b48047ffeef65d070e010f5d655dc0dae81e34 from driver devicemapper: Error mounting '/dev/mapper/docker-8:1-262152-1e848ddcb52e07cf9f0c8d8e05b48047ffeef65d070e010f5d655dc0dae81e34' on '/var/lib/docker/devicemapper/mnt/1e848ddcb52e07cf9f0c8d8e05b48047ffeef65d070e010f5d655dc0dae81e34': no such file or directory

Add more elaborate dashboard example to the Vagrant box

vagrant up works fine for me (Ubuntu 15.10 with Vagrant 1.7.4), but the example dashboard only has a very simple sample alert:
screenshot_2015-12-05_21-10-11

We should add some dashboard widgets and more example checks/alerts to have a more convincing first impression 😄

Presentation/TV mode for dashboards

In order to squeeze as much information as possible onto a TV screen, we organize different dashboards and Grafana boards in framesets.
To waste less space, it would be cool to have a special TV or presentation mode, triggered by a URL parameter, that hides all controls, menus, and search bars (leaving basically just the alert boxes on the page). This could be useful for Grafana boards as well.

Duplicated vagrant startup scripts needed?

The demo and build-env vms have different startup scripts (zmon/vagrant and zmon/build-dev/vagrant). I had a hard time trying out the build-env vm, because of various issues already solved in the demo vm (broken pip3, missing scm-source and broken locale configuration). There are also other differences like start-services.sh, which in the build-env is still targeting the old zmon controller (8080 & http).

This raises the question, whether the duplicated configuration is needed at all. The only crucial difference I see is the build.sh execution in the build-env and the difference in docker image start (local vs. remote image).

So one could imagine having both vms share the same scripts. The build-env would then additionally run the build.sh to build local docker images and use them in the start-services.sh instead of the versions preconfigured for the demo vm. What do you think @hjacobs @Jan-M ?

Fix navigation

When you try to navigate using browser back (several steps), in general, this is possible, but you cannot see in the <title> tag which page that was. It would be helpful to fix this so that the title is something like "ZMON - Edit Alert 123" or similar.

CannotGetJdbcConnectionException: Failed to acquire connection for virtual shard 0 for get_alert_definitions_by_team_and_tag; neste...

https://sentry.stups-test.zalan.do/sentry/zmon-controller/issues/154/

PSQLException: This connection has been closed.
    at org.postgresql.jdbc2.AbstractJdbc2Connection.checkClosed(AbstractJdbc2Connection.java:843)
    at org.postgresql.jdbc4.Jdbc4Connection.getMetaData(Jdbc4Connection.java:54)
    at com.jolbox.bonecp.ConnectionHandle.getMetaData(ConnectionHandle.java:814)
    at io.opentracing.contrib.spring.cloud.jdbc.JdbcAspect.getConnection(JdbcAspect.java:43)
    at sun.reflect.GeneratedMethodAccessor104.invoke
...
(126 additional frame(s) were not displayed)

CannotGetJdbcConnectionException: Failed to acquire connection for virtual shard 0 for get_alert_definitions_by_team_and_tag; nested exception is org.postgresql.util.PSQLException: This connection has been closed.
    at de.zalando.sprocwrapper.proxy.StoredProcedure.execute(StoredProcedure.java:440)
    at de.zalando.sprocwrapper.proxy.SProcProxy.handleInvocation(SProcProxy.java:53)
    at com.google.common.reflect.AbstractInvocationHandler.invoke(AbstractInvocationHandler.java:84)
    at com.sun.proxy.$Proxy100.getAlertDefinitionsByTeamAndTag
    at org.zalando.zmon.service.impl.AlertServiceImpl.getActiveAlertDefinitionByTeamAndTag(AlertServiceImpl.java:512)
...
(111 additional frame(s) were not displayed)

Failed to acquire connection for virtual shard 0 for get_alert_definitions_by_team_and_tag; nested exception is org.postgresql.util.PSQLException: This connection has been closed.

Add format option to gauge widget

For a value widget, I can use the format option to define how to display the value. As the gauge widget also displays the value textually, I'd like a similar option there.

aws:cloudformation:logical_id tag for Application ELBs

We want to set up a ZMON check/monitor to fetch and aggregate data from CloudWatch.

We need the aws:cloudformation:logical_id tag for this for one of our applications. It works, but unfortunately only for Classic ELBs. We want to switch all our ELBs to Application ELBs because only they support percentiles in metrics.

We will request adding the aws:cloudformation:logical_id tag from Amazon for ApplicationELBs
so that it can be picked up by ZMON.

Vagrant box error on vagrant provisioning

Hi,
Using Vagrant 1.7.2 under Ubuntu, while trying to start the Vagrant box to evaluate ZMON, I ran into the following error:

/$ vagrant up
Bringing machine 'default' up with 'virtualbox' provider...
==> default: Importing base box 'ubuntu/vivid64'...
==> default: Matching MAC address for NAT networking...
==> default: Checking if box 'ubuntu/vivid64' is up to date...
==> default: Setting the name of the VM: ZMON-DEMO
==> default: Clearing any previously set forwarded ports...
==> default: Clearing any previously set network interfaces...
==> default: Preparing network interfaces based on configuration...
    default: Adapter 1: nat
==> default: Forwarding ports...
    default: 8080 => 38080 (adapter 1)
    default: 8084 => 38084 (adapter 1)
    default: 8085 => 38085 (adapter 1)
    default: 8083 => 38083 (adapter 1)
    default: 22 => 2222 (adapter 1)
==> default: Running 'pre-boot' VM customizations...
==> default: Booting VM...
==> default: Waiting for machine to boot. This may take a few minutes...
    default: SSH address: 127.0.0.1:2222
    default: SSH username: vagrant
    default: SSH auth method: private key
    default: Warning: Connection timeout. Retrying...
    default: Warning: Remote connection disconnect. Retrying...
    default: 
    default: Vagrant insecure key detected. Vagrant will automatically replace
    default: this with a newly generated keypair for better security.
    default: 
    default: Inserting generated public key within guest...
    default: Removing insecure key from the guest if its present...
    default: Key inserted! Disconnecting and reconnecting using new SSH key...
==> default: Machine booted and ready!
==> default: Checking for guest additions in VM...
==> default: Setting hostname...
The following SSH command responded with a non-zero exit status.
Vagrant assumes that this means the command failed!

service hostname start

Stdout from the command:



Stderr from the command:

stdin: is not a tty
Failed to start hostname.service: Unit hostname.service is masked.

Duplicate included/excluded entities

While including/excluding entities in check-definition or alert-definition it would be really nice to have a duplicate button, as you can see from the following screenshot.

zmon-check-definition

Thanks

Zmon for python client

Hi,
When I try to set up ZMON locally, the statement "from zmon_cli.client import Zmon" fails with "No module named zmon_cli.client". I searched but cannot find this file anywhere.

Regards

Add a means to see the result of an executed check definition

It would be really great if normal users could see the result of a check definition.
Being able to access this information would help us write checks much faster and get to know the tool a little better.
Right now there seems to be no way for a normal user to analyse the result of a check.

ImportError: No module named 'dns.exception'

Hi,

I have just installed zmon-cli via pip3. It seems to have completed successfully:

[...]
Successfully installed cffi-1.8.3 clickclick-1.2.1 cryptography-1.5.2 dnspython-1.15.0 dnspython3-1.15.0 easydict-1.6 keyring-10.0.2 pycparser-2.16 setuptools-28.6.1 zmon-cli-1.0.57

Running zmon-cli --version fails with the following stack trace:

$ zmon --version
Traceback (most recent call last):
  File "/usr/local/bin/zmon", line 7, in <module>
    from zmon_cli.main import main
  File "/usr/local/lib/python3.5/dist-packages/zmon_cli/main.py", line 3, in <module>
    from zmon_cli.cmds import cli
  File "/usr/local/lib/python3.5/dist-packages/zmon_cli/cmds/__init__.py", line 1, in <module>
    from zmon_cli.cmds.command import cli
  File "/usr/local/lib/python3.5/dist-packages/zmon_cli/cmds/command.py", line 10, in <module>
    from zmon_cli.config import DEFAULT_CONFIG_FILE
  File "/usr/local/lib/python3.5/dist-packages/zmon_cli/config.py", line 7, in <module>
    import zign.api
  File "/usr/local/lib/python3.5/dist-packages/zign/api.py", line 5, in <module>
    import stups_cli.config
  File "/usr/local/lib/python3.5/dist-packages/stups_cli/config.py", line 2, in <module>
    import dns.exception
ImportError: No module named 'dns.exception'

Anything I have done wrong here?

Thanks,
Moritz
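A quick way to narrow down an error like this is to test which modules the interpreter can actually import. The helper below is a generic diagnostic sketch (`module_available` is a name chosen here, not part of zmon-cli). Note that the install log above lists both dnspython and dnspython3, which both provide the `dns` package; whether that is the cause here is not confirmed by the report.

```python
import importlib


def module_available(name):
    """Return True if `name` can be imported by the current interpreter."""
    try:
        importlib.import_module(name)
    except ImportError:
        return False
    return True


# Check the package and the submodule named in the traceback.
for name in ("dns", "dns.exception"):
    status = "importable" if module_available(name) else "missing"
    print(name, status)
```

Run it with the same interpreter that runs the zmon executable (Python 3.5 in the traceback), since a different interpreter may see a different set of installed packages.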

Exception when updating a grafana dashboard using Zmon Update

When updating a Grafana dashboard with the zmon CLI, I get an exception at the end of the update command. However, the dashboard seems to be updated anyway.

$> zmon graf update <yaml file>                                                                                                  

Updating dashboard XXXXXX ...
 EXCEPTION OCCURRED: 'id'
Traceback (most recent call last):
  File "/usr/local/bin/zmon", line 11, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.6/site-packages/zmon_cli/main.py", line 9, in main
    cli()
  File "/usr/local/lib/python3.6/site-packages/click/core.py", line 722, in __call__
    return self.main(*args, **kwargs)
  File "/usr/local/lib/python3.6/site-packages/click/core.py", line 697, in main
    rv = self.invoke(ctx)
  File "/usr/local/lib/python3.6/site-packages/click/core.py", line 1066, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/usr/local/lib/python3.6/site-packages/click/core.py", line 1066, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/usr/local/lib/python3.6/site-packages/click/core.py", line 895, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/usr/local/lib/python3.6/site-packages/click/core.py", line 535, in invoke
    return callback(*args, **kwargs)
  File "/usr/local/lib/python3.6/site-packages/click/decorators.py", line 27, in new_func
    return f(get_current_context().obj, *args, **kwargs)
  File "/usr/local/lib/python3.6/site-packages/zmon_cli/cmds/grafana.py", line 47, in grafana_update
    ok(client.grafana_dashboard_url(dashboard))
  File "/usr/local/lib/python3.6/site-packages/zmon_cli/client.py", line 252, in grafana_dashboard_url
    return self.endpoint(GRAFANA_DASHBOARD_URL, dashboard['id'], base_url=self.base_url)
KeyError: 'id'
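The traceback shows the failure happening after the update itself, when the URL is built from `dashboard['id']`. A defensive variant could look like the sketch below. It is not the zmon-cli implementation: `dashboard_url`, the URL layout, and the example host are assumptions for illustration only.

```python
def dashboard_url(base_url, dashboard):
    """Build a dashboard URL, or return None if the payload has no 'id'."""
    dashboard_id = dashboard.get("id")
    if dashboard_id is None:
        # The update may still have succeeded; there is just no id to link to.
        return None
    # Hypothetical URL layout, chosen only for this example.
    return "{}/grafana/dashboard/db/{}".format(base_url.rstrip("/"), dashboard_id)


print(dashboard_url("https://zmon.example.org/", {"id": "my-board"}))
print(dashboard_url("https://zmon.example.org/", {"title": "no id here"}))
```

Using `dict.get` turns the missing key into a value the caller can handle, instead of an exception raised after the dashboard has already been updated.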

Support a secure way to pass credentials to http() calls

Background

A third-party API requires us to pass credentials (user ID and secret key) via custom HTTP headers.

At the moment there is no way to specify the credentials without allowing other users to access the credentials.

In #30 a solution is suggested (a proxy in front of the external resource), but users might want to avoid setting up additional resources.

Suggested solution

Via environment variables one could specify the following kind of information:

accounts = [
    {
        "placeholder_prefix": "ABC",
        "url_pattern": "https://abc.com/api/.*",
        "USER_ID": "myUserId1",              # should be encrypted via AWS KMS
        "SECRETKEY": "r73fhf83g83gdv327dv",  # should be encrypted via AWS KMS
    },
    # ...
]

The HTTP call in the check would look like:

response = http('https://abc.com/api/v2/status',
                replace_credentials_placeholder=True,
                headers={
                    'X-ABC-INC-ACCESS-ID': 'ACCOUNTS_ABC_USER_ID',
                    'X-ABC-INC-SECRET-KEY': 'ACCOUNTS_ABC_SECRETKEY',
                })

Then ZMON's http component would replace placeholders in the URL and in the headers before doing the actual call.
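The replacement step could be sketched as below. This is only an illustration of the proposal, not existing ZMON behaviour: `replace_placeholders` is a hypothetical helper, the secret is a dummy value, and the sketch handles headers only (not placeholders in the URL).

```python
import re

# Hypothetical account configuration, mirroring the structure proposed above.
ACCOUNTS = [
    {
        "placeholder_prefix": "ABC",
        "url_pattern": r"https://abc\.com/api/.*",
        "USER_ID": "myUserId1",
        "SECRETKEY": "example-secret",
    },
]


def replace_placeholders(url, headers, accounts=ACCOUNTS):
    """Resolve ACCOUNTS_<prefix>_<key> header values for matching URLs only."""
    resolved = dict(headers)
    for account in accounts:
        # Never hand out credentials for URLs outside the account's pattern.
        if not re.fullmatch(account["url_pattern"], url):
            continue
        prefix = "ACCOUNTS_{}_".format(account["placeholder_prefix"])
        for name, value in list(resolved.items()):
            if value.startswith(prefix):
                key = value[len(prefix):]
                if key in ("USER_ID", "SECRETKEY"):
                    resolved[name] = account[key]
    return resolved


headers = {
    "X-ABC-INC-ACCESS-ID": "ACCOUNTS_ABC_USER_ID",
    "X-ABC-INC-SECRET-KEY": "ACCOUNTS_ABC_SECRETKEY",
}
print(replace_placeholders("https://abc.com/api/v2/status", headers))
```

Tying each account to a URL pattern matters here: without it, a check author could point the same placeholders at a server they control and read the secrets from the request.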

Improve hour support

The configuration period: hr {8 - 24} results in:
{ "exception": "8 is not valid for hour. Valid options are between 0 and 23.", ... }

I think whitespace between 8 and minus and 24 is throwing off zmon.
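For comparison, a whitespace-tolerant parser for this kind of hour range might look like the sketch below. It is not ZMON's period parser; `parse_hour_range` and the accepted syntax are assumptions based only on the example above. Since the error message states that valid hours are 0 to 23, the upper bound 24 would be rejected regardless of whitespace, and the sketch reports the value that is actually out of range.

```python
import re


def parse_hour_range(expr):
    """Parse 'hr {A-B}' (whitespace tolerated) into (start, end) hours."""
    match = re.fullmatch(r"\s*hr\s*\{\s*(\d+)\s*-\s*(\d+)\s*\}\s*", expr)
    if match is None:
        raise ValueError("not an hour range: {!r}".format(expr))
    start, end = int(match.group(1)), int(match.group(2))
    for hour in (start, end):
        if not 0 <= hour <= 23:
            raise ValueError(
                "{} is not valid for hour. "
                "Valid options are between 0 and 23.".format(hour)
            )
    return start, end


print(parse_hour_range("hr {8 - 23}"))
print(parse_hour_range("hr {8-23}"))
```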

Vagrant box fails to come up due to 404 on box URL

Hey,

Looks like your configured Vagrant image is no longer available:

$ vagrant up
Bringing machine 'default' up with 'virtualbox' provider...
==> default: Box 'ubuntu-trusty-14.10-cloudimg' could not be found. Attempting to find and install...
    default: Box Provider: virtualbox
    default: Box Version: >= 0
==> default: Adding box 'ubuntu-trusty-14.10-cloudimg' (v0) for provider: virtualbox
    default: Downloading: https://cloud-images.ubuntu.com/vagrant/utopic/current/utopic-server-cloudimg-amd64-vagrant-disk1.box
An error occurred while downloading the remote file. The error
message, if any, is reproduced below. Please fix this error and try
again.

The requested URL returned error: 404 Not Found

Add support for excluding alerts in dashboards based on alert tags

Today, the dashboard already supports showing only alerts with specific tags. It would also be great to support the opposite: hiding alerts with specific tags.

A simple and maybe easy way could be to just add a "!" in front of the tags you don't want to see in your dashboard:

screen shot 2016-07-29 at 14 46 41
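The suggested "!" syntax could be interpreted along these lines. This is a sketch of the proposal, not existing dashboard behaviour; `filter_alerts` and the alert structure are hypothetical.

```python
def filter_alerts(alerts, tag_filter):
    """Keep alerts matching the include tags and none of the '!' tags."""
    include = {tag for tag in tag_filter if not tag.startswith("!")}
    exclude = {tag[1:] for tag in tag_filter if tag.startswith("!")}
    selected = []
    for alert in alerts:
        tags = set(alert.get("tags", []))
        if tags & exclude:
            continue  # any excluded tag hides the alert
        if include and not (tags & include):
            continue  # include tags given, but none matched
        selected.append(alert)
    return selected


alerts = [
    {"id": 1, "tags": ["prod"]},
    {"id": 2, "tags": ["prod", "noisy"]},
    {"id": 3, "tags": ["test"]},
]
print([a["id"] for a in filter_alerts(alerts, ["prod", "!noisy"])])
```

In this sketch exclusion takes precedence over inclusion, so an alert tagged both "prod" and "noisy" is hidden.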

Question: unexpected behaviour when alert code changes result of check?

Given the alert code modified the result of the check (admittedly unusual),
when the alert code is changed so the result is no longer modified,
and then I clean up and evaluate the alert,
then the alert still shows the modified result in the UI for a while (probably until the check runs again on its regular schedule).

Why is it like this? It is a bit confusing.

Data is not collecting

I have installed ZMON according to the manual and have imported check definitions and entities from the zmon-demo GitHub repository. I have also added some custom check definitions and entities, but the only check collecting data is the default "Random".
The zmon CLI shows all the entities and checks correctly.
On the "Check defs" page the only difference between the checks is Team, which is "Example Team" for "Random" and "ZMON" for the others.
"ps" always shows "zmon-worker check 1 on ..." for all worker processes.
I have tried running "BLPOP zmon:queue:default" in redis-cli and it always returns only "...Random..".
If I understand the architecture correctly, then zmon-collector is not pushing new checks to Redis, but I don't know how to debug it any further.

P.S. https://demo.zmon.io is also not collecting data since 21.03.16.

"ubuntu/vivid64" is gone

When doing vagrant up I get

==> default: Box 'ubuntu/vivid64' could not be found. Attempting to find and install...
    default: Box Provider: virtualbox
    default: Box Version: >= 0
The box 'ubuntu/vivid64' could not be found or
could not be accessed in the remote catalog. If this is a private
box on HashiCorp's Atlas, please verify you're logged in via
`vagrant login`. Also, please double-check the name. The expanded
URL and error message are shown below:

URL: ["https://atlas.hashicorp.com/ubuntu/vivid64"]
Error: The requested URL returned error: 404 Not Found

Can we use "ubuntu/trusty64" or is there something special about "ubuntu/vivid64"?

SSL verification problem in ZMON Scheduler in Vagrant box

Caused by: org.springframework.web.client.ResourceAccessException: I/O error on GET request for "https://localhost:8443/api/v1/checks/all-active-alert-definitions":hostname in certificate didn't match: <localhost> != <unknown>; nested exception is javax.net.ssl.SSLException: hostname in certificate didn't match: <localhost> != <unknown>
    at org.springframework.web.client.RestTemplate.doExecute(RestTemplate.java:584)
    at org.springframework.web.client.RestTemplate.execute(RestTemplate.java:529)
    at org.springframework.web.client.RestTemplate.exchange(RestTemplate.java:447)
    at de.zalando.zmon.scheduler.ng.alerts.DefaultAlertSource.getCollection(DefaultAlertSource.java:84)
    at de.zalando.zmon.scheduler.ng.alerts.AlertRepository.fill(AlertRepository.java:31)
    at de.zalando.zmon.scheduler.ng.alerts.AlertRepository.<init>(AlertRepository.java:67)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:422)
    at org.springframework.beans.BeanUtils.instantiateClass(BeanUtils.java:147)
    ... 52 more
Caused by: javax.net.ssl.SSLException: hostname in certificate didn't match: <localhost> != <unknown>
    at org.apache.http.conn.ssl.AbstractVerifier.verify(AbstractVerifier.java:238)
    at org.apache.http.conn.ssl.BrowserCompatHostnameVerifier.verify(BrowserCompatHostnameVerifier.java:54)
    at org.apache.http.conn.ssl.AbstractVerifier.verify(AbstractVerifier.java:159)
    at org.apache.http.conn.ssl.AbstractVerifier.verify(AbstractVerifier.java:140)
    at org.apache.http.conn.ssl.SSLConnectionSocketFactory.verifyHostname(SSLConnectionSocketFactory.java:301)
    at org.apache.http.conn.ssl.SSLConnectionSocketFactory.createLayeredSocket(SSLConnectionSocketFactory.java:291)
    at org.apache.http.conn.ssl.SSLConnectionSocketFactory.connectSocket(SSLConnectionSocketFactory.java:259)
    at org.apache.http.impl.conn.HttpClientConnectionOperator.connect(HttpClientConnectionOperator.java:125)
    at org.apache.http.impl.conn.PoolingHttpClientConnectionManager.connect(PoolingHttpClientConnectionManager.java:319)
    at org.apache.http.impl.execchain.MainClientExec.establishRoute(MainClientExec.java:363)
    at org.apache.http.impl.execchain.MainClientExec.execute(MainClientExec.java:219)
    at org.apache.http.impl.execchain.ProtocolExec.execute(ProtocolExec.java:195)
    at org.apache.http.impl.execchain.RetryExec.execute(RetryExec.java:86)
    at org.apache.http.impl.execchain.RedirectExec.execute(RedirectExec.java:108)
    at org.apache.http.impl.client.InternalHttpClient.doExecute(InternalHttpClient.java:184)
    at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:82)
    at org.springframework.http.client.HttpComponentsClientHttpRequest.executeInternal(HttpComponentsClientHttpRequest.java:91)
    at org.springframework.http.client.AbstractBufferingClientHttpRequest.executeInternal(AbstractBufferingClientHttpRequest.java:48)
    at org.springframework.http.client.AbstractClientHttpRequest.execute(AbstractClientHttpRequest.java:53)
    at org.springframework.web.client.RestTemplate.doExecute(RestTemplate.java:568)
    ... 62 more

Url configuration option

Docker compose fails to run all checks

Starting with docker-compose causes all the checks to fail to execute and the dashboard is just completely red (due to failed evaluations).

This is because the entities (example) are referring to localhost and these aren't reachable from the worker.

[16:19:39] user@machine:~/zmon/compose$ docker exec -it compose_worker_1 bash
root@9bc16fbd1363:/#
root@9bc16fbd1363:/# curl -k https://localhost:8443/health
curl: (7) Failed to connect to localhost port 8443: Connection refused
root@9bc16fbd1363:/#
root@9bc16fbd1363:/# curl -k https://controller:8443/health
{"timestamp":1537712390748,"status":404,"error":"Not Found","message":"Not Found","path":"/health"}
root@9bc16fbd1363:/# 
root@9bc16fbd1363:/# exit
[16:20:00] user@machine:~/zmon/compose$

I think there are a couple of solutions here :

  1. Duplicate entities for docker compose and use service names instead of localhost (to preserve Vagrant stuff)
  2. Use host networking (might be insecure though)
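The first option amounts to rewriting the host part of entity URLs. A rough sketch, assuming entities carry plain URLs: `rewrite_host` and the mapping are hypothetical, while `controller` is the service name used in the curl session above. User info in the URL is not preserved by this sketch.

```python
from urllib.parse import urlsplit, urlunsplit


def rewrite_host(url, host_map):
    """Replace the hostname of `url` according to `host_map`, keeping the port."""
    parts = urlsplit(url)
    new_host = host_map.get(parts.hostname)
    if new_host is None:
        return url  # hostname not in the mapping, leave the URL untouched
    netloc = new_host if parts.port is None else "{}:{}".format(new_host, parts.port)
    return urlunsplit((parts.scheme, netloc, parts.path, parts.query, parts.fragment))


print(rewrite_host("https://localhost:8443/health", {"localhost": "controller"}))
```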

Where is your code?

Hi!

Don't get me wrong, zmon is really cool and I want to ask questions that I would have if I want to use this project.

Zmon is really nice, but how do I get all the repositories?
How do I install it?
Can you suggest server size or number of servers?
Where to put what component?
Why should I use it, if I can not understand how to install it?

Best, sandor

Issues with README

Hi guys, here are a couple of "README issues" I'd like to discuss.

  1. The "Manual Deployment" section does not actually describe the process of manual deployment. It provides a link to the component overview instead. It would be better to have a tutorial or instructions explaining how to install the components manually.
  2. Vagrant section is deprecated and should be removed (maybe?)
  3. License section should be updated, it still says "2013-2016", should be "2013-2017"

Docker compose says "registry.opensource.zalan.do/stups/redis:3.2.0-alpine not found"

I tried to install ZMON via Docker Compose and am facing the following issue:

docker-compose -f zmon-compose.yaml up --build
......
......
Pulling redis (registry.opensource.zalan.do/stups/redis:3.2.0-alpine)...
Trying to pull repository registry.opensource.zalan.do/stups/redis ... 
ERROR: manifest for registry.opensource.zalan.do/stups/redis:3.2.0-alpine not found

Implement infrastructure discovery agent core

Right now we have multiple infrastructure discovery agents used to sync different third-party system components to ZMON entities (e.g. AWS, Kubernetes, GCP ...). Each agent is standalone, however a lot of code is identical/repeated.

The idea is to implement a core agent which can dynamically load plugin(s) to discover different external system components, and the core syncs them with ZMON backend. That way we enable easier integration points with external systems.

Useful here as well: zalando-zmon/zmon-demo#10
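The core/plugin split described above could be sketched as follows. Everything here is hypothetical (class names, the `discover` interface, the entity fields) and plugins are passed in explicitly rather than loaded dynamically, to keep the example short.

```python
class DiscoveryPlugin:
    """Base class for discovery plugins (hypothetical interface)."""

    name = "base"

    def discover(self):
        raise NotImplementedError


class StaticPlugin(DiscoveryPlugin):
    """Toy plugin returning a fixed list, standing in for AWS, Kubernetes, etc."""

    name = "static"

    def __init__(self, entities):
        self._entities = entities

    def discover(self):
        return list(self._entities)


class AgentCore:
    """Shared logic: collect entities from plugins and diff against the backend."""

    def __init__(self, plugins):
        self.plugins = plugins

    def sync(self, existing):
        """Return (to_push, to_delete) given existing entities keyed by id."""
        discovered = {}
        for plugin in self.plugins:
            for entity in plugin.discover():
                discovered[entity["id"]] = dict(entity, discovered_by=plugin.name)
        to_push = [e for eid, e in discovered.items() if existing.get(eid) != e]
        to_delete = [eid for eid in existing if eid not in discovered]
        return to_push, to_delete


core = AgentCore([StaticPlugin([{"id": "a", "type": "host"}])])
to_push, to_delete = core.sync({"b": {"id": "b", "type": "host"}})
print(to_push, to_delete)
```

The diffing and syncing live in the core, so each plugin only has to answer the question "which entities exist right now?".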
