datadog / docker-dd-agent

Datadog Agent Dockerfile for Trusted Builds.

Home Page: https://registry.hub.docker.com/u/datadog/docker-dd-agent/

License: MIT License


docker-dd-agent's Introduction

Datadog Agent 5.x Dockerfile

This repository is meant to build the base image for a Datadog Agent 5.x container. You will have to use the resulting image to configure and run the Agent. If you are looking for a Datadog Agent 6.x Dockerfile, it is available in the datadog-agent repo.

Quick Start

The default image is ready-to-go. You just need to set your API_KEY in the environment.

docker run -d --name dd-agent \
  -v /var/run/docker.sock:/var/run/docker.sock:ro \
  -v /proc/:/host/proc/:ro \
  -v /sys/fs/cgroup/:/host/sys/fs/cgroup:ro \
  -e API_KEY={your_api_key_here} \
  -e SD_BACKEND=docker \
  -e NON_LOCAL_TRAFFIC=false \
  datadog/docker-dd-agent:latest

If you are running on Amazon Linux earlier than version 2, use the following instead:

docker run -d --name dd-agent \
  -v /var/run/docker.sock:/var/run/docker.sock:ro \
  -v /proc/:/host/proc/:ro \
  -v /cgroup/:/host/sys/fs/cgroup:ro \
  -e API_KEY={your_api_key_here} \
  -e SD_BACKEND=docker \
  -e NON_LOCAL_TRAFFIC=false \
  datadog/docker-dd-agent:latest

Configuration

Hostname

By default the agent container uses the Name field from the host's docker info output as its hostname. To change this behavior you can update the hostname field in /etc/dd-agent/datadog.conf. The easiest way to do this is with the DD_HOSTNAME environment variable (see below).
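For example, a minimal run command that overrides the hostname (the hostname value here is illustrative):

```
docker run -d --name dd-agent \
  -v /var/run/docker.sock:/var/run/docker.sock:ro \
  -v /proc/:/host/proc/:ro \
  -v /sys/fs/cgroup/:/host/sys/fs/cgroup:ro \
  -e API_KEY={your_api_key_here} \
  -e DD_HOSTNAME=my-docker-host-01 \
  datadog/docker-dd-agent:latest
```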

CGroups

For the Docker check to succeed, cgroup memory management must be enabled on the host, as explained in the Debian wiki. On Debian Jessie or later, for example, you will need to add cgroup_enable=memory swapaccount=1 to your boot options; otherwise the agent won't be able to recognize your system. See this thread for details.
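As a sketch, on a Debian or Ubuntu host using GRUB2 (an assumption; adjust for your bootloader), the boot options can be added like this:

```
# Append the cgroup options to the kernel command line (GRUB2 assumed)
sudo sed -i 's/^GRUB_CMDLINE_LINUX="\(.*\)"$/GRUB_CMDLINE_LINUX="\1 cgroup_enable=memory swapaccount=1"/' /etc/default/grub
sudo update-grub
sudo reboot
```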

Autodiscovery

The commands in the Quick Start section enable Autodiscovery in auto-conf mode, meaning the Agent will automatically run checks against any containers running images listed in the default check templates.

To learn more about Autodiscovery, read the Autodiscovery guide on the Datadog Docs site. To disable it, omit the SD_BACKEND environment variable when starting docker-dd-agent.

Environment variables

Some configuration parameters can be changed with environment variables (a combined example follows this list):

  • DD_HOSTNAME sets the hostname (it is written to datadog.conf)

  • TAGS sets host tags. Add -e TAGS=simple-tag-0,tag-key-1:tag-value-1 to use [simple-tag-0, tag-key-1:tag-value-1] as host tags.

  • EC2_TAGS sets EC2 host tags. Add -e EC2_TAGS=yes to use EC2 custom host tags. Requires an IAM role associated with the instance.

  • LOG_LEVEL sets logging verbosity (CRITICAL, ERROR, WARNING, INFO, DEBUG). Add -e LOG_LEVEL=DEBUG to turn on debug logging.

  • DD_LOGS_STDOUT: set to yes to send all logs to stdout and stderr so that Docker can process them.

  • PROXY_HOST, PROXY_PORT, PROXY_USER and PROXY_PASSWORD set the proxy configuration.

  • DD_URL sets the Datadog intake server to send Agent data to (used when an Agent acts as a proxy).

  • NON_LOCAL_TRAFFIC configures the non_local_traffic option in the agent, which enables or disables statsd reporting from any external IP. You may find this useful to report metrics from your other containers. See network configuration for more details. This option is set to true by default in the image, and the docker run command in the example above disables it; remove the -e NON_LOCAL_TRAFFIC=false part to re-enable it. WARNING: if you allow non-local traffic, make sure your agent container is not accessible from the Internet or other untrusted networks, as anyone could submit metrics to it.

  • SD_BACKEND, SD_CONFIG_BACKEND, SD_BACKEND_HOST, SD_BACKEND_PORT, SD_TEMPLATE_DIR, SD_CONSUL_TOKEN, SD_BACKEND_USER and SD_BACKEND_PASSWORD configure Autodiscovery (previously known as Service Discovery):

    • SD_BACKEND: set to docker (the only supported backend) to enable Autodiscovery.
    • SD_CONFIG_BACKEND: set to etcd, consul, or zk to use one of these key-value stores as a template source.
    • SD_BACKEND_HOST and SD_BACKEND_PORT: configure the connection to the key-value template source.
    • SD_TEMPLATE_DIR: when using SD_CONFIG_BACKEND, sets the path where the check configuration templates are located in the key-value store (default: datadog/check_configs).
    • SD_CONSUL_TOKEN: when using Consul as a template source and the Consul cluster requires authentication, set a token so the Datadog Agent can connect.
    • SD_BACKEND_USER and SD_BACKEND_PASSWORD: when using etcd as a template source and it requires authentication, set a user and password so the Datadog Agent can connect.
  • DD_APM_ENABLED runs the trace-agent along with the infrastructure agent, allowing the container to accept traces on 8126/tcp (this option is NOT available on Alpine images)

  • DD_PROCESS_AGENT_ENABLED runs the process-agent along with the infrastructure agent, feeding data to the Live Process View and Live Containers View (this option is NOT available on Alpine images)

  • DD_COLLECT_LABELS_AS_TAGS enables collection of the listed labels as tags. Comma-separated string, without spaces unless quoted. Example: -e DD_COLLECT_LABELS_AS_TAGS='com.docker.label.foo, com.docker.label.bar' or -e DD_COLLECT_LABELS_AS_TAGS=com.docker.label.foo,com.docker.label.bar.

  • MAX_TRACES_PER_SECOND: Specifies the maximum number of traces per second to sample for APM. Set to 0 to disable this limit.

  • DD_HISTOGRAM_PERCENTILES: histogram percentiles to compute, separated by commas. The default is "0.95".

  • DD_HISTOGRAM_AGGREGATES: histogram aggregates to compute, separated by commas. The default is "max, median, avg, count".

Note: some of these variables have alternative names with the same effect: DD_TAGS can be used instead of TAGS, DD_LOG_LEVEL instead of LOG_LEVEL, and DD_API_KEY instead of API_KEY.
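As an illustration, here is a run command combining several of these variables; the values are placeholders, not recommendations:

```
docker run -d --name dd-agent \
  -v /var/run/docker.sock:/var/run/docker.sock:ro \
  -v /proc/:/host/proc/:ro \
  -v /sys/fs/cgroup/:/host/sys/fs/cgroup:ro \
  -e API_KEY={your_api_key_here} \
  -e DD_HOSTNAME=my-docker-host-01 \
  -e TAGS=env:staging,team:backend \
  -e LOG_LEVEL=INFO \
  -e DD_APM_ENABLED=true \
  -e SD_BACKEND=docker \
  datadog/docker-dd-agent:latest
```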

Enabling integrations

Environment variables

It is possible to enable some checks through the environment:

  • KUBERNETES enables the Kubernetes check if set (KUBERNETES=yes works); see the example after this list.
  • To collect Kubernetes events, set KUBERNETES_COLLECT_EVENTS to true on one agent per cluster. Alternatively, you can enable the leader election mechanism by setting KUBERNETES_LEADER_CANDIDATE to true on candidate agents, and adjust the lease time (in seconds) with the KUBERNETES_LEADER_LEASE_DURATION variable.
  • By default, only events from the default namespace are collected. To change which namespaces are used, set KUBERNETES_NAMESPACE_NAME_REGEX to a valid regular expression matching your relevant namespaces.
  • To collect kube_service tags, the agent needs to query the apiserver's events and services endpoints. If you need to disable that, pass KUBERNETES_COLLECT_SERVICE_TAGS=false.
  • The kubelet API endpoint is assumed to be the container's default route; you can override it by specifying KUBERNETES_KUBELET_HOST (e.g. when using CNI networking, the kubelet API may not listen on the default route address).
  • MESOS_MASTER and MESOS_SLAVE respectively enable the Mesos master and Mesos slave checks if set (MESOS_MASTER=yes works).
  • MARATHON_URL, if set, enables the Marathon check, which queries the given URL for metrics. It can usually be set to http://leader.mesos:8080.
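For instance, a sketch of enabling the Kubernetes check and event collection through a plain docker run (in practice the Kubernetes Agent is usually deployed as a DaemonSet, but the same variables apply; the regex value is illustrative):

```
docker run -d --name dd-agent \
  -v /var/run/docker.sock:/var/run/docker.sock:ro \
  -v /proc/:/host/proc/:ro \
  -v /sys/fs/cgroup/:/host/sys/fs/cgroup:ro \
  -e API_KEY={your_api_key_here} \
  -e KUBERNETES=yes \
  -e KUBERNETES_COLLECT_EVENTS=true \
  -e KUBERNETES_NAMESPACE_NAME_REGEX='default|production' \
  datadog/docker-dd-agent:latest
```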

Autodiscovery

Another way to enable checks is through Autodiscovery. This is particularly useful in dynamic environments like Kubernetes, Amazon ECS, or Docker Swarm. Read more about Autodiscovery on the Datadog Docs site.
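As a rough sketch of the key-value store flavor of Autodiscovery (the key layout under datadog/check_configs and the nginx values are assumptions based on the Autodiscovery guide, not a verified configuration):

```
# Store a check template for containers running an "nginx" image in etcd,
# under the default template path (datadog/check_configs)
etcdctl set /datadog/check_configs/nginx/check_names '["nginx"]'
etcdctl set /datadog/check_configs/nginx/init_configs '[{}]'
etcdctl set /datadog/check_configs/nginx/instances '[{"nginx_status_url": "http://%%host%%/nginx_status/"}]'

# Then point the Agent at the etcd cluster when starting it, for example with:
#   -e SD_BACKEND=docker -e SD_CONFIG_BACKEND=etcd \
#   -e SD_BACKEND_HOST=<etcd_host> -e SD_BACKEND_PORT=2379
```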

Configuration files

You can also mount YAML configuration files in the /conf.d folder; they will automatically be copied to /etc/dd-agent/conf.d/ when the container starts. The same can be done for the /checks.d folder: any Python files in /checks.d will automatically be copied to /etc/dd-agent/checks.d/ when the container starts.

  1. Create a configuration folder on the host and write your YAML files in it. The examples below can be used for the /checks.d folder as well.

    mkdir /opt/dd-agent-conf.d
    touch /opt/dd-agent-conf.d/nginx.yaml
    
  2. When creating the container, mount this new folder to /conf.d.

    docker run -d --name dd-agent \
      -v /var/run/docker.sock:/var/run/docker.sock:ro \
      -v /proc/:/host/proc/:ro \
      -v /sys/fs/cgroup/:/host/sys/fs/cgroup:ro \
      -v /opt/dd-agent-conf.d:/conf.d:ro \
      -e API_KEY={your_api_key_here} \
      datadog/docker-dd-agent
    

    The important part here is -v /opt/dd-agent-conf.d:/conf.d:ro

Now when the container starts, all files in /opt/dd-agent-conf.d with a .yaml extension will be copied to /etc/dd-agent/conf.d/. Please note that to add new files you will need to restart the container.
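For reference, a minimal nginx.yaml for the Agent's nginx check might look like this (the status URL is an assumption about how your nginx is set up):

```
cat > /opt/dd-agent-conf.d/nginx.yaml <<'EOF'
init_config:

instances:
  - nginx_status_url: http://localhost/nginx_status/
EOF
```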

JMX Images

If you need to run any JMX-based Agent checks, run a JMX image, e.g. datadog/docker-dd-agent:latest-jmx, datadog/docker-dd-agent:11.0.5150-jmx, etc. These images are based on the default images but add a JVM, which is needed for the Agent to run jmxfetch.

DogStatsD

Standalone DogStatsD

The default images (e.g. latest) run a DogStatsD server as well as the main Agent (i.e. the collector). If you want to run DogStatsD only, run a DogStatsD-only image, e.g. datadog/docker-dd-agent:latest-dogstatsd, datadog/docker-dd-agent:11.0.5141-dogstatsd-alpine, etc. These images don't run the collector process.

These images run the DogStatsD server as a non-root user, which is useful for platforms like OpenShift, and they don't need the shared host volumes (/proc, /sys/fs and the Docker socket) that the default Agent image requires.

Note: Metrics submitted by this container will NOT get tagged with any global tags specified in datadog.conf. These tags are only read by the Agent's collector process, which these DogStatsD-only images do not run.

Note: Optionally, these images can run the trace-agent process. Pass -e DD_APM_ENABLED=true to your docker run command to activate the trace-agent and allow your container to receive traces from Datadog's APM client libraries.

DogStatsD from the host

To make DogStatsD available on port 8125 from anywhere, add the option -p 8125:8125/udp to the docker run command.

To make it available from your host only, use -p 127.0.0.1:8125:8125/udp instead.
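For example, a run command that exposes DogStatsD to the host only, followed by a quick test metric (the metric name is illustrative, and nc flags vary between netcat flavors):

```
docker run -d --name dd-agent \
  -v /var/run/docker.sock:/var/run/docker.sock:ro \
  -v /proc/:/host/proc/:ro \
  -v /sys/fs/cgroup/:/host/sys/fs/cgroup:ro \
  -e API_KEY={your_api_key_here} \
  -p 127.0.0.1:8125:8125/udp \
  datadog/docker-dd-agent:latest

# Send a test metric from the host
echo -n "custom.test_metric:1|c" | nc -u -w1 127.0.0.1 8125
```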

Disable dogstatsd

DogStatsD can be disabled by setting USE_DOGSTATSD to no.

DogStatsD from other containers

Using Docker host IP

Since the Agent container's port 8125 should be bound to the host directly, you can reach DogStatsD through the host. From inside a container, the host's IP address can usually be determined by looking at the container's default route (with ip route, for example). You can then configure your DogStatsD client to connect to, for example, 172.17.42.1:8125.
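A sketch of this from inside another container (the metric name is illustrative; this assumes port 8125 is published on the host as shown above):

```
# Determine the Docker host's address: it is the container's default gateway
DD_HOST=$(ip route | awk '/default/ { print $3 }')

# Send a test metric to DogStatsD through the host
echo -n "my_app.requests:1|c" | nc -u -w1 "$DD_HOST" 8125
```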

Using Docker links (Legacy)

To send data to DogStatsD from other containers, add a --link dogstatsd:dogstatsd option to your run command.

For example, run a container my_container with the image my_image.

docker run  --name my_container           \
            --all_your_flags              \
            --link dogstatsd:dogstatsd    \
            my_image

The DogStatsD address and port will then be available in my_container through the environment variables DOGSTATSD_PORT_8125_UDP_ADDR and DOGSTATSD_PORT_8125_UDP_PORT.
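A hypothetical use of those variables from inside my_container (the metric name is illustrative):

```
# Inside my_container: the link injects the DogStatsD address and port
echo -n "my_app.requests:1|c" | \
  nc -u -w1 "$DOGSTATSD_PORT_8125_UDP_ADDR" "$DOGSTATSD_PORT_8125_UDP_PORT"
```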

Tracing + APM

Enable the datadog-trace-agent in the docker-dd-agent container by passing DD_APM_ENABLED=true as an environment variable.

Note: APM is NOT available on Alpine images.

Tracing from the host

To make tracing available on port 8126/tcp from anywhere, add the option -p 8126:8126/tcp to the docker run command.

To make it available from your host only, use -p 127.0.0.1:8126:8126/tcp instead.

For example, the following command allows the agent to receive traces from anywhere:

docker run -d --name dd-agent \
  -v /var/run/docker.sock:/var/run/docker.sock:ro \
  -v /proc/:/host/proc/:ro \
  -v /sys/fs/cgroup/:/host/sys/fs/cgroup:ro \
  -e API_KEY={your_api_key_here} \
  -e DD_APM_ENABLED=true \
  -p 8126:8126/tcp \
  datadog/docker-dd-agent

Previous instructions required binding to port 7777. This is a legacy port used by former client libraries and has been replaced by 8126.

Tracing from other containers

As with DogStatsD, traces can be submitted to the agent from other containers either using the Docker host IP or with Docker links.

Using Docker links

docker run  --name my_container           \
            --all_your_flags              \
            --link dd-agent:dd-agent    \
            my_image

This will expose DD_AGENT_PORT_8126_TCP_ADDR and DD_AGENT_PORT_8126_TCP_PORT as environment variables. Your application tracer can be configured to submit traces to this address.

An example in Python:

import os
from ddtrace import tracer
tracer.configure(
    hostname=os.environ["DD_AGENT_PORT_8126_TCP_ADDR"],
    port=int(os.environ["DD_AGENT_PORT_8126_TCP_PORT"]),
)

Using Docker host IP

The Agent container's port 8126 should be bound to the host directly. Having determined the container's default route address (with ip route, for example), you can configure your application tracer to report to it.

An example in Python, assuming 172.17.0.1 is the default route:

from ddtrace import tracer; tracer.configure(hostname="172.17.0.1", port=8126)

Build an image

To configure specific settings of the agent directly in the image, you may need to build a Docker image on top of ours.

  1. Create a Dockerfile to set your specific configuration or to install dependencies.

    FROM datadog/docker-dd-agent
    # Example: MySQL
    ADD conf.d/mysql.yaml /etc/dd-agent/conf.d/mysql.yaml
    
  2. Build it.

    docker build -t dd-agent-image .

  3. Then run it like the datadog/docker-dd-agent image.

    docker run -d --name dd-agent \
      -v /var/run/docker.sock:/var/run/docker.sock:ro \
      -v /proc/:/host/proc/:ro \
      -v /sys/fs/cgroup/:/host/sys/fs/cgroup:ro \
      -e API_KEY={your_api_key_here} \
      dd-agent-image
    
  4. It's done!

You can find some examples in our GitHub repository.

Alpine-based image

Starting with Agent 5.7, we also provide an image based on Alpine Linux. This image is smaller (about 60% of the size of the Debian-based one) and benefits from Alpine's security-oriented design. It is compatible with all options described in this file (Autodiscovery, enabling specific integrations, etc.) with the exception of JMX and tracing (the trace-agent does not ship with the Alpine images).

This image is available under tags following the naming convention usual_tag_name-alpine. For example, to use the latest tag, pull datadog/docker-dd-agent:latest-alpine; to use a specific version, specify a tag such as 11.2.583-alpine.

The Alpine version can be used this way:

```
docker run -d --name dd-agent \
  -v /var/run/docker.sock:/var/run/docker.sock:ro \
  -v /proc/:/host/proc/:ro \
  -v /sys/fs/cgroup/:/host/sys/fs/cgroup:ro \
  -e API_KEY={your_api_key_here} \
  datadog/docker-dd-agent:latest-alpine
```

Note: In this version, check configuration files must be stored in /opt/datadog-agent/agent/conf.d/ instead of /etc/dd-agent/conf.d/.
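For example, a sketch of providing check configurations with the Alpine image, assuming the /conf.d copy mechanism described earlier behaves the same way here (the host folder name is illustrative):

```
docker run -d --name dd-agent \
  -v /var/run/docker.sock:/var/run/docker.sock:ro \
  -v /proc/:/host/proc/:ro \
  -v /sys/fs/cgroup/:/host/sys/fs/cgroup:ro \
  -v /opt/dd-agent-conf.d:/conf.d:ro \
  -e API_KEY={your_api_key_here} \
  datadog/docker-dd-agent:latest-alpine
```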

Warning: This version is recent, and its behaviour may differ a little (namely, it is running a source-installed agent so commands need to be adapted). If you find a bug, don't hesitate to file an issue, feedback around it is appreciated.

Versioning pattern

The Docker image follows a versioning pattern that allows us to release changes to the Docker image of the Datadog Agent while shipping the same version of the Agent.

The Docker image version follows this pattern:

X.Y.Z, where X is the major version of the Docker image, Y is the minor version, and Z represents the Agent version.

For example, the first version of the Docker image that bundled Datadog Agent 5.5.0 was:

10.0.550

Information

To display information about the Agent's state, use the following command.

Debian:

docker exec dd-agent service datadog-agent info

Alpine:

docker exec dd-agent /opt/datadog-agent/bin/agent info

Warning: the docker exec command is available only with Docker 1.3 and above.

Logs

Copy logs from the container to the host

This is the simplest solution: it copies the container's logs to a directory on the host.

docker cp dd-agent:/var/log/datadog /tmp/log-datadog-agent

Supervisor logs

Basic information about the Agent's execution is available through the docker logs command.

docker logs dd-agent

You can also exec a shell in the container and tail the logs (collector.log, forwarder.log and jmxfetch.log) for debugging. supervisor.log is available there as well, but you can also get that output with docker logs dd-agent from the host.

Alpine:

$ docker exec -it dd-agent ash
/opt/datadog-agent # tail -f /opt/datadog-agent/logs/dogstatsd.log
2016-07-22 23:09:09 | INFO | dd.dogstatsd | dogstatsd(dogstatsd.py:210) | Flush #8: flushed 1 metric, 0 events, and 0 service check runs

Debian:

$ docker exec -it dd-agent bash
# tail -f /var/log/datadog/dogstatsd.log
2016-07-22 23:09:09 | INFO | dd.dogstatsd | dogstatsd(dogstatsd.py:210) | Flush #8: flushed 1 metric, 0 events, and 0 service check runs

Limitations

The Agent won't be able to collect disk metrics from volumes that are not mounted into the Agent container. If you want to monitor additional partitions, make sure to share them with the container in your docker run command (e.g. -v /data:/data:ro).
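For instance, to let the Agent see a separate /data partition (the path is illustrative):

```
docker run -d --name dd-agent \
  -v /var/run/docker.sock:/var/run/docker.sock:ro \
  -v /proc/:/host/proc/:ro \
  -v /sys/fs/cgroup/:/host/sys/fs/cgroup:ro \
  -v /data:/data:ro \
  -e API_KEY={your_api_key_here} \
  datadog/docker-dd-agent:latest
```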

Docker isolates containers from the host. As a result, the Agent won't have access to all host metrics.

Known missing/incorrect metrics:

  • Network
  • Process list

Also, several integrations might be incomplete. See the "Contribute" section.

Contribute

If you notice a limitation or a bug with this container, feel free to open a GitHub issue. If it concerns the Agent itself, please refer to its documentation or its wiki.


docker-dd-agent's Issues

Provide an image that doesn't require that it be run as root.

Your current image requires that it run as root. Because of the potential security issues, it isn't really best practice if there is no need for it to actually run as root.

That your image must run as root means that it is not usable on Docker-based cloud infrastructure which restricts users to only running images as a non-root user. This is going to make it harder for people to use your product.

That said, if a platform doesn't allow you to run as root, it isn't usually going to allow you to mount the Docker socket either.

The problem at this point is that you are bundling dogstatsd together with the agent for Docker monitoring (having deprecated the old image). The dogstatsd process should not need to access the Docker socket, so it could be used by itself and quite happily run as a non-root user. This means it is only of use for collecting application metrics, but when deploying to cloud infrastructure where you don't manage the underlying Docker, this may be all that you want anyway.

So what really should be done is the following:

  1. Provide an image where both dogstatsd and the agent can be run as a non-root user.
  2. If the agent cannot be run as a non-root user, then resume providing a separate, up-to-date image for dogstatsd, but ensure that that separate image can be run as a non-root user.

As to making dogstatsd runnable as a non-root user, it is a simple matter of making certain directories and files group-writable. This is sufficient because although platforms will override what user an image runs as, they usually leave the group alone, meaning they run with a gid of 0. As the files and directories Docker creates when setting up an image are in the 0 group (root), group write access is adequate.

The files/directories that need group write access are:

RUN chmod g+w /etc/dd-agent/datadog.conf
RUN chmod g+w /var/log/datadog
RUN chmod g+w /etc/dd-agent

The config file being writable allows /entrypoint.sh to edit it when run as non-root.

The other directories are for logging and socket/lock (???) files.

In addition to the above, there is also one problem deriving from the dogstatsd code that would need to be addressed.

When running just dogstatsd, you do not need the Docker socket for data collection, so it does not need to be mounted. The problem is that the dogstatsd code, when it thinks it is running inside Docker, will try to use the Docker socket to work out what the hostname should be rather than using the internal hostname.

The dogstatsd code should fall back gracefully to using the standard UNIX hostname in the container if the Docker socket is not present. Alternatively, this image should accept an environment variable to force dogstatsd to use the system hostname; for example, an environment variable called DOGSTATSD_USE_SYS_HOSTNAME. If this is set, the /entrypoint.sh script could run:

sed -i -e "s/^#hostname:.*$/hostname: `hostname`/" /etc/dd-agent/datadog.conf

That is, set the hostname in the configuration file so dogstatsd uses that. That way you don't have to modify the dogstatsd code.

As to running the agent as a non-root user, I am not sure what else is required for that. Even with the changes above, trying to run supervisord as a non-root user just sees it exit straight away, with nothing logged anywhere indicating why it failed to start up.

FWIW, you can find my own Docker image, based on this one, which makes the changes to at least allow dogstatsd to run as a non-root user at:

I see this as an interim measure and don't want to have to use it in the future. I would prefer to see Datadog provide an image that can run as a non-root user.

Also be aware that I intend to write a blog post about using dogstatsd as a sidecar container to an application under OpenShift 3. I will be explaining the current problems, as outlined above, in that post. I hope that I will be able to say in that post that Datadog is working on an updated image that can run as a non-root user. If you can, it will mean that the Datadog product will be immediately usable to OpenShift 3 users for application monitoring.

Add network stats

Docker would be a great solution for me if all of the metrics worked correctly. It was mentioned in the docs that network, process list, and cpu metrics might be incorrect. Can these limitations be fixed?

exclude a container check

I have a container controlled by a timer unit. It runs every 30 seconds and I want to exclude it from the datadog agent check. The image is tagged as 'example.com/itlab/s3sync:latest'. I tried the following config:

conf.d/docker.yaml:

....
  # include all, except s3sync
  include: []
  exclude:
     - "image:example.com/itlab/s3sync:latest"
     - "example.com/itlab/s3sync:llatest"
     - ".*s3sync*"
...

The exclude does not seem to work; I still see the events for this container as it comes and goes.
What's the correct format to exclude a container check?

Command not found: dogstatsd

I tried to launch just dogstatsd, but the command isn't in the path. I inspected the container using bash, and I still couldn't find the command anywhere.

Intermittent failure with "'NoneType' object has no attribute '__getitem__'"

Hey!

The agent seemed fine for a while, but I just noticed some weird patterns in our metrics, checked out the logs and it seems like the collector is failing with the errors below. Any suggestions? Seems like something upset it all of a sudden.

Client version: 1.6.0
Client API version: 1.18
Go version (client): go1.3.3
Git commit (client): 4749651/1.6.0
OS/Arch (client): linux/amd64
Server version: 1.6.0
Server API version: 1.18
Go version (server): go1.3.3
Git commit (server): 4749651/1.6.0
OS/Arch (server): linux/amd64
2015-05-29 14:08:12 UTC | INFO | dd.collector | checks.collector(collector.py:383) | Finished run #35090. Collection time: 4.03s. Emit time: 0.01s
2015-05-29 14:11:23 UTC | INFO | dd.collector | checks.collector(collector.py:383) | Finished run #35100. Collection time: 4.03s. Emit time: 0.01s
2015-05-29 14:14:33 UTC | INFO | dd.collector | checks.collector(collector.py:383) | Finished run #35110. Collection time: 4.03s. Emit time: 0.01s
2015-05-29 14:17:44 UTC | INFO | dd.collector | checks.collector(collector.py:383) | Finished run #35120. Collection time: 4.03s. Emit time: 0.01s
2015-05-29 14:20:54 UTC | INFO | dd.collector | checks.collector(collector.py:383) | Finished run #35130. Collection time: 4.03s. Emit time: 0.01s
2015-05-29 14:24:06 UTC | INFO | dd.collector | checks.collector(collector.py:383) | Finished run #35140. Collection time: 4.03s. Emit time: 0.01s
2015-05-29 14:27:16 UTC | INFO | dd.collector | checks.collector(collector.py:383) | Finished run #35150. Collection time: 4.03s. Emit time: 0.01s
2015-05-29 14:30:27 UTC | INFO | dd.collector | checks.collector(collector.py:383) | Finished run #35160. Collection time: 4.03s. Emit time: 0.01s
2015-05-29 14:33:37 UTC | INFO | dd.collector | checks.collector(collector.py:383) | Finished run #35170. Collection time: 4.03s. Emit time: 0.01s
2015-05-29 14:36:48 UTC | INFO | dd.collector | checks.collector(collector.py:383) | Finished run #35180. Collection time: 4.03s. Emit time: 0.01s
2015-05-29 14:39:59 UTC | INFO | dd.collector | checks.collector(collector.py:383) | Finished run #35190. Collection time: 4.03s. Emit time: 0.01s
2015-05-29 14:43:10 UTC | INFO | dd.collector | checks.collector(collector.py:383) | Finished run #35200. Collection time: 4.03s. Emit time: 0.01s
2015-05-29 14:46:20 UTC | INFO | dd.collector | checks.collector(collector.py:383) | Finished run #35210. Collection time: 4.03s. Emit time: 0.01s
2015-05-29 14:49:31 UTC | INFO | dd.collector | checks.collector(collector.py:383) | Finished run #35220. Collection time: 4.03s. Emit time: 0.01s
2015-05-29 14:52:41 UTC | INFO | dd.collector | checks.collector(collector.py:383) | Finished run #35230. Collection time: 4.03s. Emit time: 0.01s
2015-05-29 14:55:53 UTC | INFO | dd.collector | checks.collector(collector.py:383) | Finished run #35240. Collection time: 4.03s. Emit time: 0.01s
2015-05-29 14:59:03 UTC | INFO | dd.collector | checks.collector(collector.py:383) | Finished run #35250. Collection time: 4.03s. Emit time: 0.01s
2015-05-29 15:02:14 UTC | INFO | dd.collector | checks.collector(collector.py:383) | Finished run #35260. Collection time: 4.03s. Emit time: 0.01s
2015-05-29 15:05:24 UTC | INFO | dd.collector | checks.collector(collector.py:383) | Finished run #35270. Collection time: 4.03s. Emit time: 0.01s
2015-05-29 15:08:35 UTC | INFO | dd.collector | checks.collector(collector.py:383) | Finished run #35280. Collection time: 4.03s. Emit time: 0.01s
2015-05-29 15:11:46 UTC | INFO | dd.collector | checks.collector(collector.py:383) | Finished run #35290. Collection time: 4.03s. Emit time: 0.01s
2015-05-29 15:14:56 UTC | INFO | dd.collector | checks.collector(collector.py:383) | Finished run #35300. Collection time: 4.03s. Emit time: 0.01s
2015-05-29 15:18:07 UTC | INFO | dd.collector | checks.collector(collector.py:383) | Finished run #35310. Collection time: 4.03s. Emit time: 0.01s
2015-05-29 15:21:17 UTC | INFO | dd.collector | checks.collector(collector.py:383) | Finished run #35320. Collection time: 4.03s. Emit time: 0.01s
2015-05-29 15:24:29 UTC | INFO | dd.collector | checks.collector(collector.py:383) | Finished run #35330. Collection time: 4.03s. Emit time: 0.01s
2015-05-29 15:27:39 UTC | INFO | dd.collector | checks.collector(collector.py:383) | Finished run #35340. Collection time: 4.03s. Emit time: 0.01s
2015-05-29 15:30:50 UTC | INFO | dd.collector | checks.collector(collector.py:383) | Finished run #35350. Collection time: 4.03s. Emit time: 0.01s
2015-05-29 15:34:00 UTC | INFO | dd.collector | checks.collector(collector.py:383) | Finished run #35360. Collection time: 4.03s. Emit time: 0.01s
2015-05-29 15:37:11 UTC | INFO | dd.collector | checks.collector(collector.py:383) | Finished run #35370. Collection time: 4.03s. Emit time: 0.01s
2015-05-29 15:40:22 UTC | INFO | dd.collector | checks.collector(collector.py:383) | Finished run #35380. Collection time: 4.03s. Emit time: 0.01s
2015-05-29 15:43:33 UTC | INFO | dd.collector | checks.collector(collector.py:383) | Finished run #35390. Collection time: 4.03s. Emit time: 0.01s
2015-05-29 15:46:43 UTC | INFO | dd.collector | checks.collector(collector.py:383) | Finished run #35400. Collection time: 4.03s. Emit time: 0.0s
2015-05-29 15:49:54 UTC | INFO | dd.collector | checks.collector(collector.py:383) | Finished run #35410. Collection time: 4.03s. Emit time: 0.01s
2015-05-29 15:53:04 UTC | INFO | dd.collector | checks.collector(collector.py:383) | Finished run #35420. Collection time: 4.03s. Emit time: 0.01s
2015-05-29 15:56:16 UTC | INFO | dd.collector | checks.collector(collector.py:383) | Finished run #35430. Collection time: 4.03s. Emit time: 0.01s
2015-05-29 15:59:26 UTC | INFO | dd.collector | checks.collector(collector.py:383) | Finished run #35440. Collection time: 4.03s. Emit time: 0.01s
2015-05-29 16:02:37 UTC | INFO | dd.collector | checks.collector(collector.py:383) | Finished run #35450. Collection time: 4.03s. Emit time: 0.01s
2015-05-29 16:05:47 UTC | INFO | dd.collector | checks.collector(collector.py:383) | Finished run #35460. Collection time: 4.03s. Emit time: 0.01s
2015-05-29 16:08:58 UTC | INFO | dd.collector | checks.collector(collector.py:383) | Finished run #35470. Collection time: 4.03s. Emit time: 0.01s
2015-05-29 16:12:09 UTC | INFO | dd.collector | checks.collector(collector.py:383) | Finished run #35480. Collection time: 4.03s. Emit time: 0.01s
2015-05-29 16:15:20 UTC | INFO | dd.collector | checks.collector(collector.py:383) | Finished run #35490. Collection time: 4.03s. Emit time: 0.01s
2015-05-29 16:18:30 UTC | INFO | dd.collector | checks.collector(collector.py:383) | Finished run #35500. Collection time: 4.03s. Emit time: 0.01s
2015-05-29 16:21:41 UTC | INFO | dd.collector | checks.collector(collector.py:383) | Finished run #35510. Collection time: 4.03s. Emit time: 0.01s
2015-05-29 16:24:56 UTC | ERROR | dd.collector | checks.docker(__init__.py:556) | Check 'docker' instance #0 failed
Traceback (most recent call last):
  File "/opt/datadog-agent/agent/checks/__init__.py", line 547, in run
    self.check(copy.deepcopy(instance))
  File "/opt/datadog-agent/agent/checks.d/docker.py", line 135, in check
    containers, ids_to_names = self._get_and_count_containers(instance)
  File "/opt/datadog-agent/agent/checks.d/docker.py", line 170, in _get_and_count_containers
    raise Exception("Failed to collect the list of containers. Exception: {0}".format(e))
Exception: Failed to collect the list of containers. Exception: timed out
2015-05-29 16:24:56 UTC | INFO | dd.collector | checks.collector(collector.py:383) | Finished run #35520. Collection time: 9.03s. Emit time: 0.01s
2015-05-29 16:25:21 UTC | ERROR | dd.collector | checks.docker(__init__.py:556) | Check 'docker' instance #0 failed
Traceback (most recent call last):
  File "/opt/datadog-agent/agent/checks/__init__.py", line 547, in run
    self.check(copy.deepcopy(instance))
  File "/opt/datadog-agent/agent/checks.d/docker.py", line 135, in check
    containers, ids_to_names = self._get_and_count_containers(instance)
  File "/opt/datadog-agent/agent/checks.d/docker.py", line 170, in _get_and_count_containers
    raise Exception("Failed to collect the list of containers. Exception: {0}".format(e))
Exception: Failed to collect the list of containers. Exception: timed out
2015-05-29 16:28:13 UTC | INFO | dd.collector | checks.collector(collector.py:383) | Finished run #35530. Collection time: 4.03s. Emit time: 0.01s
2015-05-29 16:31:23 UTC | INFO | dd.collector | checks.collector(collector.py:383) | Finished run #35540. Collection time: 4.03s. Emit time: 0.01s
2015-05-29 16:34:34 UTC | INFO | dd.collector | checks.collector(collector.py:383) | Finished run #35550. Collection time: 4.03s. Emit time: 0.01s
2015-05-29 16:37:44 UTC | INFO | dd.collector | checks.collector(collector.py:383) | Finished run #35560. Collection time: 4.03s. Emit time: 0.01s
2015-05-29 16:40:56 UTC | INFO | dd.collector | checks.collector(collector.py:383) | Finished run #35570. Collection time: 4.03s. Emit time: 0.01s
2015-05-29 16:44:06 UTC | INFO | dd.collector | checks.collector(collector.py:383) | Finished run #35580. Collection time: 4.03s. Emit time: 0.01s
2015-05-29 16:47:17 UTC | INFO | dd.collector | checks.collector(collector.py:383) | Finished run #35590. Collection time: 4.03s. Emit time: 0.01s
2015-05-29 16:50:27 UTC | INFO | dd.collector | checks.collector(collector.py:383) | Finished run #35600. Collection time: 4.03s. Emit time: 0.01s
2015-05-29 16:53:38 UTC | INFO | dd.collector | checks.collector(collector.py:383) | Finished run #35610. Collection time: 4.03s. Emit time: 0.01s
2015-05-29 16:56:49 UTC | INFO | dd.collector | checks.collector(collector.py:383) | Finished run #35620. Collection time: 4.03s. Emit time: 0.01s
2015-05-29 16:59:27 UTC | ERROR | dd.collector | checks.docker(__init__.py:556) | Check 'docker' instance #0 failed
Traceback (most recent call last):
  File "/opt/datadog-agent/agent/checks/__init__.py", line 547, in run
    self.check(copy.deepcopy(instance))
  File "/opt/datadog-agent/agent/checks.d/docker.py", line 135, in check
    containers, ids_to_names = self._get_and_count_containers(instance)
  File "/opt/datadog-agent/agent/checks.d/docker.py", line 170, in _get_and_count_containers
    raise Exception("Failed to collect the list of containers. Exception: {0}".format(e))
Exception: Failed to collect the list of containers. Exception: timed out
2015-05-29 17:00:05 UTC | INFO | dd.collector | checks.collector(collector.py:383) | Finished run #35630. Collection time: 4.03s. Emit time: 0.01s
2015-05-29 17:01:45 UTC | ERROR | dd.collector | checks.docker(__init__.py:556) | Check 'docker' instance #0 failed
Traceback (most recent call last):
  File "/opt/datadog-agent/agent/checks/__init__.py", line 547, in run
    self.check(copy.deepcopy(instance))
  File "/opt/datadog-agent/agent/checks.d/docker.py", line 135, in check
    containers, ids_to_names = self._get_and_count_containers(instance)
  File "/opt/datadog-agent/agent/checks.d/docker.py", line 170, in _get_and_count_containers
    raise Exception("Failed to collect the list of containers. Exception: {0}".format(e))
Exception: Failed to collect the list of containers. Exception: timed out
2015-05-29 17:02:04 UTC | INFO | dd.collector | aggregator(aggregator.py:358) | Metric docker.cpu.user has a rate < 0. Counter may have been Reset.
2015-05-29 17:03:20 UTC | INFO | dd.collector | checks.collector(collector.py:383) | Finished run #35640. Collection time: 4.03s. Emit time: 0.01s
2015-05-29 17:06:31 UTC | INFO | dd.collector | checks.collector(collector.py:383) | Finished run #35650. Collection time: 4.03s. Emit time: 0.01s
2015-05-29 17:09:41 UTC | INFO | dd.collector | checks.collector(collector.py:383) | Finished run #35660. Collection time: 4.03s. Emit time: 0.01s
2015-05-29 17:12:53 UTC | INFO | dd.collector | checks.collector(collector.py:383) | Finished run #35670. Collection time: 4.03s. Emit time: 0.01s
2015-05-29 17:16:03 UTC | INFO | dd.collector | checks.collector(collector.py:383) | Finished run #35680. Collection time: 4.03s. Emit time: 0.01s
2015-05-29 17:18:36 UTC | ERROR | dd.collector | checks.docker(__init__.py:556) | Check 'docker' instance #0 failed
Traceback (most recent call last):
  File "/opt/datadog-agent/agent/checks/__init__.py", line 547, in run
    self.check(copy.deepcopy(instance))
  File "/opt/datadog-agent/agent/checks.d/docker.py", line 135, in check
    containers, ids_to_names = self._get_and_count_containers(instance)
  File "/opt/datadog-agent/agent/checks.d/docker.py", line 189, in _get_and_count_containers
    ids_to_names[container['Id']] = container['Names'][0].lstrip("/")
TypeError: 'NoneType' object has no attribute '__getitem__'
2015-05-29 17:18:55 UTC | ERROR | dd.collector | checks.docker(__init__.py:556) | Check 'docker' instance #0 failed
Traceback (most recent call last):
  File "/opt/datadog-agent/agent/checks/__init__.py", line 547, in run
    self.check(copy.deepcopy(instance))
  File "/opt/datadog-agent/agent/checks.d/docker.py", line 135, in check
    containers, ids_to_names = self._get_and_count_containers(instance)
  File "/opt/datadog-agent/agent/checks.d/docker.py", line 189, in _get_and_count_containers
    ids_to_names[container['Id']] = container['Names'][0].lstrip("/")
TypeError: 'NoneType' object has no attribute '__getitem__'
2015-05-29 17:19:14 UTC | ERROR | dd.collector | checks.docker(__init__.py:556) | Check 'docker' instance #0 failed
Traceback (most recent call last):

More descriptive Docker Cloud hostnames

So on Docker Cloud, the agent gets a hostname based on /etc/hostname, which is pretty good, but a bit opaque for the administrator, who is used to seeing the node cluster name. I wonder if it's possible to prepend the node cluster name if it's detected? Maybe try to find it through the Docker Cloud API? Not sure if you'd consider that an edge case or something that should be done in a downstream container, though.

Tomcat/JVM metrics are not shipped

I'd like to remotely get Tomcat metrics, but I get this error message:

2015-04-16 14:19:38,473 | INFO | Instance | Trying to connect to JMX Server at 10.0.1.107:9005
2015-04-16 14:19:38,473 | INFO | ConnectionManager | Connection closed or does not exist. Creating a new connection!
2015-04-16 14:19:38,473 | INFO | ConnectionManager | Connecting using JMX Remote
2015-04-16 14:19:38,473 | INFO | Connection | Connecting to: service:jmx:rmi:///jndi/rmi://10.0.1.107:9005/jmxrmi
2015-04-16 14:19:41,486 | ERROR| App | Cannot connect to instance 10.0.1.107:9005 java.rmi.ConnectIOException: Exception creating connection to: 172.17.0.35; nested exception is:
    java.net.NoRouteToHostException: No route to host

You can find my demo-datadog-image build files here:
https://github.com/Pindar/coreos-demo/tree/feat/registrator_skydns_datadog/datadog

I used docker exec to debug this issue and successfully pinged 10.0.1.107. Also I was able to connect to the port 9005 (netcat).

What do I need to configure to get this up and running?

systemd unit file for running dd-agent in CoreOS

Just in case someone needs it, I share a simple systemd service file for running dd-agent in CoreOS:

# vim: ft=systemd sw=2 et :

[Unit]
Description=Datadog Agent
After=docker.service
Requires=docker.service

[Service]
Restart=always
RestartSec=15

EnvironmentFile=/etc/environment
Environment=API_KEY=<YOUR_API_KEY>

ExecStartPre=-/usr/bin/docker stop dd-agent
ExecStartPre=-/usr/bin/docker rm -f dd-agent
ExecStartPre=/usr/bin/docker pull datadog/docker-dd-agent

ExecStart=/bin/bash -c ' \
  set -ex; \
  docker run --name dd-agent \
    --privileged \
    -h $HOSTNAME \
    -v /var/run/docker.sock:/var/run/docker.sock \
    -v /proc/mounts:/host/proc/mounts:ro \
    -v /sys/fs/cgroup/:/host/sys/fs/cgroup:ro \
    -e API_KEY=$API_KEY \
    datadog/docker-dd-agent; \
'

ExecStop=-/usr/bin/docker stop dd-agent
ExecStop=-/usr/bin/docker rm -f dd-agent

[X-Fleet]
Global=true

Consider reducing image size

Right now the container uses Debian Jessie. This means there's stuff like Perl, locales and a lot more (systemd, etc.). Then, because of /opt/datadog-agent/embedded, there are lots of duplicate binaries: which, top, touch (probably not needed by the agent), but also update-alternatives, dpkg-divert, ncurses5-config, captoinfo and many more (very likely to be unnecessary) :-)

Possible solutions:

  • start from a smaller base image, such as Busybox or Alpine
  • use a two-step approach, where the first sets up the build and just outputs what's needed at runtime, to be packaged in the second step as the final image.

Failing in kubernetes 1.2.3

We recently upgraded from kubernetes 1.1.8 to 1.2.3

Since then we've seen all metrics collection stop; the logs show lots of this:

2016-05-03 00:34:07,804 INFO success: dogstatsd entered RUNNING state, process has stayed up for > than 5 seconds (startsecs)
2016-05-03 00:34:07,805 INFO spawned: 'collector' with pid 2062
2016-05-03 00:34:07,883 INFO exited: dogstatsd (exit status 1; not expected)

To install the daemonset, I had to slightly alter the configuration given here https://app.datadoghq.com/account/settings#agent/kubernetes by adding spec.selector.matchLabels.app: dd-agent

Are there other changes that would have to be made to have it work in kubernetes 1.2.3?

java / Kafka example is incorrect

Hello,

I tried the example here: https://github.com/DataDog/docker-dd-agent/blob/master/examples/kafka/Dockerfile

and it does not work. First, the '-qq' option masks the main issue by silencing the output; second, we need the '-y' option for apt-get install, otherwise the build process dies like this:
Do you want to continue? [Y/n] Abort.
2015/11/10 08:29:42 Failed to process json stream error: The command '/bin/sh -c apt-get update && apt-get install openjdk-7-jre-headless --no-install-recommends && apt-get clean && rm -rf /var/lib/apt/lists/* /tmp/* /var/tmp/*' returned a non-zero code: 1

On a side note, I think you should provide a Docker image with Java support (and one without, if you are concerned about the size of the final image).

thanx

Does not function with Kitematic app

After adding my API_KEY to env variables I get this:

2015-04-09 17:34:31,723 CRIT Supervisor running as root (no user in config file)
2015-04-09 17:34:31,732 INFO RPC interface 'supervisor' initialized
2015-04-09 17:34:31,732 CRIT Server 'unix_http_server' running without any HTTP authentication checking
2015-04-09 17:34:31,733 INFO supervisord started with pid 1
2015-04-09 17:34:32,741 INFO spawned: 'dogstatsd' with pid 9
2015-04-09 17:34:32,744 INFO spawned: 'forwarder' with pid 10
2015-04-09 17:34:32,746 INFO spawned: 'collector' with pid 11
2015-04-09 17:34:38,164 CRIT Supervisor running as root (no user in config file)
Unlinking stale socket /var/tmp/datadog-supervisor.sock
2015-04-09 17:34:38,476 INFO RPC interface 'supervisor' initialized
2015-04-09 17:34:38,476 CRIT Server 'unix_http_server' running without any HTTP authentication checking
2015-04-09 17:34:38,476 INFO supervisord started with pid 1
2015-04-09 17:34:39,488 INFO spawned: 'dogstatsd' with pid 8
2015-04-09 17:34:39,489 INFO spawned: 'forwarder' with pid 9
2015-04-09 17:34:39,492 INFO spawned: 'collector' with pid 10
2015-04-09 17:34:39,735 INFO exited: dogstatsd (exit status 1; not expected)
2015-04-09 17:34:39,767 INFO exited: collector (exit status 1; not expected)
2015-04-09 17:34:40,778 INFO spawned: 'dogstatsd' with pid 19
2015-04-09 17:34:40,780 INFO spawned: 'collector' with pid 20
2015-04-09 17:34:40,973 INFO exited: dogstatsd (exit status 1; not expected)
2015-04-09 17:34:40,991 INFO exited: collector (exit status 1; not expected)
2015-04-09 17:34:43,007 INFO spawned: 'dogstatsd' with pid 27
2015-04-09 17:34:43,009 INFO spawned: 'collector' with pid 28
2015-04-09 17:34:43,188 INFO exited: dogstatsd (exit status 1; not expected)
2015-04-09 17:34:43,202 INFO exited: collector (exit status 1; not expected)
2015-04-09 17:34:44,728 INFO success: forwarder entered RUNNING state, process has stayed up for > than 5 seconds (startsecs)
2015-04-09 17:34:46,744 INFO spawned: 'dogstatsd' with pid 35
2015-04-09 17:34:46,745 INFO spawned: 'collector' with pid 36
2015-04-09 17:34:46,921 INFO exited: dogstatsd (exit status 1; not expected)
2015-04-09 17:34:46,929 INFO gave up: dogstatsd entered FATAL state, too many start retries too quickly
2015-04-09 17:34:46,944 INFO exited: collector (exit status 1; not expected)
2015-04-09 17:34:47,946 INFO gave up: collector entered FATAL state, too many start retries too quickly

Ref: https://kitematic.com/

dd-agent crashes docker on boot

I'm running dd-agent with --restart=always so that docker will restart the container on boot.

When I do this docker crashes and will not run until I delete the dd-agent container from /var/lib/docker/containers.

Instance check skipped

What could cause this error in collector.log:

2015-06-08 19:17:57 UTC | ERROR | dd.collector | checks.http_check(network_checks.py:125) | Instance: CI  skipped because it's already running
2015-06-08 19:18:24 UTC | CRITICAL | dd.collector | checks.http_check(network_checks.py:214) | Restarting Pool. One check is stuck:  CI

Here is the relevant info:

Image: datadog/docker-dd-agent:latest

root@anchorage-admiral:/# dd-agent info
===================
Collector (v 5.3.0)
===================

  Status date: 2015-06-08 19:11:52 (19s ago)
  Pid: 16
  Platform: Linux-4.0.1-x86_64-with-debian-7.8
  Python Version: 2.7.9
  Logs: <stderr>, /var/log/datadog/collector.log

....
http_check
    ----------
      - instance #0 [WARNING]
          Warning: Using events for service checks is deprecated in favor of monitors and will be removed in future versions of the Datadog Agent.
          Warning: Using events for service checks is deprecated in favor of monitors and will be removed in future versions of the Datadog Agent.
          Warning: Using events for service checks is deprecated in favor of monitors and will be removed in future versions of the Datadog Agent.
          Warning: Using events for service checks is deprecated in favor of monitors and will be removed in future versions of the Datadog Agent.
          Warning: Using events for service checks is deprecated in favor of monitors and will be removed in future versions of the Datadog Agent.
      - instance #1 [OK]
      - instance #2 [OK]
      - instance #3 [OK]
      - instance #4 [OK]
      - instance #5 [OK]
      - instance #6 [OK]

The checks causing the ERROR are HTTPS checks in conf.d/tcp_check.yaml; in fact, all HTTPS checks configured this way are skipped:

  - name: CI
    url: https://ci.example.com
    timeout: 10
    window: 5
    threshold: 3
    tags:
      - url:https://ci.example.com
      - env:anchorage
    content_match: Authentication required
    http_response_status_code: 403
    include_content: true

I am trying to do a check based on the HTTP response code and a content match here.
What am I doing wrong? I looked at the network_checks source code but could not figure out why the HTTPS check was skipped.

Docker container doesn't work - entrypoint.sh permissions error

Hi guys,

We've hit an issue today because we were using the 'ecs' tag of the Docker container on Docker Hub, which is no longer working. Commit #83, I believe, introduced an error which means the container won't start. Our AWS ECS infrastructure stopped allowing us to deploy and we tracked it down to this:

2016-06-27T15:07:39Z [WARN] Error with docker; stopping container module="TaskEngine" task="django-app:55 arn:aws:ecs:eu-west-1:931510279835:task/2a882630-f738-4704-b5c1-7e2c306de988, Status: (NONE->RUNNING) Containers: [dd-agent (RUNNING->RUNNING),app (NONE->RUNNING),consul (RUNNING->RUNNING),logspout (RUNNING->RUNNING),nginx (NONE->RUNNING),registrator (NONE->RUNNING),]" container="dd-agent(datadog/docker-dd-agent:ecs) (RUNNING->RUNNING)" err="API error (500): Cannot start container 0e7230dc3eef9e344270005404c6e82b50c1fd0d2c4d79535ec1e104fcdcf81d: [8] System error: exec: "/entrypoint.sh": permission denied

For now I will try and use an older tag.

Akram

Copy /checks.d too

It would be nice to copy /checks.d files into /etc/dd-agent/checks.d just like the /conf.d action.

In the case of AWS EC2 Amazon Linux /sys/fs/cgroup should be /cgroup

Tried to run this on an EC2 Amazon Linux instance:

docker run -d --privileged --name dd-agent -h hostname -v /var/run/docker.sock:/var/run/docker.sock -v /proc/mounts:/host/proc/mounts:ro -v /sys/fs/cgroup/:/host/sys/fs/cgroup:ro -e API_KEY=apikey_3 datadog/docker-dd-agent

didn't work, but this did (replaced /sys/fs/cgroup with /cgroup):

docker run -d --privileged --name dd-agent -h hostname -v /var/run/docker.sock:/var/run/docker.sock -v /proc/mounts:/host/proc/mounts:ro -v /cgroup/:/host/sys/fs/cgroup:ro -e API_KEY=apikey_3 datadog/docker-dd-agent

swarm support?

Is it possible to use a Swarm host rather than the Docker socket?

Have dockerfile create VOLUME checks.d

15b52e0 breaks the image--it won't run because entrypoint.sh expects to find /checks.d/ and it's not there.

The fix is probably just to add this to the Dockerfile.

VOLUME ["/conf.d"]

While doing that, it would be good to check if the CMD mkdir -p /conf.d is necessary--I believe it's not.

dd-agent info broken as of 5.5 (latest)

core@staging-1 ~ $ nse datadog-agent
root@staging-1:/# dd-agent info
Traceback (most recent call last):
  File "/usr/bin/dd-agent", line 13, in <module>
    from config import get_version, initialize_logging # noqa
ImportError: No module named config

With docker image version 5.4.6 everything is ok.

Can't collect stats from docker through tcp?

Is it possible for this container to collect stats from multiple docker hosts?

I'm trying this docker.yaml:

init_config:

instances:
  - url: "tcp://develop.docker1.elevate.internal:2375"
    new_tag_names: true
  - url: "tcp://develop.docker2.elevate.internal:2375"
    new_tag_names: true
  - url: "tcp://develop.docker3.elevate.internal:2375"
    new_tag_names: true

But this is what I get in /var/log/datadog/collector.log :

2015-11-05 17:27:58 UTC | ERROR | dd.collector | checks.docker(__init__.py:689) | Check 'docker' instance #2 failed
Traceback (most recent call last):
  File "/opt/datadog-agent/agent/checks/__init__.py", line 672, in run
    self.check(copy.deepcopy(instance))
  File "/opt/datadog-agent/agent/checks.d/docker.py", line 140, in check
    containers, ids_to_names = self._get_and_count_containers(instance)
  File "/opt/datadog-agent/agent/checks.d/docker.py", line 175, in _get_and_count_containers
    raise Exception("Failed to collect the list of containers. Exception: {0}".format(e))
Exception: Failed to collect the list of containers. Exception: <urlopen error unknown url type: tcp>

I'm probably configuring something incorrectly, but there isn't a lot of documentation in https://app.datadoghq.com/account/settings#integrations/docker .

Cheers,
Mauricio

Don't report Kubernetes pause containers

Right now there are tens of gcr.io/google_containers/pause containers getting reported (one per running pod in the cluster). These only exist as an artifact of setting up networking and IPC in the pods. They're just adding noise and should be ignored by default (do they count for billing purposes?).

A few approaches:

  1. Ignore them in the default Docker config
  2. Ignore them in the code
  3. Allow setting an environment variable to ignore them, without having to rebuild the image.

Option 3 might be the best because

  • the image can be changed through a kubelet flag, e.g. for people using local/private registries
  • people might have other containers they want to ignore for whatever reason (I have a few more, actually)

Diagnosing issue sending metrics on Kubernetes

I recently followed the Kubernetes setup instructions and the logs seem to suggest that everything is dandy, but I'm not seeing kubernetes.* metrics in the Metric Explorer within Datadog. Here's the pod's log output:

$ kc logs -f dd-agent-bb4d3
2016-03-30 16:48:21,572 CRIT Supervisor running as root (no user in config file)
2016-03-30 16:48:21,627 INFO RPC interface 'supervisor' initialized
2016-03-30 16:48:21,627 CRIT Server 'inet_http_server' running without any HTTP authentication checking
2016-03-30 16:48:21,628 INFO RPC interface 'supervisor' initialized
2016-03-30 16:48:21,628 CRIT Server 'unix_http_server' running without any HTTP authentication checking
2016-03-30 16:48:21,628 INFO supervisord started with pid 1
2016-03-30 16:48:22,631 INFO spawned: 'dogstatsd' with pid 11
2016-03-30 16:48:22,632 INFO spawned: 'forwarder' with pid 12
2016-03-30 16:48:22,634 INFO spawned: 'collector' with pid 13
2016-03-30 16:48:22,635 INFO spawned: 'jmxfetch' with pid 14
2016-03-30 16:48:26,395 INFO success: jmxfetch entered RUNNING state, process has stayed up for > than 3 seconds (startsecs)
2016-03-30 16:48:27,187 INFO exited: jmxfetch (exit status 0; expected)
2016-03-30 16:48:27,676 INFO success: dogstatsd entered RUNNING state, process has stayed up for > than 5 seconds (startsecs)
2016-03-30 16:48:27,676 INFO success: forwarder entered RUNNING state, process has stayed up for > than 5 seconds (startsecs)
2016-03-30 16:48:27,676 INFO success: collector entered RUNNING state, process has stayed up for > than 5 seconds (startsecs)

I've substituted my API key in and have verified that it is set within the pod. How can I tell where I'm going wrong?

Running this on Google Container Engine on a Kubernetes 1.2.0 cluster.

Monitor disk usage on separate volume

On some of my systems /var/lib/docker is a tmpfs ramdisk. I want to monitor free space on it. Is this possible?

I've mounted it into the dd-agent container: -v /var/lib/docker:/ramdisk:ro

However I'm not seeing any metric other than system.disk.free. Any tips?

report real hostname

Probably this can't be determined from within the container, so it would have to be done with an environment variable.

Tags set via environment not showing up

I'm not seeing any of the tags I set via environment variables ending up in my datadog metrics.

    dogstatsd:
        image: datadog/docker-dd-agent:11.0.563
        command: dogstatsd
        environment:
          API_KEY: 'redacted'
          TAGS: 'environment:development'

I'm using docker-compose and my metrics show up in Datadog, but not this tag I'm setting.

Tag for agent 5.0?

Wondering if you could add an Automated Build for the agent-5.0 branch?

Proxy username populated into password field

Hi, there is a bug in entrypoint.sh

if [[ $PROXY_PASSWORD ]]; then
    sed -i -e "s/^# proxy_password:.*$/proxy_password: ${PROXY_USER}/" /etc/dd-agent/datadog.conf
fi

As can be seen, PROXY_USER is used in the password field.

btrfs support for host volumes?

I'm trying to set up btrfs monitoring on my CoreOS host, but the data reported doesn't appear correct. Here is an output for /dev/xvdb:

core@ip-172-16-96-127 ~ $ sudo btrfs fi show /dev/xvdb
Label: none  uuid: 960644b6-8aae-415b-9e16-cd7ac05a5b99
    Total devices 1 FS bytes used 874.20MiB
    devid    1 size 64.00GiB used 3.04GiB path /dev/xvdb

Btrfs v3.14_pre20140414

Here are the associated graphs for used and total from the Datadog interface:


Used space is close, but still under what is reported. Total size is orders of magnitude under what is reported by the btrfs output. Any ideas what might be going on?

Error: Cannot open an HTTP server: socket.error reported errno.EROFS (30)

Hello! After I updated dd-agent from 5.3.0 to 5.4.2, I can't start it.
Error message:

Error: Cannot open an HTTP server: socket.error reported errno.EROFS (30)
For help, use /opt/datadog-agent/bin/supervisord -h

My docker config:

/usr/bin/docker run --rm \
    --read-only \
    --privileged \
    --name dd-agent \
    -h `hostname` \
    -p 8125:8125/udp \
    -v /etc/dd-agent \
    -v /tmp \
    -v /var/tmp \
    -v /var/run \
    -v /var/log/datadog/:/var/log/datadog/ \
    -v /var/run/docker.sock:/var/run/docker.sock \
    -v /etc/dd-agent/conf.d:/etc/dd-agent/conf.d \
    -v /proc/mounts:/host/proc/mounts:ro \
    -v /sys/fs/cgroup/:/host/sys/fs/cgroup:ro \
    -e API_KEY=MY_KEY \
    datadog/docker-dd-agent:5.4.2

Why can't it be started? Which directory does it want to write to?

Consider conf.d as a volume and adding extras

To alleviate the need to build intermediate images, it might be nice to allow for /etc/dd-agent/conf.d as a persistent volume and then install some commonly required libraries (python-mysqldb, python-redis, python-rrdtool) by default.

You can then create a host directory and have YAML configurations located at ~/datadog and run the container using volumes:

$ docker run -d --privileged --name dd-agent -h $(hostname) -e API_KEY=513146123a50b321fc38548e317629ea -v /home/ubuntu/datadog:/etc/dd-agent/conf.d datadog/docker-dd-agent

cgroups are mounted after check initialization

The check can initialize before the cgroups are mounted, resulting in the following error:

<TIMESTAMP> | ERROR | dd.collector | config(config.py:897) | Unable to initialize check docker
Traceback (most recent call last):
  File "/opt/datadog-agent/agent/config.py", line 889, in load_check_directory
    agentConfig=agentConfig, instances=instances)
  File "/opt/datadog-agent/agent/checks.d/docker.py", line 125, in __init__
    self._mountpoints[metric["cgroup"]] = self._find_cgroup(metric["cgroup"], docker_root)
  File "/opt/datadog-agent/agent/checks.d/docker.py", line 427, in _find_cgroup
    raise Exception("Can't find mounted cgroups. If you run the Agent inside a container,"
Exception: Can't find mounted cgroups. If you run the Agent inside a container, please refer to the documentation.
