canonical / prometheus-openstack-exporter

OpenStack exporter for the prometheus monitoring system

License: GNU General Public License v3.0

Python 90.50% Shell 6.38% Makefile 1.68% Dockerfile 1.44%

prometheus-openstack-exporter's Introduction

Prometheus OpenStack exporter

Note

This charm is in maintenance mode. Only critical bugs will be handled.

Exposes high level OpenStack metrics to Prometheus.

Data can be visualised using Grafana and the OpenStack Clouds Dashboard

Deployment

Requirements

sudo apt-get install python-neutronclient python-novaclient python-keystoneclient python-netaddr python-cinderclient

Install prometheus_client. On Ubuntu 16.04:

apt-get install python-prometheus-client

On Ubuntu 14.04:

pip install prometheus_client

Installation

# Copy example config in place, edit to your needs
sudo cp prometheus-openstack-exporter.yaml /etc/prometheus/

## Upstart
# Install job
sudo cp prometheus-openstack-exporter.conf /etc/init

# Configure novarc location:
sudo sh -c 'echo "NOVARC=/path/to/admin-novarc">/etc/default/prometheus-openstack-exporter'

## Systemd
# Install job
sudo cp prometheus-openstack-exporter.service /etc/systemd/system/

# create novarc
sudo tee /etc/prometheus-openstack-exporter/admin.novarc > /dev/null <<EOF
export OS_USERNAME=Admin
export OS_TENANT_NAME=admin
export OS_PASSWORD=XXXX
export OS_REGION_NAME=cloudname
export OS_AUTH_URL=http://XX.XX.XX.XX:35357/v2.0
EOF

# create default config location
sudo sh -c 'echo "CONFIG_FILE=/etc/prometheus-openstack-exporter/prometheus-openstack-exporter.yaml">/etc/default/prometheus-openstack-exporter'


# Start (Upstart)
sudo start prometheus-openstack-exporter
# Start (systemd)
sudo systemctl start prometheus-openstack-exporter

Or to run interactively:

. /path/to/admin-novarc
./prometheus-openstack-exporter prometheus-openstack-exporter.yaml

Or use Docker Image:

# docker-compose.yml
version: '2.1'
services:
  ostackexporter:
    image: moghaddas/prom-openstack-exporter:latest
    # check this example env file
    env_file:
      - ./admin.novarc.example
    restart: unless-stopped
    expose:
      - 9183
    ports:
      - 9183:9183

# docker run
docker run \
  -itd \
  --name prom_openstack_exporter \
  -p 9183:9183 \
  --env-file=$(pwd)/admin.novarc.example \
  --restart=unless-stopped \
  moghaddas/prom-openstack-exporter:latest

Configuration

Configuration options are documented in prometheus-openstack-exporter.yaml, shipped with this project.

FAQ

Why are openstack_allocation_ratio values hardcoded?

There is no way to retrieve them using the OpenStack API.

An alternative approach would be to hardcode those values in queries, but that approach breaks when allocation ratios change.

Why hardcode swift host list?

Same as above: there is no way to retrieve the Swift host list using the API.

Why not write dedicated swift exporter?

Swift stats are included mainly because they are trivial to retrieve. If and when a standalone Swift exporter appears, we can revisit this approach.

Why cache data?

We are aware that Prometheus best practice is to avoid caching. Unfortunately, the queries we need to run are very heavy and in bigger clouds can take minutes to execute. This is problematic not only because of the delays but also because multiple servers scraping the exporter could negatively impact cloud performance.
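
As an illustration of the pattern (a sketch, not the exporter's exact code): a background gatherer periodically dumps the expensive query results to a cache file, and the scrape handlers only ever read from that file. gather_openstack_stats() below is a hypothetical stand-in for the slow API queries; cache_file and cache_refresh_interval are the options from the shipped config.

import pickle
import time
from threading import Thread

config = {'cache_file': '/var/cache/prometheus-openstack-exporter/mycloud',
          'cache_refresh_interval': 300}

def gather_openstack_stats():
    # Hypothetical stand-in for the exporter's Nova/Neutron/Cinder/Keystone queries
    return {'instances': [], 'hypervisors': []}

class DataGatherer(Thread):
    """Background thread: run the heavy queries and dump the result to the cache file."""
    def run(self):
        while True:
            prodstack = gather_openstack_stats()
            with open(config['cache_file'], 'wb') as f:
                pickle.dump([prodstack], f)          # collectors later read pickle.load(f)[0]
            time.sleep(config['cache_refresh_interval'])

# Scrape handlers serve metrics from the cached dump instead of hitting the APIs:
def load_cached_prodstack():
    with open(config['cache_file'], 'rb') as f:
        return pickle.load(f)[0]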

How are Swift account metrics obtained?

Fairly simply! Given a copy of the Swift rings (in fact, we just need account.ring.gz) we can load them up and then ask where particular accounts are located in the cluster. We assume that Swift is replicating properly, pick a node at random, and ask it for the account's statistics with an HTTP HEAD request.
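
A minimal sketch of that lookup, using swift.common.ring.Ring and a plain HEAD against the account server. The URL layout and X-Account-* headers are standard Swift; the tenant ID and reseller prefix here are illustrative.

import random
import requests
from swift.common.ring import Ring

tenant_id = 'abcdef1234567890'                            # illustrative tenant/project ID
account = 'AUTH_' + tenant_id                             # reseller_prefix + tenant ID

account_ring = Ring('/etc/swift', ring_name='account')    # loads /etc/swift/account.ring.gz
part, nodes = account_ring.get_nodes(account)
node = random.choice(nodes)                               # assume replicas are in sync
url = 'http://{ip}:{port}/{device}/{part}/{account}'.format(part=part, account=account, **node)
resp = requests.head(url)
bytes_used = int(resp.headers['X-Account-Bytes-Used'])
object_count = int(resp.headers['X-Account-Object-Count'])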

How hard would it be to export Swift usage by container?

Sending a GET request to the account URL yields a list of containers (probably paginated, so watch out for that!). To write a container exporter, one could fetch the list of containers from the account server, load up the container ring, and then use container_ring.get_nodes(account, container) and an HTTP HEAD on one of the resulting nodes to get a container's statistics, although without some caching cleverness this will scale poorly.
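
A corresponding sketch for containers, including the marker-based pagination mentioned above; again this is illustrative rather than code from the exporter.

import requests
from swift.common.ring import Ring

container_ring = Ring('/etc/swift', ring_name='container')

def list_containers(account_url):
    """Page through an account's container listing using the marker parameter."""
    containers, marker = [], ''
    while True:
        resp = requests.get(account_url, params={'format': 'json', 'marker': marker})
        page = resp.json()
        if not page:
            return containers
        containers.extend(c['name'] for c in page)
        marker = page[-1]['name']

# For each container, container_ring.get_nodes(account, container) gives the nodes to HEAD
# for X-Container-Object-Count / X-Container-Bytes-Used.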

Known Issues

EOFError by pickle.py

You should wait: the exporter needs the dump (cache) file to be written before it can generate metrics.

prometheus-openstack-exporter's People

Contributors

4a6f656c, arif-ali, barksten, c-mart, canonical-is, chanchiwai-ray, czunker, devrandom0, drencrom, gabrielcocenza, hloeung, ideaship, jacekn, jjo, jmlowe, jonher937, lathiat, matuskosut, mthaddon, no2a, peppepetra, pjack, rfinnie, rul, sabaini, sajoupa, samuelallan72, siavashsardari, vmpjdc


prometheus-openstack-exporter's Issues

define multiple "schedulable_instance_size"

In our environment there are many types of flavors, and we'd like to watch the remaining capacity for each flavor.
Is there any way to define multiple "schedulable_instance_size" entries, or a plan to add this feature?
If it's possible, this exporter could be used to monitor and plan OpenStack capacity comprehensively in a production environment.

like this:

schedulable_instance_size:
    x1.small:
        ram_mbs: 4096
        vcpu: 2
        disk_gbs: 20
    x1.large:
        ram_mbs: 8192
        vcpu: 4
        disk_gbs: 200

Make github releases

Dear Maintainers,

I am planning to use this exporter as part of the kolla containers, but that requires an official release source that isn't snap-only.

Since you are already tagging releases, could you please extend your release process to also generate GitHub releases?

Thanks in advance.

Cache file is not created

I followed the instructions in the README, but when starting the application I get this error:

[user@node1 ~]$ curl localhost:9183/metrics
Traceback (most recent call last):
  File "./prometheus-openstack-exporter", line 565, in do_GET
    neutron = Neutron()
  File "./prometheus-openstack-exporter", line 200, in __init__
    with open(config['cache_file'], 'rb') as f:
IOError: [Errno 2] No such file or directory: '/var/cache/prometheus-openstack-exporter/mycloud'

The directory exists but the file is not created.
Is this a known issue?

Recently installed on OSP10 - getting this error while accessing metrics on browser

would appreciate any help...

Traceback (most recent call last):
  File "/opt/prometheus-openstack-exporter/prometheus-openstack-exporter", line 464, in do_GET
    neutron = Neutron()
  File "/opt/prometheus-openstack-exporter/prometheus-openstack-exporter", line 170, in __init__
    self.prodstack = pickle.load(f)[0]
  File "/usr/lib64/python2.7/pickle.py", line 1378, in load
    return Unpickler(file).load()
  File "/usr/lib64/python2.7/pickle.py", line 858, in load
    dispatch[key](self)
  File "/usr/lib64/python2.7/pickle.py", line 880, in load_eof
    raise EOFError
EOFError

Support for keystone v3 API needed.

I'm trying to connect with Ocata Keystone API, but I got:

  • missing OS_TENANT_NAME and OS_REGION_NAME in novarc (there are no longer such things in the v3 API)
  • after adding such envs and pointing to the v2.0 API:

# prometheus-openstack-exporter/prometheus-openstack-exporter prometheus-openstack-exporter.yaml
Traceback (most recent call last):
  File "prometheus-openstack-exporter/prometheus-openstack-exporter", line 78, in run
    prodstack['tenants'] = [x._info for x in keystone.tenants.list()]
  File "/usr/lib/python2.7/site-packages/keystoneclient/v2_0/tenants.py", line 123, in list
    tenant_list = self._list('/tenants%s' % query, 'tenants')
  File "/usr/lib/python2.7/site-packages/keystoneclient/base.py", line 124, in _list
    resp, body = self.client.get(url, **kwargs)
  File "/usr/lib/python2.7/site-packages/keystoneauth1/adapter.py", line 187, in get
    return self.request(url, 'GET', **kwargs)
  File "/usr/lib/python2.7/site-packages/keystoneauth1/adapter.py", line 344, in request
    resp = super(LegacyJsonAdapter, self).request(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/keystoneauth1/adapter.py", line 112, in request
    return self.session.request(url, method, **kwargs)
  File "/usr/lib/python2.7/site-packages/positional/__init__.py", line 94, in inner
    return func(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/keystoneclient/session.py", line 420, in request
    raise exceptions.from_response(resp, method, url)
NotFound: The resource could not be found. (HTTP 404)

I was trying to import modules from keystoneclient.v3, but there is a lot of code to rewrite.

Version 0.1.9 is forking infinitely

Edge channel revision 36 (0.1.9) forks infinitely. The process keeps forking until it reaches the process limit and stops working with the following error:

snap.prometheus-openstack-exporter.prometheus-openstack-exporter.service - Service for snap application prometheus-openstack-exporter.prometheus-openstack-exporter
   Loaded: loaded (/etc/systemd/system/snap.prometheus-openstack-exporter.prometheus-openstack-exporter.service; enabled; vendor preset: enabled)
   Active: active (running) since Mon 2023-03-13 10:57:00 UTC; 2 days ago                                                                         
 Main PID: 57286 (python3)                                                                                  
    Tasks: 12287 (limit: 12287)                                                                                                                                      
   CGroup: /system.slice/snap.prometheus-openstack-exporter.prometheus-openstack-exporter.service                                                                    
           └─57286 python3 /snap/prometheus-openstack-exporter/36/bin/prometheus-openstack-exporter /var/snap/prometheus-openstack-exporter/36/prometheus-openstack-e
                                                                       
Mar 15 13:13:52 juju-48267c-5-lxd-21 prometheus-openstack-exporter.prometheus-openstack-exporter[57286]: ----------------------------------------
Mar 15 13:13:52 juju-48267c-5-lxd-21 prometheus-openstack-exporter.prometheus-openstack-exporter[57286]: Exception happened during processing of request from ('10.13
Mar 15 13:13:52 juju-48267c-5-lxd-21 prometheus-openstack-exporter.prometheus-openstack-exporter[57286]: Traceback (most recent call last):                         
Mar 15 13:13:52 juju-48267c-5-lxd-21 prometheus-openstack-exporter.prometheus-openstack-exporter[57286]:   File "/usr/lib/python3.8/socketserver.py", line 316, in _h
Mar 15 13:13:52 juju-48267c-5-lxd-21 prometheus-openstack-exporter.prometheus-openstack-exporter[57286]:     self.process_request(request, client_address)
Mar 15 13:13:52 juju-48267c-5-lxd-21 prometheus-openstack-exporter.prometheus-openstack-exporter[57286]:   File "/usr/lib/python3.8/socketserver.py", line 603, in pr
Mar 15 13:13:52 juju-48267c-5-lxd-21 prometheus-openstack-exporter.prometheus-openstack-exporter[57286]:     pid = os.fork()
Mar 15 13:13:52 juju-48267c-5-lxd-21 prometheus-openstack-exporter.prometheus-openstack-exporter[57286]: BlockingIOError: [Errno 11] Resource temporarily unavailable
Mar 15 13:13:52 juju-48267c-5-lxd-21 prometheus-openstack-exporter.prometheus-openstack-exporter[57286]: ----------------------------------------
Mar 15 13:15:22 juju-48267c-5-lxd-21 python3[57286]: Error getting tenants.list, continue with projects.list 

DOWN ports without fixed IP addresses on routers causes traceback at _get_router_ip()

Getting a traceback because the code doesn't check whether there are any elements in fixed_ips before trying to return the first fixed IP of the 'first' port of a router.

Given a router such as:

jujumanage@maas-tele2-vno1:~$ openstack port list --router 7334420e-5c27-4eb2-babd-eccc564a391b
+--------------------------------------+------+-------------------+---------------------------------------------------------------------------+--------+
| ID | Name | MAC Address | Fixed IP Addresses | Status |
+--------------------------------------+------+-------------------+---------------------------------------------------------------------------+--------+
| 287d6b20-7b95-403b-9357-4200eb3f8242 | | fa:16:3e:44:a4:a6 | | DOWN |
| 3ba11b13-5df5-4f4e-a5a3-d96b043ae0c5 | | fa:16:3e:44:17:26 | ip_address='10.81.13.9', subnet_id='9a8d4077-c9d4-4316-bc62-b3ff6b037a9b' | ACTIVE |
| 3fa1ad4b-84d2-4e44-b27d-84b2c7b2166c | | fa:16:3e:16:88:4d | ip_address='10.81.13.1', subnet_id='e5cbcb81-84f3-49fd-b4d4-64f10542419b' | ACTIVE |
+--------------------------------------+------+-------------------+---------------------------------------------------------------------------+--------+

We see the prometheus-openstack-exporter metrics endpoint return a 500 error with this traceback (found with strace -f -s 9999 -p <pid>):

[pid 4893] sendto(13, "Traceback (most recent call last):
  File "/snap/prometheus-openstack-exporter/25/bin/prometheus-openstack-exporter", line 615, in do_GET
    swift.get_stats() +
  File "/snap/prometheus-openstack-exporter/25/bin/prometheus-openstack-exporter", line 289, in get_stats
    ips.update(self.get_router_ips())
  File "/snap/prometheus-openstack-exporter/25/bin/prometheus-openstack-exporter", line 254, in get_router_ips
    if self._get_router_ip(r[id]):
  File "/snap/prometheus-openstack-exporter/25/bin/prometheus-openstack-exporter", line 234, in _get_router_ip
    return port["fixed_ips"][0]["ip_address"]
IndexError: list index out of range

It appears that one should also exclude "DOWN" ports and sanity-check that port["fixed_ips"] is non-empty before indexing into it.
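
A sketch of that guard; the method name and port fields follow the traceback above, but the way the router's ports are looked up here is illustrative.

def _get_router_ip(self, router_id):
    """Return the first fixed IP of the router's ports, skipping DOWN and IP-less ports."""
    for port in self.router_ports.get(router_id, []):    # hypothetical router -> ports lookup
        if port.get('status') == 'DOWN':
            continue                                     # exclude DOWN ports
        if not port.get('fixed_ips'):
            continue                                     # nothing to index into
        return port['fixed_ips'][0]['ip_address']
    return None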

Client & server threads block each other due to incorrect eventlet/greenlet imports

Currently, slow running OpenStack API Requests (either stuck connecting or still waiting for the actual response) from the periodic DataGatherer task will block the HTTPServer connections from being processed.

The reverse is also true, a stalled client of the HTTPServer (e.g. opening a telnet session and not sending a request) will also block both the DataGatherer task and processing of other HTTPServer connections.

Observed Symptoms

  • Slow or failed prometheus requests
  • Statistics not being updated as often as you would expect
  • HTTP 500 responses and BrokenPipeError tracebacks being logged due to later trying to respond to prometheus clients which timed out and disconnected the socket

Cause

This happens because in the current code, we are intending to use the eventlet library for asynchronous non-blocking I/O, but, we are not using it correctly.

All code within the main application and all imported dependencies must import the special eventlet "green" versions of many python libraries (e.g. socket, time, threading, SimpleHTTPServer, etc) which yield to other green threads when they would have blocked waiting for I/O or to sleep. Currently this is not always done, as a result we often block other tasks from running.

In the past we also tried to use a threaded/forked model and avoid eventlet, however the python cinderclient library imports the green eventlet.sleep (unknown to us, I believe this is a bug) and thus we would sometimes get the error "greenlet.error: cannot switch to a different thread".

Fix

Fix this by ensuring the entire application is correctly using eventlet and green patched functions by importing eventlet and using eventlet.patcher.monkey_patch() before importing any other modules. This will automatically intercept every other import and always load the green version of a library.
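
A sketch of what that looks like at the top of the exporter; the key point is that the monkey_patch() call must run before anything else is imported.

# Must be the very first imports in the program, before any other module is loaded,
# so that socket, time, threading, etc. are replaced with their eventlet.green versions
# throughout the application and all of its dependencies.
import eventlet
eventlet.patcher.monkey_patch()

# Everything imported from here on is transparently green-patched:
import time                      # green, cooperatively-yielding sleep
from threading import Thread     # green threads instead of native threads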

Testing

To test we now have a working solution, you can

  1. Block access to the Nova API (causes connect to hang for 120 seconds) using this firewall command:
    iptables -I OUTPUT -p tcp -m state --state NEW --dport 8774 -j DROP

  2. Make many concurrent and repeated requests using siege:
    while true; do siege http://172.16.0.30:9183/metrics -t 5s -c 5 -d 0.1; done

When testing with these changes, I never see us block a server or client connection and all requests take a few milliseconds at most, whether or not the client requests are slow or we open a connection to the server that doesn't send a request.

History Lesson

There have been multiple incorrect attempts to solve this and some related problems. To try and avoid any further such problems, I have comprehensively documented the historical issues and why those fixes have not worked below, both for my understanding and yours :)

  1. eventlet implements asynchronous "non-blocking" socket I/O without any code changes to the application and without using real pthreads by using co-operative "green threads" from the greenlet library.

    For this to work correctly, greenlet needs to replace many python standard libraries (e.g. socket, time, threading) with an alternative "green" implementation which intentionally yields execution to other green threads anytime it's expected to block such as when reading data from a file/socket or sleeping.
    All code both within the application and all imported dependencies must import these special versions, any code that doesn't won't yield cooperatively and will block other green threads whenever such a blocking function is called.

    This does not happen automatically, you can find the full details at https://eventlet.readthedocs.io/en/latest/patching.html but as a brief summary this can be done with 3 different methods:

    1. Explicitly importing all relevant modules from eventlet.green (both in the application and all dependencies)
    2. Automatically during a single import with eventlet.patcher.import_patched - this must be used for every import in the main application
    3. Automatically for all future imports by calling eventlet.patcher.monkey_patch before any other imports. This is the most practical option, as the rest of the code in both the application and its dependencies can remain unmodified and is magically intercepted to import the eventlet.green version
  2. The original Issue #112 found that the process deadlocked with the following error: greenlet.error: cannot switch to a different thread

    At the time, we used a native Python Thread for the DataGatherer class and separately used the ForkingHTTPServer to allow both functions to operate simultaneously with real threads/processes.

    We did not intend to use eventlet/green threads at all, however, the python-cinderclient library incorrectly imports eventlet.sleep which results in sometimes using green threads accidentally, hence the error.

    We attempted to fix that in #115 by importing the green version of threading.Thread explicitly. This avoided the "cannot switch to a different thread" issue by only using green threads and not mixing Python threads and green threads in the same process.

  3. After merging #115 it was found that the HTTPServer loop never co-operatively yielded to the DataGatherer's thread and the stats were never updated.

    To fix this, #116 imported the green version of socket, asyncore and time and also littered a few sleep(0) calls around to force co-operative yielding at various points.

    This solution was not complete, because it only imported the green version of some libraries, in some call paths. Plus hacked in some extra yields here and there.

  4. In #124 we switched from ForkingHTTPServer to the normal HTTPServer because sometimes it would fork too many servers and hit the process or system-wide process limit.

    Though not noted elsewhere, when I reproduce this issue by connecting many clients using the tool siege to a server where I firewalled the nova API connections, I can see that all of those processes are defunct and not actually alive. This is most likely because the process is blocked and the calls to waitpid which would reap them never happen.

    Since we are not using the eventlet version of http.server.HTTPServer, without the forked model, we now block anytime we are handling a server request.

    Additionally, anytime the DataGatherer green thread calls out through the OpenStack API libraries, it uses non-patched versions of socket/requests/urllib3 and also blocks the HTTPServer which is now inside the same process.

cache_file needs to be initialized on fresh deploys

root@juju-8c40ac-1-lxd-11:~# curl 127.0.0.1:9183/metrics
Traceback (most recent call last):
  File "/snap/prometheus-openstack-exporter/25/bin/prometheus-openstack-exporter", line 607, in do_GET
    neutron = Neutron()
  File "/snap/prometheus-openstack-exporter/25/bin/prometheus-openstack-exporter", line 220, in __init__
    with open(config['cache_file'], 'rb') as f:
IOError: [Errno 2] No such file or directory: '/var/snap/prometheus-openstack-exporter/common/vno1'

If the file doesn't exist, it should be initialized:
https://github.com/CanonicalLtd/prometheus-openstack-exporter/blob/master/prometheus-openstack-exporter#L220
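
One possible guard (a sketch of the reporter's suggestion, not necessarily the fix that was merged): fall back to an empty dataset when the gatherer has not written the cache yet, instead of raising IOError in __init__.

import os
import pickle

def load_prodstack(cache_file):
    """Return the cached prodstack dict, or {} if the gatherer has not dumped it yet."""
    if not os.path.exists(cache_file):
        return {}
    with open(cache_file, 'rb') as f:
        return pickle.load(f)[0]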

Logging fails

I downloaded the latest version of the exporter and am using the following admin.novarc:

export OS_USERNAME=admin
export OS_PROJECT_NAME=admin
export OS_PASSWORD=xxxxxxxx
export OS_AUTH_URL=https://xxxxx.xxxxxx:13000/v3
export OS_USER_DOMAIN_NAME=default
export OS_PROJECT_DOMAIN_NAME=default
export OS_IDENTITY_API_VERSION=3
export COMPUTE_API_VERSION=1.1
export NOVA_VERSION=1.1

The server correctly starts listening on port 9183, but, each time I try to gather the metrics on the /metrics endpoint, I receive the following error:

----------------------------------------
Exception happened during processing of request from ('192.168.0.4', 52550)
Traceback (most recent call last):
  File "/usr/lib64/python2.7/SocketServer.py", line 568, in process_request
    self.finish_request(request, client_address)
  File "/usr/lib64/python2.7/SocketServer.py", line 334, in finish_request
    self.RequestHandlerClass(request, client_address, self)
TypeError: 'SysLogHandler' object is not callable
----------------------------------------

It seems there is a problem with the logging; checking the commits, I realized logging was added on 30th November.

If I checkout the previous commit:

git checkout 3138aa9

The exporter works as expected.

Missing dependency in the documentation

Hi,
trying to install this in a docker ubuntu:16.04 image.
Starting the script interactively, it complains about a missing dependency:

python-cinderclient

I think the documentation should be updated with this information in the requirements section.

Internal server error 500 when visiting metrics

After finally getting the auth problems sorted out I get this error:

Traceback (most recent call last):
  File "./prometheus-openstack-exporter", line 606, in do_GET
    swift.get_stats() +
  File "./prometheus-openstack-exporter", line 492, in get_stats
    self.gen_hypervisor_stats()
  File "./prometheus-openstack-exporter", line 391, in gen_hypervisor_stats
    cpu_info = json.loads(cpu_info)
  File "/usr/lib/python2.7/json/__init__.py", line 339, in loads
    return _default_decoder.decode(s)
  File "/usr/lib/python2.7/json/decoder.py", line 364, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/usr/lib/python2.7/json/decoder.py", line 382, in raw_decode
    raise ValueError("No JSON object could be decoded")
ValueError: No JSON object could be decoded

I have commented out the swift section of the example config, but it still looks like it's trying to get swift stats.

From logs:

Starting data gather thread
Client setup done, keystone ver 3
Error getting tenants.list, continue with projects.list
Done dumping stats to /var/cache/prometheus-openstack-exporter/mycloud
172.17.0.1 - - [09/Feb/2018 09:52:21] "GET /metrics HTTP/1.1" 500 -

Forbidden: You are not authorized to perform the requested action: identity:list_projects.

Using the latest build of March 13.

Followed the guide as instructed, Nova Volumes = False, etc

When I run openstack exporter service I get the following error:

   Loaded: loaded (/etc/systemd/system/prometheus-openstack-exporter.service; disabled; vendor preset: enabled)
   Active: active (running) since Wed 2019-03-27 12:02:19 UTC; 1s ago
 Main PID: 48211 (python)
    Tasks: 2
   Memory: 67.4M
      CPU: 553ms
   CGroup: /system.slice/prometheus-openstack-exporter.service
           └─48211 python /opt/prometheus-openstack-exporter/prometheus-openstack-exporter /etc/prometheus-openstack-exporter/prometheus-ope

Mar 27 12:02:20 B-05-37-openstack-ctl sh[48211]:     return self.request(url, 'GET', **kwargs)
Mar 27 12:02:20 B-05-37-openstack-ctl sh[48211]:   File "/usr/lib/python2.7/dist-packages/keystoneauth1/adapter.py", line 331, in request
Mar 27 12:02:20 B-05-37-openstack-ctl sh[48211]:     resp = super(LegacyJsonAdapter, self).request(*args, **kwargs)
Mar 27 12:02:20 B-05-37-openstack-ctl sh[48211]:   File "/usr/lib/python2.7/dist-packages/keystoneauth1/adapter.py", line 98, in request
Mar 27 12:02:20 B-05-37-openstack-ctl sh[48211]:     return self.session.request(url, method, **kwargs)
Mar 27 12:02:20 B-05-37-openstack-ctl sh[48211]:   File "/usr/lib/python2.7/dist-packages/positional/__init__.py", line 94, in inner
Mar 27 12:02:20 B-05-37-openstack-ctl sh[48211]:     return func(*args, **kwargs)
Mar 27 12:02:20 B-05-37-openstack-ctl sh[48211]:   File "/usr/lib/python2.7/dist-packages/keystoneauth1/session.py", line 467, in request
Mar 27 12:02:20 B-05-37-openstack-ctl sh[48211]:     raise exceptions.from_response(resp, method, url)
Mar 27 12:02:20 B-05-37-openstack-ctl sh[48211]: Forbidden: You are not authorized to perform the requested action: identity:list_projects.

However, I've put the Keystone admin v3 credentials into admin.novarc. Admin is supposed to see everything, right?

Error when running

Good day

The guide seems heavily outdated; can someone assist? When I execute it manually I get the following error:

Any ideas?

sudo ./prometheus-openstack-exporter prometheus-openstack-exporter.yaml
/usr/local/lib/python2.7/dist-packages/requests/__init__.py:83: RequestsDependencyWarning: Old version of cryptography ([1, 2, 3]) may cause slowdown.
  warnings.warn(warning, RequestsDependencyWarning)
Starting data gather thread
Error getting stats: Traceback (most recent call last):
  File "./prometheus-openstack-exporter", line 194, in run
    keystone, nova, neutron, cinder = get_clients()
  File "./prometheus-openstack-exporter", line 80, in get_clients
    "auth_url")
  File "./prometheus-openstack-exporter", line 60, in get_creds_list
    return [env['OS_%s' % name.upper()] for name in names]
  File "/usr/lib/python2.7/UserDict.py", line 40, in __getitem__
    raise KeyError(key)
KeyError: 'OS_USERNAME'

^CTraceback (most recent call last):
  File "./prometheus-openstack-exporter", line 685, in <module>
    server.serve_forever()
  File "/usr/lib/python2.7/SocketServer.py", line 231, in serve_forever
    poll_interval)
  File "/usr/lib/python2.7/SocketServer.py", line 150, in _eintr_retry
    return func(*args)
KeyboardInterrupt

export nova service-list

It would be useful if p-o-e exported the contents of nova service-list so that we can use the information for trending and alerting.

In particular, we currently have some compute hosts whose nova-compute service stops responding, and the recipient of the nagios alert first has to run "nova service-list" and grep out the bogus service.

With this information exported, we could easily replace this nagios check with an alert that tells the recipient exactly what is broken.

_get_nova_info() fails with BadRequest: marker [$uuid] not found (HTTP 400)

Hi,

p-o-e crashes when what looks like a marker instance ID doesn't exist:

Oct 21 22:52:51 sliggoo prometheus-openstack-exporter.prometheus-openstack-exporter[1337972]: Error getting stats: Traceback (most recent call last):
Oct 21 22:52:51 sliggoo prometheus-openstack-exporter.prometheus-openstack-exporter[1337972]: File "/snap/prometheus-openstack-exporter/29/bin/prometheus-openstack-exporter", line 206, in run
Oct 21 22:52:51 sliggoo prometheus-openstack-exporter.prometheus-openstack-exporter[1337972]: prodstack.update(self._get_nova_info(nova, cinder, prodstack))
Oct 21 22:52:51 sliggoo prometheus-openstack-exporter.prometheus-openstack-exporter[1337972]: File "/snap/prometheus-openstack-exporter/29/bin/prometheus-openstack-exporter", line 176, in _get_nova_info
Oct 21 22:52:51 sliggoo prometheus-openstack-exporter.prometheus-openstack-exporter[1337972]: new_instances = [x._info for x in nova.servers.list(search_opts=search_opts)]
Oct 21 22:52:51 sliggoo prometheus-openstack-exporter.prometheus-openstack-exporter[1337972]: File "/snap/prometheus-openstack-exporter/29/lib/python2.7/site-packages/novaclient/v2/servers.py", line 835, in list
Oct 21 22:52:51 sliggoo prometheus-openstack-exporter.prometheus-openstack-exporter[1337972]: "servers")
Oct 21 22:52:51 sliggoo prometheus-openstack-exporter.prometheus-openstack-exporter[1337972]: File "/snap/prometheus-openstack-exporter/29/lib/python2.7/site-packages/novaclient/base.py", line 249, in _list
Oct 21 22:52:51 sliggoo prometheus-openstack-exporter.prometheus-openstack-exporter[1337972]: resp, body = self.api.client.get(url)
Oct 21 22:52:51 sliggoo prometheus-openstack-exporter.prometheus-openstack-exporter[1337972]: File "/snap/prometheus-openstack-exporter/29/lib/python2.7/site-packages/keystoneauth1/adapter.py", line 386, in get
Oct 21 22:52:51 sliggoo prometheus-openstack-exporter.prometheus-openstack-exporter[1337972]: return self.request(url, 'GET', **kwargs)
Oct 21 22:52:51 sliggoo prometheus-openstack-exporter.prometheus-openstack-exporter[1337972]: File "/snap/prometheus-openstack-exporter/29/lib/python2.7/site-packages/novaclient/client.py", line 117, in request
Oct 21 22:52:51 sliggoo prometheus-openstack-exporter.prometheus-openstack-exporter[1337972]: raise exceptions.from_response(resp, body, url, method)
Oct 21 22:52:51 sliggoo prometheus-openstack-exporter.prometheus-openstack-exporter[1337972]: BadRequest: marker [85e75503-e85e-4e19-b13b-4245c31e88a3] not found (HTTP 400) (Request-ID: req-4fa51fcf-2706-4695-84cc-3d2f358bca82)

Not sure what the cause is.

Multiple aggregates are not reflected. aggregate_map gets overwritten

When a hypervisor has multiple aggregate groups set, only the last one is reflected in the exporter output. Here is a proposed fix, which works for us and returns multiple comma-separated aggregates:

class Nova():
    def __init__(self):
        self.registry = CollectorRegistry()
        self.prodstack = {}
        with open(config['cache_file'], 'rb') as f:
            self.prodstack = pickle.load(f)[0]
        self.hypervisors = self.prodstack['hypervisors']
        self.tenant_map = {t['id']: t['name'] for t in self.prodstack['tenants']}
        self.flavor_map = {f['id']: {'ram': f['ram'], 'disk': f['disk'], 'vcpus': f['vcpus']}
                           for f in self.prodstack['flavors']}
        self.aggregate_list = []
        self.aggregate_map = {}
        self.services_map = {}
        for s in self.prodstack['services']:
            if s['binary'] == 'nova-compute':
                self.services_map[s['host']] = s['status']
        for agg in self.prodstack['aggregates']:
            for h in agg['hosts']:
                self.aggregate_list.append({h: agg['name']})
        for i in self.aggregate_list:
            for key in i:
                self.aggregate_map.setdefault(key, [])
                if i[key] not in self.aggregate_map[key]:
                    self.aggregate_map[key].append(i[key])
        for i in self.aggregate_map.keys():
            self.aggregate_map[i] = ",".join(self.aggregate_map[i])
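
An equivalent, slightly more compact construction (a sketch assuming the same prodstack['aggregates'] structure) that avoids the intermediate aggregate_list:

from collections import defaultdict

aggregate_sets = defaultdict(set)
for agg in self.prodstack['aggregates']:
    for host in agg['hosts']:
        aggregate_sets[host].add(agg['name'])
# Comma-separated and sorted, for a stable label value per host
self.aggregate_map = {h: ",".join(sorted(names)) for h, names in aggregate_sets.items()}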

Charm and exporter are lacking support of OS_INTERFACE to specify endpoint to be used

Problem statement: The exporter cannot reach Keystone or other endpoints from the catalog if their public endpoints are unreachable from the exporter, and there is no way to customize which endpoint type is used.
Example: The exporter is deployed in a container that only has access to the "internal" network, so even if it reaches Keystone via its internal FQDN/IP instead of the public one, it will not be able to connect to the other services because there is no connectivity to their public endpoints.

As prometheus-openstack-exporter uses the openstack client, when it queries the Keystone catalog it retrieves the "public" endpoints by default (openstack/osc-lib@c500b63), even if an admin Keystone endpoint is specified in the config. To override this behavior, OS_INTERFACE needs to be passed to the client, and this is currently supported neither by the charm nor by the exporter itself.

Currently, the only way to avoid this is to expose the exporter on the public interface, which can be unacceptable in certain circumstances.

Socket errors causing frequent exporter restarts

We have noticed the traceback below in the syslog of the OpenStack exporter; it appears quite frequently:

Traceback (most recent call last):
  File "/snap/prometheus-openstack-exporter/24/usr/lib/python2.7/SocketServer.py", line 571, in process_request
    self.finish_request(request, client_address)
  File "/snap/prometheus-openstack-exporter/24/usr/lib/python2.7/SocketServer.py", line 331, in finish_request
    self.RequestHandlerClass(request, client_address, self)
  File "/snap/prometheus-openstack-exporter/24/bin/prometheus-openstack-exporter", line 632, in handler
    OpenstackExporterHandler(*args, **kwargs)
  File "/snap/prometheus-openstack-exporter/24/bin/prometheus-openstack-exporter", line 593, in __init__
    BaseHTTPRequestHandler.__init__(self, *args, **kwargs)
  File "/snap/prometheus-openstack-exporter/24/usr/lib/python2.7/SocketServer.py", line 654, in __init__
    self.finish()
  File "/snap/prometheus-openstack-exporter/24/usr/lib/python2.7/SocketServer.py", line 713, in finish
    self.wfile.close()
  File "/snap/prometheus-openstack-exporter/24/usr/lib/python2.7/socket.py", line 283, in close
    self.flush()
  File "/snap/prometheus-openstack-exporter/24/usr/lib/python2.7/socket.py", line 307, in flush
    self._sock.sendall(view[write_offset:write_offset+buffer_size])
error: [Errno 32] Broken pipe

It seems that the exporter is trying to talk to a machine that is offline, and does not handle the error gracefully which causes it to crash.

Can't get openstack metrics

prometheus-openstack-exporter.service - prometheus-openstack-exporter
Loaded: loaded (/etc/systemd/system/prometheus-openstack-exporter.service; enabled; vendor preset: disabled)
Active: active (running) since Sat 2019-10-05 12:20:34 PST; 4s ago
Main PID: 18478 (python)
Tasks: 2
Memory: 44.4M
CGroup: /system.slice/prometheus-openstack-exporter.service
└─18478 python /usr/local/bin/prometheus-openstack-exporter

Oct 05 12:20:35 localhost.localdomain sh[18478]: Error getting stats: Traceback (most recent call last):
Oct 05 12:20:35 localhost.localdomain sh[18478]: File "/usr/local/bin/prometheus-openstack-exporter", line 145, in run
Oct 05 12:20:35 localhost.localdomain sh[18478]: keystone, nova, neutron, cinder = get_clients()
Oct 05 12:20:35 localhost.localdomain sh[18478]: File "/usr/local/bin/prometheus-openstack-exporter", line 80, in get_clients
Oct 05 12:20:35 localhost.localdomain sh[18478]: "auth_url")
Oct 05 12:20:35 localhost.localdomain sh[18478]: File "/usr/local/bin/prometheus-openstack-exporter", line 60, in get_creds_list
Oct 05 12:20:35 localhost.localdomain sh[18478]: return [env['OS_%s' % name.upper()] for name in names]
Oct 05 12:20:35 localhost.localdomain sh[18478]: File "/usr/lib64/python2.7/UserDict.py", line 23, in getitem
Oct 05 12:20:35 localhost.localdomain sh[18478]: raise KeyError(key)
Oct 05 12:20:35 localhost.localdomain sh[18478]: KeyError: 'OS_USERNAME'

pickle EOF Error

When I run the prometheus-openstack-exporter I've this message:

Traceback (most recent call last):
  File "/home/ubuntu/prometheus-openstack-exporter-canonical/prometheus-openstack-exporter", line 727, in do_GET
    collectors = [COLLECTORS[collector]() for collector in config['enabled_collectors']]
  File "/home/ubuntu/prometheus-openstack-exporter-canonical/prometheus-openstack-exporter", line 326, in __init__
    self.prodstack = pickle.load(f)[0]
  File "/usr/lib/python2.7/pickle.py", line 1384, in load
    return Unpickler(file).load()
  File "/usr/lib/python2.7/pickle.py", line 864, in load
    dispatch[key](self)
  File "/usr/lib/python2.7/pickle.py", line 886, in load_eof
    raise EOFError
EOFError

I don't know how to correct it. Is there someone who can help me?

I can't get any metric

Hello

I have tested this exporter on a machine with an OpenStack installation, and there it worked fine. That OpenStack is a test environment with only one computer.

Then I installed it in our production OpenStack environment without success, and I'm not able to see the problem: I can't get any metrics. In this OpenStack environment we have 3 controller servers:

controller2: manages the network
controller3: manages the storage
controller1: manages the rest of the OpenStack components

I have installed the exporter on controller1 with a dedicated OpenStack user, to avoid using the admin user.

If I try to get the metrics from the controller1 machine itself, I see a 500 error:

# curl -v http://localhost:9183/metrics
*   Trying 127.0.0.1...
* TCP_NODELAY set
* Connected to localhost (127.0.0.1) port 9183 (#0)
> GET /metrics HTTP/1.1
> Host: localhost:9183
> User-Agent: curl/7.61.1
> Accept: */*
> 
* HTTP 1.0, assume close after body
< HTTP/1.0 500 Internal Server Error
< Server: BaseHTTP/0.6 Python/3.6.8
< Date: Thu, 28 Jul 2022 11:26:11 GMT
< 
* Closing connection 0
[root@nccontroller1 log]# 

This is the content of the prometheus-openstack-exporter.yaml file (although we have Swift, it is commented out just as a test to check whether it was the problem):

[root@controller1 ~]# cat /etc/prometheus-openstack-exporter/prometheus-openstack-exporter.yaml 
# Example configuration file for prometheus-openstack-exporter
# Copyright (C) 2016-2019 Canonical, Ltd.
#

listen_port: 9183
cache_refresh_interval: 300  # In seconds
cache_file: /var/cache/prometheus-openstack-exporter/mycloud
cloud: mycloud
openstack_allocation_ratio_vcpu: 2.5
openstack_allocation_ratio_ram: 1.1
openstack_allocation_ratio_disk: 1.0
log_level: DEBUG

# Configure the enabled collectors here.  Note that the Swift account
# collector in particular has special requirements.
enabled_collectors:
  - cinder
  - neutron
  - nova
###  - swift
###  - swift-account-usage

# To export hypervisor_schedulable_instances metric set desired instance size
schedulable_instance_size:
    ram_mbs: 4096
    vcpu: 2
    disk_gbs: 20

# Uncomment if the cloud doesn't provide cinder / nova volumes:
#use_nova_volumes: False

## Swift

# There is no way to retrieve them using OpenStack APIs
# For clouds deployed without swift, remove this part
###swift_hosts:
###    - swift.xxx.es 

###    - export1 172.16.4.225:8080
###    - export2 172.16.4.226:8080
###    - export3 172.16.4.227:8080
###    - export4 172.16.4.228:8080

# There is no API to ask Swift for a list of accounts it knows about.
# Even if there were, Swift (in common case of Keystone auth, at
# least) only knows them by the corresponding tenant ID, which would
# be a less than useful label without post-processing.  The following
# should point to a file containing one line per tenant, with the
# tenant name first, then whitespace, followed by the tenant ID.
keystone_tenants_map:

# The reseller prefix is typically used by the Swift middleware to
# keep accounts with different providers separate.  We would ideally
# look this up dynamically from the Swift configuration.
# The Keystone middlware defaults to the following value.
reseller_prefix: AUTH_

ring_path: /etc/swift

# These will typically be read from /etc/swift/swift.conf.  If that
# file cannot be opened, then the Swift library will log an error and
# try to exit.  To run p-s-a-e as a user other than Swift, these
# settings must be set to the same values as Swift itself, and the
# above must point to an always-current readable copy of the rings.

hash_path_prefix:
hash_path_suffix:

[root@controller1 ~]# 

This is the content of the admin.novarc file

[root@controller1 ~]# cat /etc/prometheus-openstack-exporter/admin.novarc 
export OS_PROJECT_DOMAIN_NAME=Default
export OS_USER_DOMAIN_NAME=Default
export OS_PROJECT_NAME=admin
#export OS_TENANT_NAME=admin
export OS_USERNAME=prometheus-exporter
export OS_PASSWORD=XXXXXXXX
export OS_AUTH_URL=https://xxxxx.xxx.xx:5000/v3
#export OS_INTERFACE=public
export OS_IDENTITY_API_VERSION=3
export OS_REGION_NAME=RegionOne
[root@controller1 ~]# 

And this is the log output while the service is running; nothing happens when I try to get the metrics. Note that in the development OpenStack we get the same list_projects problem, but there it works fine and we can get the metrics.

Jul 28 13:28:51 nccontroller1 systemd[1]: Started prometheus-openstack-exporter.
Jul 28 13:28:53 nccontroller1 python3[32042]: Starting data gather thread
Jul 28 13:28:53 nccontroller1 python3[32042]: Client setup done, keystone ver 3
Jul 28 13:28:53 nccontroller1 python3[32042]: Error getting tenants.list, continue with projects.list
Jul 28 13:28:53 nccontroller1 python3[32042]: Error getting stats: Traceback (most recent call last):
                                                File "/opt/prometheus-openstack-exporter/prometheus-openstack-exporter", line 186, in _get_keystone_info
                                                  info["tenants"] = [x._info for x in keystone.tenants.list()]
                                                File "/usr/lib/python3.6/site-packages/keystoneclient/httpclient.py", line 893, in __getattr__
                                                  raise AttributeError(_("Unknown Attribute: %s") % name)
                                              AttributeError: Unknown Attribute: tenants
                                              
                                              During handling of the above exception, another exception occurred:
                                              
                                              Traceback (most recent call last):
                                                File "/opt/prometheus-openstack-exporter/prometheus-openstack-exporter", line 262, in run
                                                  prodstack.update(self._get_keystone_info(keystone))
                                                File "/opt/prometheus-openstack-exporter/prometheus-openstack-exporter", line 189, in _get_keystone_info
                                                  info["tenants"] = [x._info for x in keystone.projects.list()]
                                                File "/usr/lib/python3.6/site-packages/keystoneclient/v3/projects.py", line 142, in list
                                                  **kwargs)
                                                File "/usr/lib/python3.6/site-packages/keystoneclient/base.py", line 86, in func
                                                  return f(*args, **new_kwargs)
                                                File "/usr/lib/python3.6/site-packages/keystoneclient/base.py", line 448, in list
                                                  list_resp = self._list(url_query, self.collection_key)
                                                File "/usr/lib/python3.6/site-packages/keystoneclient/base.py", line 141, in _list
                                                  resp, body = self.client.get(url, **kwargs)
                                                File "/usr/lib/python3.6/site-packages/keystoneauth1/adapter.py", line 386, in get
                                                  return self.request(url, 'GET', **kwargs)
                                                File "/usr/lib/python3.6/site-packages/keystoneauth1/adapter.py", line 545, in request
                                                  resp = super(LegacyJsonAdapter, self).request(*args, **kwargs)
                                                File "/usr/lib/python3.6/site-packages/keystoneauth1/adapter.py", line 248, in request
                                                  return self.session.request(url, method, **kwargs)
                                                File "/usr/lib/python3.6/site-packages/keystoneauth1/session.py", line 943, in request
                                                  raise exceptions.from_response(resp, method, url)
                                              keystoneauth1.exceptions.http.Forbidden: You are not authorized to perform the requested action: identity:list_projects. (HTTP 403) (Request-ID: req-c1121018-2efb-4768-bddb-cfdd6558822f)

Tell me if you need me to upload more information.

Any help finding the problem would be appreciated. Thanks a lot.

handle missing flavors

Currently when any flavor is missing the exporter falls back to exporting no information on resource usage.

We recently had to delete and recreate some flavors and as a result a bunch of our clouds are exporting no information.

For our purposes I think we'd like some information rather than no information. The flavors in question are used by only one tenant, so the other tenants' usage can still be calculated.

My proposal (hopefully I will find time to work on this) would be to handle missing flavors by calculating metrics as usual and exporting an additional metric for each tenant counting how many instances' flavors could not be mapped, and perhaps a count of unmappable flavors.

This could be opt-in/experimental to begin with.
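
A sketch of what that could look like in the Nova collector; the metric name and the surrounding variables (instances, flavor_map, cloud) are hypothetical and only mirror the structures visible elsewhere in this page.

from collections import Counter
from prometheus_client import CollectorRegistry, Gauge

registry = CollectorRegistry()
unmapped = Gauge('openstack_instances_with_unknown_flavor',
                 'Instances whose flavor could not be mapped',
                 ['tenant', 'cloud'], registry=registry)

cloud = 'mycloud'
flavor_map = {}      # flavor id -> {'ram': ..., 'disk': ..., 'vcpus': ...}, as in the exporter
instances = []       # cached instance dicts, as in the exporter

missing = Counter()
for i in instances:
    flavor = flavor_map.get(i['flavor']['id'])
    if flavor is None:
        missing[i['tenant_id']] += 1      # count it instead of aborting the whole export
        continue
    # ... accumulate ram / vcpus / disk usage as usual ...

for tenant_id, count in missing.items():
    unmapped.labels(tenant_id, cloud).set(count)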

Reliance on newer python-novaclient dependency than what is specified in readme

prometheus-openstack-exporter passes the detail keyword argument when getting Nova quotas (link to code). Support for getting quota details was added to the python-novaclient API in June of 2016 (commit 0b2de530053d93cdd9dc4ea08c482325321f7ff7).

It seems that Ubuntu 16.04 ships with Nova client 2:3.3.1-2 (what I have after apt-get install python-novaclient). This is not new enough to support getting quota details, to wit:

Jan 17 07:50:38 prometheus sh[22592]: Error getting stats: Traceback (most recent call last):
Jan 17 07:50:38 prometheus sh[22592]:   File "/opt/prometheus/prometheus-openstack-exporter-marana-cloud/prometheus-openstack-exporter", line 178, in run
Jan 17 07:50:38 prometheus sh[22592]:     prodstack['nova_quotas'][tid] = nova.quotas.get(tid, detail=True)._info
Jan 17 07:50:38 prometheus sh[22592]: TypeError: get() got an unexpected keyword argument 'detail'

Suggested fix: update readme to provide a path to success on Ubuntu 16.04 (e.g. set up a virtualenv with a newer python-novaclient), and/or make the exporter degrade gracefully if an older python-novaclient (which does not support the quota details kwarg) is present.
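
The graceful-degradation option could be as small as a fallback when the detail keyword isn't supported, e.g. (a sketch, not the project's actual fix; nova and tid are the client and tenant ID already used by the exporter):

try:
    # Newer python-novaclient (mid-2016 or later) supports per-resource quota details
    quota = nova.quotas.get(tid, detail=True)._info
except TypeError:
    # Older clients have no 'detail' kwarg; fall back to the plain quota totals
    quota = nova.quotas.get(tid)._info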

cache file issue: No such file or directory: '/home/prometheus-openstack-exporter/mycloud'

Hi All,
when I tried to start the program for the first time (I downloaded the latest release on 25 January), I see this error in the browser:
"Traceback (most recent call last):
File "./prometheus-openstack-exporter", line 729, in do_GET
collectors = [COLLECTORScollector for collector in config['enabled_collectors']]
File "./prometheus-openstack-exporter", line 327, in init
with open(config['cache_file'], 'rb') as f:
IOError: [Errno 2] No such file or directory: '/home/prometheus-openstack-exporter/mycloud'"

I think this issue comes from an endless loop (which prevents the "mycloud" file from being written) in the file "prometheus-openstack-exporter", in the "_get_nova_info" function, at line 179.
The marker is written into search_opts['marker'], but then (line 176) it is overwritten by the "marker" variable, which is still empty.
So the loop never ends.
I modified the line 179
FROM
search_opts['marker'] = new_instances[-1]['id']
TO
marker = new_instances[-1]['id']

Now it seems to be working.

Latest p-o-e crashes with enabled_collectors

Hi,

Seems the latest p-o-e is crashing.

Aug 12 01:08:04 comet prometheus-openstack-exporter.prometheus-openstack-exporter[48365]: Traceback (most recent call last):
Aug 12 01:08:04 comet prometheus-openstack-exporter.prometheus-openstack-exporter[48365]: File "/snap/prometheus-openstack-exporter/27/bin/prometheus-openstack-exporter", line 780, in
Aug 12 01:08:04 comet prometheus-openstack-exporter.prometheus-openstack-exporter[48365]: if data_gatherer_needed(config):
Aug 12 01:08:04 comet prometheus-openstack-exporter.prometheus-openstack-exporter[48365]: File "/snap/prometheus-openstack-exporter/27/bin/prometheus-openstack-exporter", line 717, in data_gatherer_needed
Aug 12 01:08:04 comet prometheus-openstack-exporter.prometheus-openstack-exporter[48365]: return set(config['enabled_collectors']).intersection(DATA_GATHERER_USERS)
Aug 12 01:08:04 comet prometheus-openstack-exporter.prometheus-openstack-exporter[48365]: KeyError: 'enabled_collectors'

Certificates error when using untrusted certificates

(Caused by SSLError(SSLError(1, u'[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed.

I'm getting this error while running. I'm not using SSL for the Horizon dashboard and usually access the API through the --insecure flag. Where should I make changes in the code?

metrics not inputting to prometheus

Everything appears to be up and running: I can browse to Prometheus at localhost:9090 with no issue, view metrics at localhost:9093 with no issue, and start the exporter with no issue. However, when I look in the graphs for metric inputs, I do not have any OpenStack options, just the three default scrape metrics (up, scrape duration, scrape samples).

scrape_configs:
  - job_name: 'openstack-deployment-1'
    scrape_interval: 5m
    static_configs:
      - targets: ['localhost:9183']

Readme issue

The readme says sudo start prometheus-openstack-exporter, but shouldn't it be sudo systemctl start prometheus-openstack-exporter?

aggregate label is problematic for hosts which belong to multiple aggregates

We've found an issue regarding metrics, observed concretely on the hypervisor_schedulable_instances metric but likely affecting others, where a metric only shows up under one of the aggregates that a host is a part of.

For example, if a host is part of aggregates a, b, and c, and I search for hypervisor_schedulable_instances{hypervisor_hostname="my-host.example.com"}, I might get output that looks like:

Key Value
hypervisor_schedulable_instances{aggregate="a", arch="x86_64", cloud="my-cloud", hypervisor_hostname="my-host.example.com"} 175

That is - it would give me a single record indicating 175 schedulable instances on my host. However, it would only show the record under aggregate "a". If I were to change my query to sum/avg over aggregates "b" or "c", the above record wouldn't even be included in the calculation.

I'm not a Prometheus guru so I don't have a suggestion for how to correct this in a sane way; adding duplicate records with different aggregate labels would mess up sum()/avg() over the entire collection, but I am not sure of how else to correct this... Or should per-aggregate metrics be collected in some alternative way? I don't know; I leave that for you to review and consider.

Thank you.

error mapping flavor_id from my Openstack deployment to the exporter.

I have an issue when getting metrics from the URL http://x.x.x.x:9183/metrics:

Traceback (most recent call last):
  File "./prometheus-openstack-exporter", line 363, in do_GET
    nova.get_stats() +
  File "./prometheus-openstack-exporter", line 290, in get_stats
    self.gen_instance_stats()
  File "./prometheus-openstack-exporter", line 270, in gen_instance_stats
    flavor = self.flavor_map[i['flavor']['id']]
KeyError: u'3

That seems to be an error mapping the flavor_id from the OpenStack deployment to the exporter. Any solution to fix it?

pickle EOF Error

Running the latest version (end of January 2019).

On the node's metrics page I get the following error:

Traceback (most recent call last):
  File "/opt/prometheus-openstack-exporter/prometheus-openstack-exporter", line 727, in do_GET
    collectors = [COLLECTORS[collector]() for collector in config['enabled_collectors']]
  File "/opt/prometheus-openstack-exporter/prometheus-openstack-exporter", line 326, in __init__
    self.prodstack = pickle.load(f)[0]
  File "/usr/lib/python2.7/pickle.py", line 1384, in load
    return Unpickler(file).load()
  File "/usr/lib/python2.7/pickle.py", line 864, in load
    dispatch[key](self)
  File "/usr/lib/python2.7/pickle.py", line 886, in load_eof
    raise EOFError
EOFError

Running the code manually, no container / charm, etc.

Not compatible with trusty python-novaclient

When trying to run head on a trusty system (e.g. beverly):

Traceback (most recent call last):
  File "prometheus-openstack-exporter", line 76, in run
    nova = nova_client.Client(2, **creds_nova)
  File "/usr/lib/python2.7/dist-packages/novaclient/client.py", line 506, in Client
    return client_class(*args, **kwargs)
TypeError: __init__() takes at least 4 arguments (6 given)

python-novaclient 1:2.17.0-0ubuntu1.2
python-keystoneclient 1:0.7.1-ubuntu1.3

Docker container broken after Python 3 upgrade

I believe #110 broke the Docker container, which is still based on python:2.7-alpine.

Minimal fix might be:

  • Dockerfile should have FROM python:3-alpine
  • Dockerfile should install package python3-dev instead of python2-dev

KeyError: 'tenants' in prometheus-openstack-exporter, line 179, in _get_nova_info

The info dictionary does not include any 'tenants' key because info is initialized at the top of the _get_nova_info function as an empty dictionary.

My workaround was to pass prodstack as an argument to _get_nova_info in the run() function at line 198.

Then I got TypeError: 'NoneType' object is not iterable.
This error happened because the _get_nova_info function does not return info at the end.
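
Put together, the reporter's two changes amount to something like this (a sketch against their copy of the exporter; the body of the function is elided):

def _get_nova_info(self, nova, cinder, prodstack):
    info = {'instances': []}
    # Use the tenants gathered earlier instead of expecting them in the empty dict
    tenant_ids = [t['id'] for t in prodstack['tenants']]
    # ... gather hypervisors, flavors and instances as before ...
    return info   # without this, the caller receives None and later iteration fails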

ImportError: No Module named swift.common.utils

Hi

Getting the following issue (downloaded 6 Feb 2019) if anyone can assist please:

ubuntu@server01:/opt/prometheus-openstack-exporter$ sudo systemctl status prometheus-openstack-exporter.service
prometheus-openstack-exporter.service - prometheus-openstack-exporter
   Loaded: loaded (/etc/systemd/system/prometheus-openstack-exporter.service; enabled; vendor preset: enabled)
   Active: failed (Result: exit-code) since Wed 2019-02-06 12:32:15 UTC; 4s ago
  Process: 11416 ExecStart=/bin/sh -c . /etc/prometheus-openstack-exporter/admin.novarc; exec /opt/prometheus-openstack-exporter/prometheus-openstack-exporter $CONFIG_FILE (co
 Main PID: 11416 (code=exited, status=1/FAILURE)

Feb 06 12:32:14 server01 systemd[1]: Started prometheus-openstack-exporter.
Feb 06 12:32:15 server01 sh[11416]: Traceback (most recent call last):
Feb 06 12:32:15 server01 sh[11416]:   File "/opt/prometheus-openstack-exporter/prometheus-openstack-exporter", line 48, in <module>
Feb 06 12:32:15 server01 sh[11416]:     import swift.common.utils
Feb 06 12:32:15 server01 sh[11416]: ImportError: No module named swift.common.utils
Feb 06 12:32:15 server01 systemd[1]: prometheus-openstack-exporter.service: Main process exited, code=exited, status=1/FAILURE
Feb 06 12:32:15  server01 systemd[1]: prometheus-openstack-exporter.service: Unit entered failed state.
Feb 06 12:32:15 server01 systemd[1]: prometheus-openstack-exporter.service: Failed with result 'exit-code'.

ImportError: No module named swift.common.utils is the issue

$ sudo cat /etc/prometheus-openstack-exporter/prometheus-openstack-exporter.yaml

# Copyright (C) 2016-2019 Canonical, Ltd.
#

listen_port: 9183
cache_refresh_interval: 300  # In seconds
cache_file: /var/cache/prometheus-openstack-exporter/mycloud
cloud: mycloud
openstack_allocation_ratio_vcpu: 2.5
openstack_allocation_ratio_ram: 1.1
openstack_allocation_ratio_disk: 1.0

# Configure the enabled collectors here.  Note that the Swift account
# collector in particular has special requirements.
enabled_collectors:
  - cinder
  - neutron
  - nova
#  - swift
#  - swift-account-usage

# To export hypervisor_schedulable_instances metric set desired instance size
schedulable_instance_size:
    ram_mbs: 4096
    vcpu: 2
    disk_gbs: 20

# Uncomment if the cloud doesn't provide cinder / nova volumes:
use_nova_volumes: False

$ sudo nano /etc/init/prometheus-openstack-exporter.conf

# Configuration is read from /etc/default/prometheus-openstack-exporter
# Copyright (C) 2016 Canonical, Ltd.

description "Prometheus Openstack Exporter"
author  "Jacek Nykis <[email protected]>"

# The following variable must be set:
# NOVARC - full path to the novarc file
#
# Optional variables:
# CONFIG_FILE - path to configuration file

start on runlevel [2345]
stop on runlevel [!2345]
respawn

script
    . /etc/default/prometheus-openstack-exporter
    . $NOVARC
    exec /usr/local/bin/prometheus-openstack-exporter $CONFIG_FILE
#    exec /opt/prometheus-openstack-exporter/prometheus-openstack-exporter $CONFIG_FILE
end script

$ sudo nano /etc/systemd/system/prometheus-openstack-exporter.service

[Unit]
Description=prometheus-openstack-exporter
After=network.target

[Service]
EnvironmentFile=/etc/default/prometheus-openstack-exporter
#EnvironmentFile=/etc/prometheus-openstack-exporter/admin.novarc
ExecStart=/bin/sh -c '. /etc/prometheus-openstack-exporter/admin.novarc; exec /opt/prometheus-openstack-exporter/prometheus-openstack-exporter $CONFIG_FILE'
KillMode=process

[Install]
WantedBy=multi-user.target

Nova instance metrics do not work for more than 100 VMs

When there are more than 100 VMs, the paginated fetch results in an endless loop because the marker is not updated (see the sketch after the snippet below).

        marker = ''
        while True:
            search_opts = {'all_tenants': '1', 'limit': '100', 'marker': marker}
            new_instances = [x._info for x in nova.servers.list(search_opts=search_opts)]
            if new_instances:
                marker = new_instances[-1]['id']
                info['instances'].extend(new_instances)
            else:
                break
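
For comparison, a minimal sketch (not the project's actual fix) of a marker-based pagination loop that always terminates, even if the API keeps returning the same page:

def list_all_instances(nova):
    # Fetch servers 100 at a time; stop on an empty page or a stuck marker.
    instances = []
    marker = None
    while True:
        search_opts = {'all_tenants': '1', 'limit': '100'}
        if marker:
            search_opts['marker'] = marker
        page = [x._info for x in nova.servers.list(search_opts=search_opts)]
        if not page:
            break
        new_marker = page[-1]['id']
        if new_marker == marker:
            break  # marker did not advance, avoid looping forever
        marker = new_marker
        instances.extend(page)
    return instances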

Python version is:

/usr/bin/python --version
Python 2.7.12

crashes if a swift node is down

p-o-e (prometheus-openstack-exporter) running on a swift-proxy crashes if a storage node is down.

We should catch the error and try another node instead.

Traceback (most recent call last):
  File "/usr/local/bin/prometheus-openstack-exporter", line 702, in do_GET
    output += collector.get_stats()
  File "/usr/local/bin/prometheus-openstack-exporter", line 662, in get_stats
    self.gen_account_stats()
  File "/usr/local/bin/prometheus-openstack-exporter", line 657, in gen_account_stats
    bytes_used = self._get_account_usage(account)
  File "/usr/local/bin/prometheus-openstack-exporter", line 642, in _get_account_usage
    response = requests.head(account_url)
  File "/usr/lib/python2.7/dist-packages/requests/api.py", line 77, in head
    return request('head', url, **kwargs)
  File "/usr/lib/python2.7/dist-packages/requests/api.py", line 44, in request
    return session.request(method=method, url=url, **kwargs)
  File "/usr/lib/python2.7/dist-packages/requests/sessions.py", line 467, in request
    resp = self.send(prep, **send_kwargs)
  File "/usr/lib/python2.7/dist-packages/requests/sessions.py", line 570, in send
    r = adapter.send(request, **kwargs)
  File "/usr/lib/python2.7/dist-packages/requests/adapters.py", line 378, in send
    raise ConnectionError(e)
ConnectionError: HTTPConnectionPool(host='10.24.0.222', port=6002): Max retries exceeded with url: /sdh/177890/AUTH_abb26b2fe803453d834071cecdb7bc21 (Caused by <class 'socket.error'>: [Errno 113] No route to host)
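
A hedged sketch of the suggested behaviour, assuming a hypothetical list of candidate account URLs (one per replica) built from the ring:

import random
import requests

def get_account_usage(account_urls):
    # Try replicas in random order; skip nodes that are down instead of crashing.
    for url in random.sample(account_urls, len(account_urls)):
        try:
            response = requests.head(url, timeout=5)
        except requests.exceptions.ConnectionError:
            continue  # no route to host, try the next replica
        return int(response.headers.get('X-Account-Bytes-Used', 0))
    return None  # every replica was unreachable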

Server in deadlock

Hi, I found a deadlock condition in a prometheus-openstack-exporter server running the latest snap version. The error is the same as the one described here for swift, with just some line number differences, as the server is using Python 3.8. This is the stack trace for the error I saw.
Could it be the same issue?

Unreachable swift host breaks exporter

If one of the swift hosts is not reachable, no metrics are returned and I see the following traceback:

Traceback (most recent call last):
  File "/usr/local/bin/prometheus-openstack-exporter", line 356, in do_GET
    swift.get_stats() + \
  File "/usr/local/bin/prometheus-openstack-exporter", line 333, in get_stats
    self.gen_disk_usage_stats()
  File "/usr/local/bin/prometheus-openstack-exporter", line 297, in gen_disk_usage_stats
    r = requests.get(self.baseurl.format(h, 'diskusage'))
  File "/usr/lib/python2.7/dist-packages/requests/api.py", line 55, in get
    return request('get', url, **kwargs)
  File "/usr/lib/python2.7/dist-packages/requests/api.py", line 44, in request
    return session.request(method=method, url=url, **kwargs)
  File "/usr/lib/python2.7/dist-packages/requests/sessions.py", line 455, in request
    resp = self.send(prep, **send_kwargs)
  File "/usr/lib/python2.7/dist-packages/requests/sessions.py", line 558, in send
    r = adapter.send(request, **kwargs)
  File "/usr/lib/python2.7/dist-packages/requests/adapters.py", line 378, in send
    raise ConnectionError(e)
ConnectionError: HTTPConnectionPool(host='redacted', port=6000): Max retries exceeded with url: /recon/diskusage (Caused by <class 'socket.error'>: [Errno 113] No route to host)

The exporter should handle unreachable swift hosts gracefully.
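
A minimal sketch of what graceful handling could look like for the recon disk-usage scrape (the error handling and the baseurl default are assumptions, not the project's code):

import requests

def gen_disk_usage_stats(hosts, baseurl='http://{}:6000/recon/{}'):
    # Collect per-host disk usage, skipping hosts that do not respond.
    usage = {}
    for host in hosts:
        try:
            r = requests.get(baseurl.format(host, 'diskusage'), timeout=5)
            usage[host] = r.json()
        except (requests.exceptions.RequestException, ValueError):
            continue  # unreachable host or non-JSON reply: omit it from this scrape
    return usage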

Missing reseller_prefix causes traceback

The traceback is:

Traceback (most recent call last):
  File "/snap/prometheus-openstack-exporter/28/bin/prometheus-openstack-exporter", line 737, in do_GET
    collectors = [COLLECTORS[collector]() for collector in get_collectors(config.get('enabled_collectors'))]
  File "/snap/prometheus-openstack-exporter/28/bin/prometheus-openstack-exporter", line 637, in __init__
    self.reseller_prefix = config['reseller_prefix']
KeyError: 'reseller_prefix'
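
One possible defensive fix (a sketch, not necessarily what the project chose) is to fall back to Swift's conventional default reseller prefix when the option is missing:

def load_reseller_prefix(config):
    # Tolerate a missing key instead of raising KeyError; 'AUTH_' is Swift's
    # conventional default reseller prefix.
    return config.get('reseller_prefix', 'AUTH_')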

Log better

All diagnostic logging is currently done with print().
Ideally, we should be logging to syslog, so that snap logs shows useful diagnostic information.
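
A minimal sketch of what that could look like with the standard library (the logger name and message are illustrative):

import logging
import logging.handlers

log = logging.getLogger('prometheus-openstack-exporter')
log.setLevel(logging.INFO)
log.addHandler(logging.StreamHandler())  # keep stdout for interactive runs
log.addHandler(logging.handlers.SysLogHandler(address='/dev/log'))  # picked up by syslog / snap logs

log.info('cache refreshed in %.1fs', 12.3)  # instead of print(...)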

pickle EOF error

Hi guys,
when trying to access http://localhost:9183/metrics, I get the following:

Traceback (most recent call last):
  File "/repo/prometheus-openstack-exporter", line 578, in do_GET
    neutron = Neutron()
  File "/repo/prometheus-openstack-exporter", line 212, in __init__
    self.prodstack = pickle.load(f)[0]
  File "/usr/lib/python2.7/pickle.py", line 1384, in load
    return Unpickler(file).load()
  File "/usr/lib/python2.7/pickle.py", line 864, in load
    dispatch[key](self)
  File "/usr/lib/python2.7/pickle.py", line 886, in load_eof
    raise EOFError
EOFError

Can you please help?

Thanks,
Roberto

python3 support

Hi there.

Thanks for your work on this package; it is really useful.

Could you please consider making the required changes to make this software run in a python3 environment? In particular, Debian Bullseye doesn't have python2 by default anymore, and we're having problems running this.

Some examples of things that need changes:

$ 2to3 /usr/bin/prometheus-openstack-exporter
RefactoringTool: Skipping optional fixer: buffer
RefactoringTool: Skipping optional fixer: idioms
RefactoringTool: Skipping optional fixer: set_literal
RefactoringTool: Skipping optional fixer: ws_comma
RefactoringTool: Refactored /usr/bin/prometheus-openstack-exporter
--- /usr/bin/prometheus-openstack-exporter	(original)
+++ /usr/bin/prometheus-openstack-exporter	(refactored)
@@ -28,7 +28,7 @@
 from os import environ as env
 from os import rename, path
 import traceback
-import urlparse
+import urllib.parse
 from threading import Thread
 import pickle
 import requests
@@ -38,9 +38,9 @@
 # http://docs.openstack.org/developer/python-novaclient/api.html
 from cinderclient.v2 import client as cinder_client
 from novaclient import client as nova_client
-from BaseHTTPServer import BaseHTTPRequestHandler
-from BaseHTTPServer import HTTPServer
-from SocketServer import ForkingMixIn
+from http.server import BaseHTTPRequestHandler
+from http.server import HTTPServer
+from socketserver import ForkingMixIn
 from prometheus_client import CollectorRegistry, generate_latest, Gauge, CONTENT_TYPE_LATEST
 from netaddr import IPRange
 
@@ -125,7 +125,7 @@
         cinder = cinder_client.Client(session=sess_admin)
 
     else:
-        raise(ValueError("Invalid OS_IDENTITY_API_VERSION=%s" % ks_version))
+        raise ValueError
     log.debug("Client setup done, keystone ver {}".format(ks_version))
     return (keystone, nova, neutron, cinder)
 
@@ -317,7 +317,7 @@
         metrics = Gauge('neutron_public_ip_usage',
                         'Neutron floating IP and router IP usage statistics',
                         labels, registry=self.registry)
-        for k, v in ips.items():
+        for k, v in list(ips.items()):
             metrics.labels(*k).set(v)
         self.gen_subnet_size()
         return generate_latest(self.registry)
@@ -341,7 +341,7 @@
                     ['cloud', 'tenant', 'type'], registry=self.registry)
         if not self.use_nova_volumes:
             return
-        for t, q in self.prodstack['volume_quotas'].items():
+        for t, q in list(self.prodstack['volume_quotas'].items()):
             if t in self.tenant_map:
                 tenant = self.tenant_map[t]
             else:
@@ -506,7 +506,7 @@
         ram = Gauge('nova_quota_ram_mbs',
                     'Nova RAM (MB)',
                     ['cloud', 'tenant', 'type'], registry=self.registry)
-        for t, q in self.prodstack['nova_quotas'].items():
+        for t, q in list(self.prodstack['nova_quotas'].items()):
             if t in self.tenant_map:
                 tenant = self.tenant_map[t]
             else:
@@ -587,7 +587,7 @@
         try:
             swift_repl_duration.labels(config['cloud'], h, 'object').set(r.json()['object_replication_time'])
         except TypeError:
-            print(traceback.format_exc())
+            print((traceback.format_exc()))
 
     def _get_ring_replication_stats(self, ring, h, swift_repl_duration, swift_repl):
         metrics = ['attempted', 'diff', 'diff_capped', 'empty',
@@ -600,13 +600,13 @@
         try:
             swift_repl_duration.labels(config['cloud'], h, ring).set(r.json()['replication_time'])
         except TypeError:
-            print(traceback.format_exc())
+            print((traceback.format_exc()))
 
         for metric in metrics:
             try:
                 swift_repl.labels(config['cloud'], h, ring, metric).set(r.json()['replication_stats'][metric])
             except TypeError:
-                print(traceback.format_exc())
+                print((traceback.format_exc()))
 
     def gen_replication_stats(self):
         labels = ['cloud', 'hostname', 'ring', 'type']
@@ -689,7 +689,7 @@
         swift_account = Gauge(
             'swift_account_bytes_used', 'Swift account usage in bytes', labels, registry=self.registry)
 
-        for tenant_name, tenant_id in self.keystone_tenants_map.iteritems():
+        for tenant_name, tenant_id in self.keystone_tenants_map.items():
             account = self.reseller_prefix + tenant_id
             bytes_used = self._get_account_usage(account)
 
@@ -738,7 +738,7 @@
         BaseHTTPRequestHandler.__init__(self, *args, **kwargs)
 
     def do_GET(self):
-        url = urlparse.urlparse(self.path)
+        url = urllib.parse.urlparse(self.path)
         if url.path == '/metrics':
             try:
                 collectors = [COLLECTORS[collector]() for collector in get_collectors(config.get('enabled_collectors'))]
RefactoringTool: Files that need to be modified:
RefactoringTool: /usr/bin/prometheus-openstack-exporter

If you prefer, I could send this patch as-is as a pull request.

Add metrics for response times

One thing that folks need to know is how fast the OpenStack APIs respond. It would be nice, either by probing the /healthcheck URL or by timing one of the existing API calls, to know how long a request takes to return.
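
For illustration, a hedged sketch of how such a metric could be exposed with prometheus_client (the metric and label names are made up, not ones the exporter exports today):

import time
from prometheus_client import CollectorRegistry, Gauge

registry = CollectorRegistry()
api_latency = Gauge('openstack_api_response_seconds',
                    'Time taken by an OpenStack API call',
                    ['cloud', 'service'], registry=registry)

def timed(cloud, service, call, *args, **kwargs):
    # Time an existing API call and record the duration.
    start = time.time()
    result = call(*args, **kwargs)
    api_latency.labels(cloud, service).set(time.time() - start)
    return result

# e.g. instances = timed('mycloud', 'nova', nova.servers.list)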

Document which metrics are exposed

Hi, there is no way to discover which metrics are currently exposed except by reading the code or running the exporter against a live OpenStack install.

It may be useful to document each metric to help new users decide whether this project is useful for them.
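
Until that documentation exists, a small sketch like the following can list the metrics a running exporter exposes (port 9183 is the default from the example configuration):

import requests
from prometheus_client.parser import text_string_to_metric_families

text = requests.get('http://localhost:9183/metrics').text
for family in text_string_to_metric_families(text):
    # Print each metric name together with its HELP string.
    print(family.name, '-', family.documentation)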
