
Needs Maintainer: python daemon that munches on logs and sends their contents to logstash

Home Page: https://python-beaver.readthedocs.org/

License: MIT License

Python 94.04% Shell 5.96%

python-beaver's Introduction

Beaver


python daemon that munches on logs and sends their contents to logstash

Requirements

  • Python 2.6+
  • Optional zeromq support: install libzmq (brew install zmq or apt-get install libzmq-dev) and pyzmq (pip install pyzmq==2.1.11)

Installation

Using PIP:

From Github:

pip install git+git://github.com/python-beaver/python-beaver.git@36.3.1#egg=beaver

From PyPI:

pip install beaver==36.3.1

Documentation

Full documentation is available online at http://python-beaver.readthedocs.org/

You can also build the docs locally:

# get sphinx installed
pip install sphinx

# retrieve the repository
git clone git://github.com/python-beaver/beaver.git

# build the html output
cd beaver/docs
make html

HTML docs will be available in beaver/docs/_build/html.

Contributing

When contributing to Beaver, please review the full guidelines here: https://github.com/python-beaver/python-beaver/blob/master/CONTRIBUTING.md. If you would like, you can open an issue to let others know about your work in progress. Documentation must be included and tests must pass on Python 2.6 and 2.7 for pull requests to be accepted.

Credits

Based on work from Giampaolo and Lusis:

Real time log files watcher supporting log rotation.

Original Author: Giampaolo Rodola' <g.rodola [AT] gmail [DOT] com>
http://code.activestate.com/recipes/577968-log-watcher-tail-f-log/

License: MIT

Other hacks (ZMQ, JSON, optparse, ...): lusis

python-beaver's Issues

Fix issues when running under supervisor

When running with supervisor, it appears that logging is blocked at program start, as long as the connection to redis succeeds. This might have to do with environment settings, but it is odd that it works fine outside of supervisor and not under it.

The connection issue also presumably occurs with other brokers as well.

Beaver not re-connecting after a transport exception

I see this behavior happening all the time when I converge my Chef nodes. The Chef run causes RabbitMQ to restart, and beaver loses its connection to it. This causes a transport exception that, in theory, should be sorted moments after with a respawn.

However, the respawn of the transport does not seem to work, as every respawn causes another exception until max_tries is reached.

The only solution is to restart beaver completely.

Example log (restarted RabbitMQ at 00:39:59):

[2013-01-30 06:39:34,635] INFO    [801g40282] - watching logfile /var/log/syslog
[2013-01-30 06:39:34,635] INFO    Working...
[2013-01-30 06:39:34,635] INFO    Starting queue consumer
ERROR:pika.adapters.base_connection:Socket Error on fd 23: 104
WARNING:pika.adapters.blocking_connection:Received Channel.Close, closing: None
[2013-01-31 00:39:59,302] INFO    Caught transport exception, respawning in 3 seconds
[2013-01-31 00:40:02,305] INFO    Caught transport exception, respawning in 9 seconds
[2013-01-31 00:40:11,308] INFO    Caught transport exception, respawning in 27 seconds
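The delays in the log triple on each attempt (3, 9, 27 seconds). A sketch of that backoff schedule, with hypothetical names rather than Beaver's actual code:

```python
def respawn_delays(base=3, factor=3, max_tries=5):
    """Yield the respawn delays seen in the log: 3, 9, 27, ... seconds."""
    delay = base
    for _ in range(max_tries):
        yield delay
        delay *= factor

print(list(respawn_delays(max_tries=4)))  # [3, 9, 27, 81]
```

The bug reported here is that the transport never actually re-establishes the connection, so this schedule just counts down to max_tries.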

Add an event parser

This event parser may also be configured to leave out certain keys as specified in the config file.
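A minimal sketch of such a key-filtering parser (all names here are hypothetical, not from Beaver itself):

```python
def filter_event(event, drop_keys):
    """Return a copy of the event dict without the configured keys.

    drop_keys would come from a hypothetical config option listing
    keys to leave out before the event is shipped.
    """
    return {k: v for k, v in event.items() if k not in drop_keys}

event = {"@message": "GET /health 200", "@source_path": "/var/log/app.log", "pid": 1234}
print(filter_event(event, {"pid"}))  # the 'pid' key is dropped
```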

Exception: Unsupported UTF-8 sequence length

I'm not exactly sure what caused this, but I think it might be because my log path contained binary files (apache ssl_cache). The problem is that beaver just crashes when this happens. Changing the path not to include binary files is the obvious solution in my case, but it's possible that real log files contain bad data. It would be nicer if beaver just ignored the bad data and continued on.

[2012-09-11 23:38:12] [fd00g4e4d3] - watching logfile /var/log/httpd/access-ssl.log
[2012-09-11 23:38:12] [fd00g4e501] - watching logfile /var/log/httpd/access.log
[2012-09-11 23:38:12] [fd00g4e4d2] - watching logfile /var/log/httpd/error-ssl.log
[2012-09-11 23:38:12] [fd00g4e4b3] - watching logfile /var/log/httpd/error.log
[2012-09-11 23:38:12] [fd00g4e521] - watching logfile /var/log/httpd/httpd.pid
[2012-09-11 23:38:12] [fd00g4e51f] - watching logfile /var/log/httpd/ssl_scache(512000).dir
[2012-09-11 23:38:12] [fd00g4e520] - watching logfile /var/log/httpd/ssl_scache(512000).pag
[2012-09-11 23:38:12] Working...
[2012-09-11 23:50:36] Unhandled Exception: Unsupported UTF-8 sequence length when encoding string
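One way to get the tolerance the reporter asks for is to decode with replacement rather than raising; a minimal sketch, not Beaver's actual code:

```python
def safe_decode(raw):
    # Substitute U+FFFD for invalid sequences instead of raising, so a
    # stray binary file (e.g. apache's ssl_scache) cannot crash the watcher.
    return raw.decode("utf-8", errors="replace")

print(safe_decode(b"good \xff\xfe bytes"))  # invalid bytes become replacement chars
```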

beaver crashes if redis not available

While doing some maintenance on redis this morning, it looks like beaver crashed because it was not able to connect to it. Is there a way to make it more graceful with such a situation?

Thanks!

ConnectionError: Error 111 connecting 10.x.x.x:6379. Connection refused.
[2013-01-09 10:00:06,148] INFO Starting queue consumer
Process Process-22:
Traceback (most recent call last):
File "/usr/lib64/python2.6/multiprocessing/process.py", line 232, in _bootstrap
self.run()
File "/usr/lib64/python2.6/multiprocessing/process.py", line 88, in run
self._target(*self._args, **self._kwargs)
File "/usr/lib/python2.6/site-packages/Beaver-21-py2.6.egg/beaver/queue.py", line 18, in run_queue
transport.callback(*data)
File "/usr/lib/python2.6/site-packages/Beaver-21-py2.6.egg/beaver/redis_transport.py", line 46, in callback
self._pipeline.execute()
File "/usr/lib/python2.6/site-packages/redis-2.4.11-py2.6.egg/redis/client.py", line 1528, in execute
return execute(conn, stack)
File "/usr/lib/python2.6/site-packages/redis-2.4.11-py2.6.egg/redis/client.py", line 1485, in _execute_pipeline
connection.send_packed_command(all_cmds)
File "/usr/lib/python2.6/site-packages/redis-2.4.11-py2.6.egg/redis/connection.py", line 241, in send_packed_command
self.connect()
File "/usr/lib/python2.6/site-packages/redis-2.4.11-py2.6.egg/redis/connection.py", line 189, in connect
raise ConnectionError(self._error_message(e))
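A sketch of the graceful handling requested: wrap the pipeline flush in a retry loop instead of letting the process die. All names are hypothetical; the local ConnectionError class stands in for redis.exceptions.ConnectionError so the sketch is self-contained.

```python
import time

class ConnectionError(Exception):
    """Stand-in for redis.exceptions.ConnectionError in this sketch."""

def execute_with_retry(execute, retries=3, delay=1.0, sleep=time.sleep):
    # Retry the pipeline flush instead of letting the whole daemon die
    # while redis is briefly down for maintenance.
    for attempt in range(1, retries + 1):
        try:
            return execute()
        except ConnectionError:
            if attempt == retries:
                raise
            sleep(delay * attempt)  # simple linear backoff between tries

calls = {"n": 0}
def flaky_execute():
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("Error 111 connecting 10.x.x.x:6379.")
    return "OK"

print(execute_with_retry(flaky_execute, sleep=lambda s: None))  # OK
```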

How to use add_field in configuration file

Hi,
I want to add some fields to the event before beaver sends it to the redis server. I added 'add_field' to the config.ini file, but the values are not forwarded to stdout. The examples only show the 'tags' and 'type' metadata words. Is add_field working?

P.S. Beaver is a great tool, and I use it in a production environment. Sorry about my English :)

Can ujson >= 1.19 be used?

I'd like to use 1.23, since there's an rpm built for it and I wouldn't have to install compilers on all my machines to use this.

Do you know of any specific problems?

beaver events missing tags after some time

Noticed an issue with beaver last night when I started using it with a system that produces a good amount of logs. After some time of running, beaver stopped tagging the event stream it was sending to my redis queue. The events were still coming in, but the tags were missing. Restarting beaver corrected it. Not sure how to reproduce the issue or what sort of debug info would be helpful.

So far, about 12 hours after restarting to correct above issue, the problem has not occurred again.

Files specified with -p PATH or BEAVER_PATH are not checked

The directory specified with -p PATH or BEAVER_PATH is not checked.
The default directory of /var/log is similarly not checked.
Files specified with -f are checked, including those with globs.

touch /tmp/beaver-1.log

Window 1:
beaver -p /tmp/
Window 2:
date >> /tmp/beaver-1.log

No output. Output should be a JSON line with date.

Window 1:
BEAVER_PATH="/tmp" beaver
Window 2:
date >> /tmp/beaver-1.log

No output. Output should be a JSON line with date.

Window 1:
beaver
Window 2:
sudo date >> /var/log/beaver-1.log

No output. Output should be a JSON line with date.

Window 1:
beaver -f /tmp/beaver-1.log
Window 2:
date >> /tmp/beaver-1.log

Entry appears in "Window 1", as expected.

Window 1:
beaver -f /tmp/*.log
Window 2:
date >> /tmp/beaver-1.log

Entry appears in "Window 1", as expected.

current trunk outputs no data for stdout transport

When the transport is stdout there is zero data output.

This is because the utils.log() helper function is calling logging.log() but the default log instance does not define a stream handler, so the data is dropped on the floor.
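A minimal illustration of the fix, assuming the logger just needs a stream handler attached (this is a sketch, not Beaver's actual utils code):

```python
import logging
import sys

# A logger with no handler silently discards records, which is why the
# stdout transport emits nothing. Attaching a StreamHandler fixes that.
logger = logging.getLogger("beaver")
handler = logging.StreamHandler(sys.stdout)
handler.setFormatter(logging.Formatter("[%(asctime)s] %(levelname)s %(message)s"))
logger.addHandler(handler)
logger.setLevel(logging.INFO)

logger.info("this line now reaches stdout")
```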

signal to force beaver to reconnect to redis

In my setup I'm thinking about having a number of redis instances behind a load balancer. This is for ease of maintenance and redundancy and such.

However, since beaver's redis connections are sticky, should I take one of them down and put it back in service, it will never get any more connections.

I think being able to send a HUP or something to beaver to make it reconnect to redis would be a great option. This would allow me to then tell my beavers to reconnect and the load balancer should take care of spreading things back out again.
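A sketch of what the proposed HUP handling could look like (all names hypothetical; a real transport's reconnect() would tear down and redial redis, letting the load balancer pick a fresh backend):

```python
import signal

class RedisTransport:
    """Toy transport tracking how many times it has (re)connected."""
    def __init__(self):
        self.connections = 1

    def reconnect(self):
        self.connections += 1

transport = RedisTransport()

def handle_hup(signum, frame):
    transport.reconnect()

# After this, `kill -HUP <beaver pid>` forces a reconnect.
signal.signal(signal.SIGHUP, handle_hup)
```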

Deprecate all environment variables

It is getting a bit messy to check both environment variables, configuration files, and parsed arguments. We should just support two things:

  • Argparse - It is good at what it does
  • Conf files - For specialization of files

I'll start putting in a deprecation notice in the next release, removing all references to env vars from the readme, and remove them completely by release 20.

amqp transport and RabbitMQ

Hi! Well done, I'd love to have a much more lightweight logstash shipper on our boxes! Anyway, is the amqp transport tailored to ZeroMQ or will it ship to a RabbitMQ queue as well?

More fields in @fields

Hello,
After the last patch I can add a field in config.ini, and it is forwarded in the output message. The problem is when I want to add more than one field.

In config.ini I wrote:

add_field: Env,RG,App,Yeti

In output message beaver send:

"@fields":{"Env":["RG","Yeti"]}
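Assuming the intent is that the comma-separated list alternates key/value pairs, a parser sketch (hypothetical, not Beaver's actual parsing) would produce one field per pair:

```python
def parse_add_field(value):
    """Parse 'add_field: Env,RG,App,Yeti' as alternating key,value pairs."""
    parts = [p.strip() for p in value.split(",")]
    if len(parts) % 2:
        raise ValueError("add_field needs an even number of items")
    return dict(zip(parts[::2], parts[1::2]))

print(parse_add_field("Env,RG,App,Yeti"))  # {'Env': 'RG', 'App': 'Yeti'}
```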

Logging from beaver

Hi,

At the moment, logging from beaver itself goes to stdout by default.
When daemonized it goes nowhere unless you specify '-o OUTPUT, --output OUTPUT'.
The naming of this option does not match what it does.

Can this be renamed to --logfile / -l?

string and json logfiles

Hi,

It is very unclear at the moment whether beaver supports mixing 'plain' (string) logs and json_event logs.
It should be possible to set the format per file so each is processed correctly.

Use setuptools instead of distutils for packaging

distutils gives warnings about how it doesn't know about install_requires.

Might make sense to use requires for distutils, but I don't know how that affects packaging. Any pythonistas who want to chime in would be very much appreciated.

Add support for start_position

From http://logstash.net/docs/1.1.3/inputs/file#setting_start_position

"Choose where logstash starts initially reading files - at the beginning or at the end. The default behavior treats files like live streams and thus starts at the end. If you have old data you want to import, set this to 'beginning'

This option only modifies "first contact" situations where a file is new and not seen before. If a file has already been seen before, this option has no effect."

Value can be any of: "beginning", "end"
Default value is "end"

This is very handy if you want to process the whole file (quite a common case in our scenario). Of course it must be coupled with the sincedb support discussed in #6, otherwise we will have duplicates at any Beaver (re)start.
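A sketch of the first-contact logic described above (names hypothetical, not Beaver's code):

```python
import os

def initial_offset(path, start_position="end", seen_before=False):
    """Choose the first read offset for a watched file.

    Mirrors logstash's start_position semantics: it only affects
    "first contact" files; a file already recorded in sincedb keeps
    its saved offset.
    """
    if seen_before:
        return None  # caller should use the sincedb offset instead
    if start_position == "beginning":
        return 0     # import old data from the top of the file
    return os.path.getsize(path)  # "end": treat the file as a live stream
```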

Beaver stops working after logrotate

Beaver==9 stops sending logs after log rotation. This is my logrotate configuration:

/var/log/celery/*.log
{
        compress
        copytruncate
        create 644 www-data www-data
        daily
        maxage 365
        maxsize 100M
        missingok
        nodelaycompress
        notifempty
        rotate 999
} 

And my beaver conf:

$ cat /etc/beaver.ini
[/var/log/celery/fc.*log]
type: celery
tags: celery,unstable
add_field: site,fc,server_type,unstable

$ REDIS_NAMESPACE='logstash' REDIS_URL="redis://...:6379/0" beaver -t redis -c /etc/beaver.ini
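With copytruncate, the inode stays the same but the file size drops below the tailer's saved offset, so the tailer can detect rotation and seek back to the start. A minimal detection sketch (not Beaver's actual code):

```python
import os

def was_truncated(path, last_offset):
    """Detect logrotate's copytruncate: if the file shrank below the
    saved read offset, reading should restart from the beginning."""
    try:
        return os.path.getsize(path) < last_offset
    except OSError:
        return True  # file was moved/removed (rotation without copytruncate)
```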

Unhandled Exception: 'NoneType' object has no attribute '__getitem__'

While tailing a logfile with beaver and writing to redis, the following exception occurs:

Unhandled Exception: 'NoneType' object has no attribute '__getitem__'

Setup:

  • Fedora 17
  • redis-2.4.10-1.fc17.x86_64
  • python-redis-2.4.9-2.fc17.noarch
  • beaver from git
  • commandline: REDIS_URL="redis://127.0.0.1:6379/0" sudo python beaver -t redis -f /var/log/mcollective.log

The same happens on CentOS 6.3 and Ubuntu 12.04 with different redis versions. Any idea?

-f does nothing at all

Regardless of what is used for -f, it will just fall back to scanning /var/log.

The only way I've found to correctly set the files to watch is by setting BEAVER_FILES.

Add a tcp transport

This should handle anyone who wants something that is "syslog" compatible, though with better options - such as redis - I only see this useful as a last resort.

use socket.getfqdn rather than socket.gethostname?

My machines are set up with a hierarchical fqdn and often have identical hostnames. For example: app1v-role.group.environment.location.example.com

Fortunately, socket.gethostname on these machines does return the full fqdn. However, the manual says that this is not always the case, and that if you always want the fqdn you should use the socket.getfqdn function instead.

I'd like to use getfqdn instead of gethostname. I need to be able to know which app1v-role I'm getting logs from and be able to rely on that being available going forward. I realize that it has the potential to break backward compatibility if it's just changed, since some people (my own local test bed included, for some reason) are only getting the hostname part and not the fqdn. Maybe this should be optional? With a command line flag or ini file setting?

Let me know your thoughts, and I'll make a pull request.

Or if you want, I can implement it both ways and you can just pick one!
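The optional-flag idea could be as simple as the following sketch (the flag name is hypothetical):

```python
import socket

def source_host(prefer_fqdn=True):
    # A config flag preserves backward compatibility for users whose
    # events already carry the short hostname.
    return socket.getfqdn() if prefer_fqdn else socket.gethostname()

print(source_host())
```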

Allow configuration of all arguments via config file

Allowing some things to be specified via config file but not connection info is inconsistent. Fixing this will also let us move forward with a threaded version where each file might have a slightly different configuration.
