
beeline-python's Introduction

Honeycomb Beeline for Python

OSS Lifecycle Build Status

⚠️Note: Beelines are Honeycomb's legacy instrumentation libraries. We embrace OpenTelemetry as the effective way to instrument applications. For any new observability efforts, we recommend instrumenting with OpenTelemetry.

This package makes it easy to instrument your Python web application to send useful events to Honeycomb, a service for debugging your software in production.

Compatible with

Currently supports Django (>3.2), Flask, Bottle, and Tornado.

Compatible with Python >3.7.

Updating to 3.3.0

Version 3.3.0 added support for Environment & Services, which changes sending behavior based on API Key.

If you are using the FileTransmission method and setting a fake API key, and you are still working in Classic mode, you must update the key to be 32 characters in length to keep the same behavior.

Contributions

Features, bug fixes and other changes to beeline-python are gladly accepted.

If you add a new test module, be sure to update beeline.test_suite to pick up the new tests.

All contributions will be released under the Apache License 2.0.

beeline-python's People

Contributors

anselm-helbig, asdvalenzuela, bdarfler, benhiller, danvendia, dependabot[bot], dsoo, emilyashley, fitzoh, gtwohig, irvingpop, jamiedanielson, jharley, jmhodges-color, jwise, martin308, mikegoldsmith, nathanleclaire, niligulmohar, pkanal, reuben-sutton, robbkidd, seanhood, sjoerdjob, sophiebits, spire-allyjweir, tredman, vbarua, vlaaaaaaad, vreynolds


beeline-python's Issues

TypeError: new_traced_event() missing 2 required positional arguments: 'trace_id' and 'parent_id'

I've just tried honeycomb-beeline to instrument my Flask app following the instructions at https://docs.honeycomb.io/getting-data-in/beelines/beeline-python/ but I am getting this error:

127.0.0.1 - - [07/Sep/2018 09:30:38] "GET / HTTP/1.1" 500 -
Traceback (most recent call last):
  File "/Users/afausti/Projects/squash-demo/squash-deployment/squash-restful-api/env/lib/python3.6/site-packages/flask/app.py", line 2309, in __call__
    return self.wsgi_app(environ, start_response)
  File "/Users/afausti/Projects/squash-demo/squash-deployment/squash-restful-api/env/lib/python3.6/site-packages/beeline/middleware/flask/__init__.py", line 40, in __call__
    }, trace_name=trace_name, top_level=True)
  File "/Users/afausti/Projects/squash-demo/squash-deployment/squash-restful-api/env/lib/python3.6/site-packages/beeline/__init__.py", line 183, in _new_event
    ev = g_tracer.new_traced_event(trace_name)
TypeError: new_traced_event() missing 2 required positional arguments: 'trace_id' and 'parent_id'

Here is how I am invoking the beeline in my app:

import os
import beeline
from beeline.middleware.flask import HoneyMiddleware

from app import create_app, db

profile = os.environ.get('SQUASH_API_PROFILE', 'app.config.Development')
honey_api_key = os.environ.get('HONEY_API_KEY')

app = create_app(profile)

beeline.init(writekey=honey_api_key, dataset="squash-rest-api", service_name="squash")

HoneyMiddleware(app, db_events=True)

I see the same with

HoneyMiddleware(app, db_events=False)

The versions I am running:

Flask             1.0.2
Flask-RESTful     0.3.6
Flask-SQLAlchemy  2.3.2
honeycomb-beeline 1.2.0
libhoney          1.5.0

Trace fields and the "app." prefix

The Python beeline seems to go to great lengths to prefix trace fields with app., but not context fields. This is proving problematic for my distributed tracing, where my trace-level fields from a Ruby service named like foo.bar become app.foo.bar on the Python side. Granted, I had to hack around the Ruby beeline a bit to even get trace fields without the app. prefix, but the same doesn't seem particularly doable in the Python beeline.

This auto-prefixing already bit us once with the app.app. doubling up in #96. I personally think the whole idea is in need of revisiting across beelines. Why can't the beeline just "do what I say"? The Python beeline already does this for context fields. The app. prefix is also not particularly useful to me for distributed tracing; more sensible would be to use my different service names as the primary "namespace". I could still do that with app.service_name.*, but the app. seems redundant to me.

Those philosophical considerations aside, removing the auto-prefixing of trace fields in the Python beeline would require a major version bump. Perhaps it'd be safer to treat unmarshalled traces specially, avoiding the auto-prefixing? That'd keep the trace-level field names consistent across the distributed trace, at least, which would make for more predictable querying.

Inconsistent field name for User-Agent header across Beelines

The Python Beeline adds a field for the User-Agent to the http_server spans it creates:

"request.user_agent": environ.get('HTTP_USER_AGENT'),

The field name request.user_agent is inconsistent with the other Beelines which use request.header.user_agent:

It would be great if these could be consistent to ease querying in deployments with polyglot services.
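Until the field names converge upstream, one workaround is a presend hook (the hook mechanism passed to beeline.init elsewhere in these issues); the function below is a hypothetical sketch that renames the Python field to match the other Beelines:

```python
def rename_user_agent(fields):
    # Hypothetical presend hook, wired up via
    # beeline.init(..., presend_hook=rename_user_agent).
    # Renames the Python Beeline's field to the name the other
    # Beelines use, so cross-service queries line up.
    if "request.user_agent" in fields:
        fields["request.header.user_agent"] = fields.pop("request.user_agent")
```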

Capture exception details for lambda middleware

Currently, if an exception is thrown during Lambda execution, it does not end up in Honeycomb unless it is explicitly handled/recorded.

It would be nice if the lambda wrapper had a top-level error handler that catches, logs, and rethrows.
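A minimal sketch of such a wrapper. The field-recording callback is injected so the sketch is not tied to any particular beeline API; all names here are hypothetical:

```python
import functools


def with_error_capture(handler, record_field):
    # Hypothetical sketch: catch, record the error via the supplied
    # callback (e.g. beeline.add_context_field), and re-raise so the
    # Lambda invocation still fails visibly.
    @functools.wraps(handler)
    def wrapper(event, context):
        try:
            return handler(event, context)
        except Exception as exc:
            record_field("error", type(exc).__name__)
            record_field("error_detail", str(exc))
            raise
    return wrapper
```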

Accept and issue OpenTelemetry/W3C style trace context (traceparent header)

G'day! We're propagating trace context around our environment using W3C Trace Context, a.k.a. traceparent, following the OpenCensus convention and OpenTelemetry specification:

Traceparent: 00-D2E1605E07E0EEA1EDBCB72CA3DDEC23-50FEF9796F94C8B8-01

That doesn't fit so well with the X-Honeycomb-Trace header key hard-coded through this code base, preventing us using the beeline to help us send events to Honeycomb from ALL THE THINGS.

Strikes me the headers could be configurable, perhaps in such a way that the beelines could accept and issue both if necessary. You up for a PR for that?
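For reference, a traceparent value is four dash-separated fields (version, trace id, parent id, flags); a minimal parser sketch, not beeline code:

```python
def parse_traceparent(header):
    # traceparent = version "-" trace-id "-" parent-id "-" trace-flags
    # (W3C Trace Context). Returns the four parts as a dict.
    version, trace_id, parent_id, flags = header.strip().split("-")
    return {
        "version": version,
        "trace_id": trace_id,
        "parent_id": parent_id,
        "flags": flags,
    }
```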

Root span never closed if status code is 500

I'm not entirely sure of the reason for this bit of code (https://github.com/honeycombio/beeline-python/blob/main/beeline/middleware/flask/__init__.py#L48-L52), but if I catch an exception in my code (logging with Sentry) and then return a 500 status code, the root trace is never closed, and it shows up as missing the root span in the Honeycomb trace view.

I temporarily worked around this by patching the honeycomb middleware:

# Import paths are assumed here; adjust to wherever Request and
# WSGIRequest live in your beeline version.
import beeline
from werkzeug.wrappers import Request
from beeline.middleware.flask import WSGIRequest


class PatchedHoneyWSGIMiddleware(object):
    def __init__(self, app):
        self.app = app

    def __call__(self, environ, start_response):
        req = Request(environ, shallow=True)
        wr = WSGIRequest("flask", environ)

        root_span = beeline.propagate_and_start_trace(wr.request_context(), wr)

        def _start_response(status, headers, *args):
            status_code = int(status[0:4])
            beeline.add_context_field("response.status_code", status_code)
            beeline.finish_trace(root_span)

            return start_response(status, headers, *args)

        return self.app(environ, _start_response)


beeline.middleware.flask.HoneyWSGIMiddleware = PatchedHoneyWSGIMiddleware

making sure to always close the trace root, but this is probably missing something.

Allow easier tweaking of context in Django middleware

As seen in

    trace = beeline.start_trace(context={...})

the data in the event context is hardcoded.

I think it would be very beneficial to have some way to override the data in the context without having to re-define create_http_event completely.

If the contents of the context= parameter could be moved to a separate function, it would be easier to override without also having to re-implement the start_trace...finish_trace.
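Something like the following shape (all names hypothetical, not the middleware's actual classes) would make the context overridable without touching the trace lifecycle:

```python
class ContextMixin:
    # Hypothetical refactor sketch: the middleware builds its event
    # context via a method, so subclasses can add or override fields
    # without reimplementing start_trace...finish_trace.
    def get_context(self, request):
        return {
            "type": "http_server",
            "request.method": request.method,
            "request.path": request.path,
        }


class MyMiddleware(ContextMixin):
    def get_context(self, request):
        # Extend the default context with an application-specific field.
        ctx = super().get_context(request)
        ctx["request.tenant"] = getattr(request, "tenant", None)
        return ctx
```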

Thread AttributeError: 'HoneyDBMiddleware' object has no attribute 'query_start_time'

Tested with beeline versions (error happens in both):

  • honeycomb-beeline==2.14.0
  • honeycomb-beeline==2.16.2

When multiple database sessions are created (e.g. via Python threading) and a new app is created with a SQLAlchemy session, the following is observed:

Traceback (most recent call last):
  File "/home/chris/Documents/programming/python/app/venv/lib/python3.8/site-packages/sqlalchemy/engine/base.py", line 1281, in _execute_context
    self.dispatch.after_cursor_execute(
  File "/home/chris/Documents/programming/python/app/venv/lib/python3.8/site-packages/sqlalchemy/event/attr.py", line 322, in __call__
    fn(*args, **kw)
  File "/home/chris/Documents/programming/python/app/venv/lib/python3.8/site-packages/beeline/middleware/flask/__init__.py", line 113, in after_cursor_execute
    query_duration = datetime.datetime.now() - self.query_start_time
AttributeError: 'HoneyDBMiddleware' object has no attribute 'query_start_time'

Steps to reproduce

A Flask app which creates a SQLAlchemy thread, run with

flask run --with-threads

The calling code, which is using Thread and creating a new SqlAlchemy Session (due to create_app factory function): https://github.com/Subscribie/subscribie/blob/ccda2c81f432b3104f58c1226fd31e50d3319fee/subscribie/email.py#L23,L74

A full traceback is below:

Traceback (most recent call last):
  File "/home/chris/Documents/programming/python/app/venv/lib/python3.8/site-packages/sqlalchemy/engine/base.py", line 1281, in _execute_context
    self.dispatch.after_cursor_execute(
  File "/home/chris/Documents/programming/python/app/venv/lib/python3.8/site-packages/sqlalchemy/event/attr.py", line 322, in __call__
    fn(*args, **kw)
  File "/home/chris/Documents/programming/python/app/venv/lib/python3.8/site-packages/beeline/middleware/flask/__init__.py", line 113, in after_cursor_execute
    query_duration = datetime.datetime.now() - self.query_start_time
AttributeError: 'HoneyDBMiddleware' object has no attribute 'query_start_time'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/chris/Documents/programming/python/app/venv/lib/python3.8/site-packages/flask/app.py", line 2464, in __call__
    return self.wsgi_app(environ, start_response)
  File "/home/chris/Documents/programming/python/app/venv/lib/python3.8/site-packages/beeline/middleware/flask/__init__.py", line 56, in __call__
    return self.app(environ, _start_response)
  File "/home/chris/Documents/programming/python/app/venv/lib/python3.8/site-packages/flask/app.py", line 2450, in wsgi_app
    response = self.handle_exception(e)
  File "/home/chris/Documents/programming/python/app/venv/lib/python3.8/site-packages/flask_cors/extension.py", line 165, in wrapped_function
    return cors_after_request(app.make_response(f(*args, **kwargs)))
  File "/home/chris/Documents/programming/python/app/venv/lib/python3.8/site-packages/flask_cors/extension.py", line 165, in wrapped_function
    return cors_after_request(app.make_response(f(*args, **kwargs)))
  File "/home/chris/Documents/programming/python/app/venv/lib/python3.8/site-packages/flask/app.py", line 1867, in handle_exception
    reraise(exc_type, exc_value, tb)
  File "/home/chris/Documents/programming/python/app/venv/lib/python3.8/site-packages/flask/_compat.py", line 39, in reraise
    raise value
  File "/home/chris/Documents/programming/python/app/venv/lib/python3.8/site-packages/flask/app.py", line 2447, in wsgi_app
    response = self.full_dispatch_request()
  File "/home/chris/Documents/programming/python/app/venv/lib/python3.8/site-packages/flask/app.py", line 1952, in full_dispatch_request
    rv = self.handle_user_exception(e)
  File "/home/chris/Documents/programming/python/app/venv/lib/python3.8/site-packages/flask_cors/extension.py", line 165, in wrapped_function
    return cors_after_request(app.make_response(f(*args, **kwargs)))
  File "/home/chris/Documents/programming/python/app/venv/lib/python3.8/site-packages/flask_cors/extension.py", line 165, in wrapped_function
    return cors_after_request(app.make_response(f(*args, **kwargs)))
  File "/home/chris/Documents/programming/python/app/venv/lib/python3.8/site-packages/flask/app.py", line 1821, in handle_user_exception
    reraise(exc_type, exc_value, tb)
  File "/home/chris/Documents/programming/python/app/venv/lib/python3.8/site-packages/flask/_compat.py", line 39, in reraise
    raise value
  File "/home/chris/Documents/programming/python/app/venv/lib/python3.8/site-packages/flask/app.py", line 1950, in full_dispatch_request
    rv = self.dispatch_request()
  File "/home/chris/Documents/programming/python/app/venv/lib/python3.8/site-packages/flask/app.py", line 1936, in dispatch_request
    return self.view_functions[rule.endpoint](**req.view_args)
  File "/home/chris/Documents/programming/python/app/app/blueprints/checkout/__init__.py", line 192, in thankyou
    send_welcome_email()
  File "/home/chris/Documents/programming/python/app/app/email.py", line 79, in send_welcome_email
    return render_template("thankyou.html")
  File "/home/chris/Documents/programming/python/app/venv/lib/python3.8/site-packages/flask/templating.py", line 136, in render_template
    ctx.app.update_template_context(context)
  File "/home/chris/Documents/programming/python/app/venv/lib/python3.8/site-packages/flask/app.py", line 838, in update_template_context
    context.update(func())
  File "/home/chris/Documents/programming/python/app/app/views.py", line 60, in inject_template_globals
    company = Company.query.first()
  File "/home/chris/Documents/programming/python/app/venv/lib/python3.8/site-packages/sqlalchemy/orm/query.py", line 3429, in first
    ret = list(self[0:1])
  File "/home/chris/Documents/programming/python/app/venv/lib/python3.8/site-packages/sqlalchemy/orm/query.py", line 3203, in __getitem__
    return list(res)
  File "/home/chris/Documents/programming/python/app/venv/lib/python3.8/site-packages/sqlalchemy/orm/query.py", line 3535, in __iter__
    return self._execute_and_instances(context)
  File "/home/chris/Documents/programming/python/app/venv/lib/python3.8/site-packages/sqlalchemy/orm/query.py", line 3560, in _execute_and_instances
    result = conn.execute(querycontext.statement, self._params)
  File "/home/chris/Documents/programming/python/app/venv/lib/python3.8/site-packages/sqlalchemy/engine/base.py", line 1011, in execute
    return meth(self, multiparams, params)
  File "/home/chris/Documents/programming/python/app/venv/lib/python3.8/site-packages/sqlalchemy/sql/elements.py", line 298, in _execute_on_connection
    return connection._execute_clauseelement(self, multiparams, params)
  File "/home/chris/Documents/programming/python/app/venv/lib/python3.8/site-packages/sqlalchemy/engine/base.py", line 1124, in _execute_clauseelement
    ret = self._execute_context(
  File "/home/chris/Documents/programming/python/app/venv/lib/python3.8/site-packages/sqlalchemy/engine/base.py", line 1316, in _execute_context
    self._handle_dbapi_exception(
  File "/home/chris/Documents/programming/python/app/venv/lib/python3.8/site-packages/sqlalchemy/engine/base.py", line 1508, in _handle_dbapi_exception
    util.raise_(newraise, with_traceback=exc_info[2], from_=e)
  File "/home/chris/Documents/programming/python/app/venv/lib/python3.8/site-packages/sqlalchemy/util/compat.py", line 182, in raise_
    raise exception
  File "/home/chris/Documents/programming/python/app/venv/lib/python3.8/site-packages/sqlalchemy/engine/base.py", line 1281, in _execute_context
    self.dispatch.after_cursor_execute(
  File "/home/chris/Documents/programming/python/app/venv/lib/python3.8/site-packages/sqlalchemy/event/attr.py", line 322, in __call__
    fn(*args, **kw)
  File "/home/chris/Documents/programming/python/app/venv/lib/python3.8/site-packages/beeline/middleware/flask/__init__.py", line 113, in after_cursor_execute
    query_duration = datetime.datetime.now() - self.query_start_time
AttributeError: '_thread._local' object has no attribute 'span'

Include request.data in the trace context

In HoneyMiddlewareBase.create_http_event I can see that request.POST.dict() is being called. However, it would be much more useful to me to have request.data. As outlined in the Django API guide, request.POST does not have all the information we need in it. See here

Is this something we can consider adding to the trace context?

HoneyDBMiddleware crashes on dict parameters with complex types

HoneyDBMiddleware will crash if you pass a complex type like a list as a query parameter.

Example:

from sqlalchemy import text

body = "SELECT * FROM table WHERE id IN :item_ids"
query = text(body).params(
    item_ids=[1, 2, 3],
)

Results in:

TypeError: not all arguments converted during string formatting

Because the format string is expecting a single argument.

I guess the intention is just to stringify the parameter, in which case the following might do:

param += str(v) 

Attach query params to “requests” patch

This may not be desirable behaviour for all integrations, but I’d love for the query parameters to be added to the event emitted when using the requests library.

I wonder what people’s thoughts on this functionality are?

Trace gets broken if only `trace_id` and `parent_id` are present in honeycomb headers

The current version of the Python Beeline expects the Honeycomb tracing headers to contain at least three keys.

if len(kv_pairs) >= 3:

Since dataset and context are optional, there are times when there are only two keys in the header, e.g. 1;trace_id=xxx,parent_id=xxx. This makes the Python Beeline start a new trace and not associate itself with the current trace, because the code to extract the current trace_id is never executed.
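A laxer parser sketch that accepts two or more key/value pairs (hypothetical, not the beeline's actual code):

```python
def parse_honeycomb_trace(header):
    # Header shape: "1;trace_id=xxx,parent_id=yyy[,dataset=...,context=...]".
    # Split off the version, then take however many k=v pairs follow,
    # rather than requiring at least three.
    version, _, payload = header.partition(";")
    fields = dict(kv.split("=", 1) for kv in payload.split(",") if "=" in kv)
    return version, fields
```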

How do I turn off honeycomb during the circleci run of my django app?

So far, I have been able to turn off honeycomb in my django in circleci using this way

import beeline
from django.apps import AppConfig
from django.conf import settings


class CoreConfig(AppConfig):
    name = "core"

    def ready(self):
        if settings.HONEYCOMB_ON:
            beeline.init(
                writekey=settings.HONEYCOMB_API_KEY,
                dataset=settings.HONEYCOMB_DATASET,
                service_name=settings.HONEYCOMB_SERVICE_NAME,
                debug=settings.DEBUG,
            )

So in CircleCI, settings.HONEYCOMB_ON is false. This worked until 2.12, but with the new 2.13 version it no longer works for me.

In my CircleCI run, I get the following:

Traceback (most recent call last):
  File "/home/circleci/repo/config/tests/test_redirect.py", line 8, in test_redirect
    response = self.client.get("/")
  File "/home/circleci/repo/venv/lib/python3.7/site-packages/django/test/client.py", line 535, in get
    response = super().get(path, data=data, secure=secure, **extra)
  File "/home/circleci/repo/venv/lib/python3.7/site-packages/django/test/client.py", line 347, in get
    **extra,
  File "/home/circleci/repo/venv/lib/python3.7/site-packages/django/test/client.py", line 422, in generic
    return self.request(**r)
  File "/home/circleci/repo/venv/lib/python3.7/site-packages/django/test/client.py", line 503, in request
    raise exc_value
  File "/home/circleci/repo/venv/lib/python3.7/site-packages/django/core/handlers/exception.py", line 34, in inner
    response = get_response(request)
  File "/home/circleci/repo/venv/lib/python3.7/site-packages/beeline/middleware/django/__init__.py", line 144, in __call__
    response = self.create_http_event(request)
  File "/home/circleci/repo/venv/lib/python3.7/site-packages/beeline/middleware/django/__init__.py", line 104, in create_http_event
    dr = DjangoRequest(request)
  File "/home/circleci/repo/venv/lib/python3.7/site-packages/beeline/middleware/django/__init__.py", line 12, in __init__
    beeline.get_beeline().log(request.META)
AttributeError: 'NoneType' object has no attribute 'log'


Flask `request.route` returns endpoint rather than "route".

I'm updating the beeline in one of our applications and found in the changelog that request.route would clash with some instrumentation we already have. We've used request.url_rule from Flask as the field value for request.route. For anyone not familiar, this returns something like /user/<username>, rather than the endpoint, which returns the function name.

This is the line I'm referring to:

beeline.add_field("request.route", flask.request.endpoint)

I'm happy to PR a fix which would implement the following:

beeline.add_field("request.endpoint", flask.request.endpoint)
beeline.add_field("request.route", flask.request.url_rule)

Other beelines instrument the "pattern" of a request rather than the function name. Some examples:
Go: https://github.com/honeycombio/beeline-go/blob/ca594899bf23a4e2496df8fd214bd7e0455ffb0f/wrappers/hnyecho/echo_test.go#L60
NodeJS/Express: (Returns: /users/:user_id) https://github.com/honeycombio/beeline-nodejs/blob/7e9ee85b5db91dce469770599019e78730ff12d0/lib/instrumentation/express.js#L143
Ruby: https://www.honeycomb.io/blog/honeybyte-beeline-dev-molly-struve/

AWS botocore.vendored.requests does not get patched with beeline.patch

AWS boto3 uses botocore, and botocore seems to bring its own requests at botocore.vendored.requests, which does not get patched.

I've managed to get something working on my end by copying the patch to patch botocore.vendored.requests as well.

Can this maybe happen within beeline-python? Or is that too much to ask? Or is there another recommendation?

Are there any special instructions for initialising beeline when using waitress server?

Waitress is an increasingly popular application server in cloud contexts (Google Cloud Run, for example) as it buffers incoming requests and can be used without being fronted by Nginx or some other buffering server.

https://docs.honeycomb.io/getting-data-in/python/beeline/#using-the-python-beeline-with-python-pre-fork-models mentions a few different ways to initialise the library in a pre-fork environment.

I don't believe waitress uses a forking model. My understanding (basic) is that it uses asyncio to buffer individual requests, and a threadpool to execute the application.

Does the beeline need to be initialised in a special way with Waitress?

Beeline prevents sentry.io from reporting

Issue created in the sentry repository @ getsentry/sentry-python#442

Details:
I'm activating sentry in the top of settings.py (docs)

import sentry_sdk
from sentry_sdk.integrations.django import DjangoIntegration

sentry_sdk.init(
    dsn=dsn,
    environment=env,
    integrations=[DjangoIntegration()],
)

beeline is being initialized in the gunicorn config file (docs)

import beeline

def post_worker_init(worker):
    beeline.init(
        writekey=honeycomb_key,
        dataset=dataset,
        service_name=service
    )

I can manually send errors to sentry, but the automatic error reporting doesn't work unless honeycomb is removed.

Partially redacted requirements.txt

Django~=2.1.9
djangorestframework~=3.9.1
gunicorn~=19.9.0
sentry-sdk~=0.10.2
honeycomb-beeline~=2.6.1
libhoney~=1.8.0
statsd~=3.3.0

[flask-sqlalchemy] db.query_args appears as the param names, not the param values

The problem seems to be that the parameters argument to the before_cursor_execute listener is almost always (maybe always) a dict in my project. Beeline always treats it as if it's a list: https://github.com/honeycombio/beeline-python/blob/634a567/beeline/middleware/flask/__init__.py#L115.

Quote from the docs about the parameters argument:

Dictionary, tuple, or list of parameters being passed to the execute() or executemany() method of the DBAPI cursor. In some cases may be None.
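A sketch of a normaliser that follows that contract, handling dict, sequence, and None (hypothetical helper, not the middleware's current code):

```python
def stringify_params(parameters):
    # Per the SQLAlchemy docs, `parameters` may be a dict, tuple, list,
    # or None. Normalise each case to a flat list of strings suitable
    # for a db.query_args-style field, emitting values (not just keys)
    # for dicts.
    if parameters is None:
        return []
    if isinstance(parameters, dict):
        return ["%s=%s" % (k, v) for k, v in parameters.items()]
    return [str(v) for v in parameters]
```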

Uninitialised tracer raises when beeline is not initialised

I am looking here for either guidance or a fix.

I have a Flask app that runs in different environments, one of them being Jupyter notebooks. As you can imagine, the Jupyter notebook loads part of the app and crashes when the tracer gets called while uninitialised. I'd like to provide an experience where a developer does not have to load the Honeycomb setup in Jupyter to experiment with the code.

Here is the little helper package I wrote for myself to introduce a nice decorator for functions I want to trace:

import beeline
import functools
import config
import os

from beeline.middleware.flask import HoneyMiddleware

tracer = beeline.tracer
add_field = beeline.add_field
add_trace_field = beeline.add_trace_field


def init(app):
    beeline.init(writekey=config.HONEYCOMB_API_KEY, dataset='<redacted>', service_name='<redacted>', presend_hook=presend)
    HoneyMiddleware(app, db_events=False)


def presend(fields):
    fields['pid'] = os.getpid()


def traced(name):
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            with tracer(name):
                return func(*args, **kwargs)
        return wrapper
    return decorator

Exception

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-6-edff33869504> in <module>()
      1 import model
----> 2 model.load_bot_model(6)

~/SageMaker/team-ml/sth/src/honeycomb.py in wrapper(*args, **kwargs)
     24         @functools.wraps(func)
     25         def wrapper(*args, **kwargs):
---> 26             with tracer(name):
     27                 return func(*args, **kwargs)
     28         return wrapper

~/anaconda3/lib/python3.6/site-packages/beeline/__init__.py in tracer(name, trace_id, parent_id)
    361     - `name`: a descriptive name for the this trace span, i.e. "database query for user"
    362     '''
--> 363     return _GBL.tracer(name=name, trace_id=trace_id, parent_id=parent_id)
    364 
    365 def start_trace(context=None, trace_id=None, parent_span_id=None):

AttributeError: 'NoneType' object has no attribute 'tracer'

I can see two different solutions here:

  • changing beeline.tracer to not fail when it's not initialised
  • setting the tracer inside init to the real beeline.tracer, and otherwise falling back to a fake one

What are your thoughts?
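For the second option, a minimal sketch of a fallback inside the helper module (pure stdlib; names hypothetical):

```python
import contextlib

_tracer = None  # set by init(); stays None in Jupyter


def safe_tracer(name):
    # Fallback sketch: return a no-op context manager when beeline was
    # never initialised, so the @traced decorator still runs its
    # wrapped function in notebooks without Honeycomb setup.
    if _tracer is None:
        return contextlib.nullcontext()
    return _tracer(name)
```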

Double "app." prefix in distributed trace fields

The type checking done by

    if (type(name) == str and not name.startswith("app.")) or type(name) != str:
        key = "app.%s" % name
    else:
        key = name

was added by 70ad2e9 as an addendum to #96. But it fails to account for unicode types, which are returned by json.loads (at least in Python 2.7.12):

>>> json.loads('{"app.field":"value"}')
{u'app.field': u'value'}

Thus, when unmarshal_trace_context parses a distributed trace header's fields with

    context = json.loads(base64.b64decode(v.encode()).decode())

and passes them into add_trace_field with the likes of

    # populate any propagated custom context
    if isinstance(context, dict):
        for k, v in context.items():
            beeline.add_trace_field(k, v)

then type(u'app.field') == unicode, not str, so it still gets coerced to 'app.app.field'.
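A type-agnostic sketch of the prefixing check that sidesteps the str/unicode distinction entirely (hypothetical fix, not the committed code):

```python
def prefixed(name):
    # Coerce the field name to text first, then test the prefix, so it
    # behaves the same for str, unicode (Python 2), and non-string keys.
    name = "%s" % name
    return name if name.startswith("app.") else "app." + name
```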

Document how to use requests

I’m not sure how to get traces to show up across processes running beeline-python Rpcing to each other using libraries like requests or urllib. Help me understand is it just supposed to work? Am I supposed to provide the same name across traces when I can with? I’m not sure how to trace micro flask services calling other micro flask services

Beeline middleware causes some specific requests to skip parsers & potentially other middleware

One of the endpoints we implemented on our Django rest framework application has POST method with multipart/form-data content type. We also utilize a parser which converts camelCase fields in incoming requests to snake_case before they hit the views.

That one endpoint started failing right after integrating Beeline. Actually, it still receives the request, but when you look into the request, you see that it didn't get modified by the parser. Basically, that specific request skips the parsers and potentially other middleware after Beeline middleware.

I've spent some time investigating what was going on and realized that the stream on the request object was exhausted by the line below:

request.POST.dict()

I was able to find some more reference when I went deeper:

  1. The comment in the source code: https://github.com/encode/django-rest-framework/blob/0cc09f0c0dbe4a6552b1a5bbaa4f7f921270698a/rest_framework/request.py#L326

  2. A warning (the green part under process_view on the page) on the documentation (this is from Django package for process_view, but it still applies): https://docs.djangoproject.com/en/2.2/topics/http/middleware/#process-view

ASGI support

Hi,

I'm using starlette and uvicorn for my python webservice. I've put together a middleware for starlette based on the WSGI one I saw in this codebase. Worth saying that I have no idea what I'm doing, but I think the code below is what is required.

import beeline

from starlette.datastructures import Headers
from starlette.types import ASGIApp, Receive, Scope, Send


class HoneycombMiddleware:
    def __init__(self, app: ASGIApp) -> None:
        self.app = app

    async def __call__(self, scope: Scope, receive: Receive, send: Send) -> None:

        trace = beeline.start_trace(
            context=self.get_context_from_environ(scope))

        async def send_wrapper(message):
            # Only the "http.response.start" message carries the status
            # code; finish the trace once, on that message, rather than
            # on every body chunk.
            if message["type"] == "http.response.start":
                beeline.add_context_field(
                    "response.status_code", message.get("status"))
                beeline.finish_trace(trace)
            await send(message)

        await self.app(scope, receive, send_wrapper)

    def get_context_from_environ(self, scope):
        request_method = scope.get('method')
        if request_method:
            trace_name = "starlette_http_%s" % request_method.lower()
        else:
            trace_name = "starlette_http"

        headers = Headers(scope=scope)

        return {
            "name": trace_name,
            "type": "http_server",
            "request.host": headers.get('host'),
            "request.method": request_method,
            "request.path": scope.get('path'),
            "request.content_length": int(headers.get('content-length', 0)),
            "request.user_agent": headers.get('user-agent'),
            "request.scheme": scope.get('scheme'),
            "request.query": scope.get('query_string').decode("ascii")
        }

Exception raised while shutting down pytest + Flask

I am working on flask application. Intermittently, after tests succeed (run with pytest), I get following error:

Exception in thread Thread-1:
Traceback (most recent call last):
  File "/Users/mario/src/project/appenv/lib/python3.7/site-packages/libhoney/transmission.py", line 113, in _sender
    ev = self.pending.get(timeout=self.send_frequency)
  File "/usr/local/Cellar/python/3.7.0/Frameworks/Python.framework/Versions/3.7/lib/python3.7/queue.py", line 178, in get
    raise Empty
_queue.Empty

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/Cellar/python/3.7.0/Frameworks/Python.framework/Versions/3.7/lib/python3.7/threading.py", line 917, in _bootstrap_inner
    self.run()
  File "/usr/local/Cellar/python/3.7.0/Frameworks/Python.framework/Versions/3.7/lib/python3.7/threading.py", line 865, in run
    self._target(*self._args, **self._kwargs)
  File "/Users/mario/src/project/appenv/lib/python3.7/site-packages/libhoney/transmission.py", line 126, in _sender
    pool.submit(self._flush, events)
  File "/usr/local/Cellar/python/3.7.0/Frameworks/Python.framework/Versions/3.7/lib/python3.7/concurrent/futures/thread.py", line 151, in submit
    raise RuntimeError('cannot schedule new futures after shutdown')
RuntimeError: cannot schedule new futures after shutdown

The beeline is initialised with an empty string as the API key in the test environment.
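The traceback points at libhoney's background sender submitting a flush to a thread pool that the interpreter has already shut down. The failure mode is easy to reproduce with the stdlib alone:

```python
from concurrent.futures import ThreadPoolExecutor

# Reproduce the failure mode: a background worker tries to submit work
# to a pool that was already shut down during interpreter teardown.
pool = ThreadPoolExecutor(max_workers=1)
pool.shutdown(wait=True)
try:
    pool.submit(print, "late event")
except RuntimeError as exc:
    print(exc)  # → cannot schedule new futures after shutdown
```

A likely fix is to flush and stop the sender before pytest tears the interpreter down, e.g. by calling beeline.close() in the teardown of a session-scoped, autouse fixture.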

This is far too many events for us

We often execute large cursors and wonder whether it would make more sense to fire this on execute() rather than on the low-level cursor. Tracking the number of sub-executes might be nice, but shouldn't the span track the entire execution? I wonder if there's metadata we could accumulate as well.

listen(Engine, 'before_cursor_execute', self.before_cursor_execute)

listen(Engine, 'after_cursor_execute', self.after_cursor_execute)

HoneyDBMiddleware gets db durations wrong with multiple threads

The query start time is being stashed on the shared object, rather than the thread-local storage:

self.query_start_time = datetime.datetime.now()

As a consequence, db duration is incorrectly calculated when multiple concurrent threads are running.

Here's a simple repro:

import os
import time

import beeline
from beeline.middleware.flask import HoneyMiddleware
from flask import Flask
from flask_sqlalchemy import SQLAlchemy

beeline.init(
    writekey=os.environ.get("HONEYCOMB_API_KEY"),
    dataset="concurrency-test",
    service_name="busted",
    debug=True,
)

# Pass your Flask app to HoneyMiddleware
app = Flask(__name__)
app.config[
    "SQLALCHEMY_DATABASE_URI"
] = "postgresql://postgres:password@localhost/honeycomb_test"
db = SQLAlchemy(app)

HoneyMiddleware(
    app, db_events=True
)  # db_events defaults to True, set to False if not using our db middleware with Flask-SQLAlchemy


@app.route("/sleep/<seconds>")
def sleepy(seconds):
    time.sleep(1)
    db.session.execute(f"SELECT pg_sleep({seconds})")
    time.sleep(1)
    return f"yawn! (slept {int(seconds) + 2} seconds)"

Call it with this script

curl http://localhost:5000/sleep/5 &
sleep 4
curl http://localhost:5000/sleep/5 

The second call generates a database span that has a five-second duration, but the db.duration is reported as ~1s.

[Screenshot, 2021-03-18 16:49: trace showing the mis-reported db.duration]
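A minimal, dependency-free sketch of the thread-local fix (class and method names here are illustrative, not the middleware's actual attributes):

```python
import threading
import time

class PerThreadTimer:
    """Hedged sketch of the fix: stash the query start time in
    threading.local() so concurrent requests don't clobber each other."""

    def __init__(self):
        self._local = threading.local()

    def before_cursor_execute(self):
        self._local.query_start_time = time.perf_counter()

    def after_cursor_execute(self):
        return (time.perf_counter() - self._local.query_start_time) * 1000.0

timer = PerThreadTimer()
results = {}

def fake_query(name, seconds):
    timer.before_cursor_execute()
    time.sleep(seconds)          # stand-in for the actual DB call
    results[name] = timer.after_cursor_execute()

slow = threading.Thread(target=fake_query, args=("slow", 0.2))
fast = threading.Thread(target=fake_query, args=("fast", 0.05))
slow.start()
time.sleep(0.05)                 # overlap the two "queries"
fast.start()
slow.join()
fast.join()

# Each thread reports its own elapsed time; with a shared attribute, the
# second before_cursor_execute() would reset the first thread's start.
print(round(results["slow"]), round(results["fast"]))
```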

Django 3.1 deprecated request.is_ajax()

Hi there,

Running the tests on one of my projects with the beeline enabled resulted in this little gem:
[...]/venv/lib/python3.8/site-packages/beeline/middleware/django/__init__.py:100: RemovedInDjango40Warning: request.is_ajax() is deprecated. See Django 3.1 release notes for more details about this deprecation. "request.xhr": request.is_ajax(),

Citing from the release notes:

"The HttpRequest.is_ajax() method is deprecated as it relied on a jQuery-specific way of signifying AJAX calls, while current usage tends to use the JavaScript Fetch API. Depending on your use case, you can either write your own AJAX detection method, or use the new HttpRequest.accepts() method if your code depends on the client Accept HTTP header."

It will be removed in Django 4.0, so there is plenty of time to make the necessary changes. I just wanted to put it on your radar.
I am not actively using this field myself and am only starting out with Django, so I didn't prepare a pull request because I have no clue what the expected behaviour is.

Add support for the `dataset` field in the trace propagation header

The Go beeline supports adding a dataset field to the trace propagation header (x-honeycomb-trace). The effect of including this field is that spans from the downstream service should be sent to the indicated dataset. The rest of the beelines should implement this addition.

Allow not logging parameters for DB requests

In the Django beeline, we see that db.query_args is always added to the context when making a query.

https://github.com/honeycombio/beeline-python/blob/master/beeline/middleware/django/__init__.py#L28

It would be nice if this would be somewhat overrideable, similarly to how it's now possible to do so for the Middleware, since https://github.com/honeycombio/beeline-python/pull/73/files.

The way I think this would best be achieved would be to:

  1. Add a class attribute to HoneyMiddleware which specifies which class the db_wrapper has.
  2. Make the dictionary for the first beeline.add_context the result of another callable on the HoneyDBWrapper class.

Alternatively, this could of course also be handled by a presend hook.

AttributeError: 'pyodbc.Cursor' object has no attribute 'lastrowid' when using mssql+pyodbc

I'm instrumenting one of our applications, which uses SQLAlchemy, but found that with db_events=True I would get the error AttributeError: 'pyodbc.Cursor' object has no attribute 'lastrowid'. From my research, this property isn't part of the DB API spec, nor is it implemented in pyodbc (the driver we're using).

The offending line I found was:

"db.last_insert_id": cursor.lastrowid,

Commenting it out, I no longer get the AttributeError.

My proposed fix would be something along these lines:

-             "db.last_insert_id": cursor.lastrowid,
+             "db.last_insert_id": getattr(cursor, 'lastrowid', None),

I can submit a PR for this if the fix works for you.
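The getattr fallback is easy to sanity-check in isolation (the class below is a hypothetical stand-in for pyodbc.Cursor):

```python
class CursorWithoutLastrowid:
    """Hypothetical stand-in for pyodbc.Cursor, which lacks lastrowid."""

cursor = CursorWithoutLastrowid()

# Direct attribute access would raise AttributeError; getattr degrades to None.
last_insert_id = getattr(cursor, "lastrowid", None)
print(last_insert_id)  # → None
```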

Pyramid middleware?

I'd like to use Honeycomb with Pyramid (https://trypyramid.com/)
I gave it a try like this, but I'm not seeing data show up at Honeycomb:

if __name__ == "__main__":
    config = Configurator()
    config.add_route(ENDPOINT, "/{}".format(ENDPOINT))
    config.scan()

    app = config.make_wsgi_app()
    wrapped_app = HoneyWSGIMiddleware(app)
    server = make_server("0.0.0.0", 5000, wrapped_app)
    server.serve_forever()

Has anyone else done this already?

Consider a higher resolution timer

For Python 2.7 we switched

self.event.start_time = datetime.datetime.now()

to

self.event.start_time = time.clock()

and

duration = datetime.datetime.now() - span.event.start_time
duration_ms = duration.total_seconds() * 1000.0

to

duration_ms = (time.clock() - span.event.start_time) * 1000

in order to get sub millisecond timings.

I'd PR this simple change, except:

Python 3.7.1 (v3.7.1:260ec2c36a, Oct 20 2018, 14:57:15) [MSC v.1915 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import time
>>> time.clock()
__main__:1: DeprecationWarning: time.clock has been deprecated in Python 3.3 and will be removed from Python 3.8: use time.perf_counter or time.process_time instead
3.8311687

I checked https://github.com/benjaminp/six but there are no entries around the time module.
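On Python 3, time.perf_counter() is the documented replacement for time.clock() and gives the sub-millisecond resolution this issue asks for:

```python
import time

start = time.perf_counter()
time.sleep(0.005)
duration_ms = (time.perf_counter() - start) * 1000.0

# perf_counter is monotonic with the highest available resolution,
# so short spans no longer round to 0 ms.
print(f"{duration_ms:.3f} ms")
```

For code that still has to run on 2.7, a conditional like `timer = getattr(time, "perf_counter", time.clock)` could bridge both, since six has no shim for the time module.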

[flask] Beeline trying to get dict fields which do not exist

from pollinators:

spawned uWSGI http 1 (pid: 78)
Traceback (most recent call last):
  File "/usr/local/lib/python2.7/site-packages/flask/app.py", line 1997, in __call__
    return self.wsgi_app(environ, start_response)
  File "/usr/local/lib/python2.7/site-packages/beeline/middleware/flask/__init__.py", line 37, in __call__
    "request.user_agent": environ['HTTP_USER_AGENT'],
KeyError: 'HTTP_USER_AGENT'
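Clients aren't required to send a User-Agent header, in which case the WSGI environ omits the HTTP_USER_AGENT key entirely; a .get() lookup degrades to None instead of raising:

```python
# Minimal repro of the middleware's assumption, no Flask required.
environ = {"REQUEST_METHOD": "GET", "PATH_INFO": "/"}

user_agent = environ.get("HTTP_USER_AGENT")  # None, not KeyError
print(user_agent)  # → None
```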

Outgoing request instrumentation doesn't need to import *

The beeline docs show this line for instrumenting outgoing requests:

from beeline.patch.requests import *

This isn't necessary and caused problems for us in an app that had a variable which got shadowed by this import.

It's sufficient and safer to run simply:

from beeline.patch.requests import requests

Add field to current span and child spans only

Hi there 👋

I was wondering if there’s a way to add a field to the current span, and its child spans, but not to its parent span, as currently happens with add_trace_field?

I know that I could manually add the fields to each span, but I’d love to not have to do that, if possible.

My use case is iterating over multiple “accounts”, where the full iteration has a span, and each individual iteration has another span (and child spans) which attach the “account_id”. Using add_trace_field for the account_id mostly works as I'd like, except that the parent span also gets the final account_id attached to it, which is undesirable.
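There's no beeline API for this that I know of, but one workaround is to track the scoped fields yourself and apply them (via beeline.add_context_field) to each span you start inside the scope. A dependency-free sketch of that bookkeeping, with all names hypothetical:

```python
import contextvars
from contextlib import contextmanager

# Stack of field dicts active in the current (async-safe) context.
_scoped_fields = contextvars.ContextVar("scoped_fields", default=())

@contextmanager
def scoped_fields(**fields):
    """Push fields for the duration of the with-block only."""
    token = _scoped_fields.set(_scoped_fields.get() + (fields,))
    try:
        yield
    finally:
        _scoped_fields.reset(token)

def current_scoped_fields():
    """Merge all active layers, innermost last."""
    merged = {}
    for layer in _scoped_fields.get():
        merged.update(layer)
    return merged
```

When starting each child span inside `with scoped_fields(account_id=...)`, call beeline.add_context_field for every entry of current_scoped_fields(); the parent span, created outside the scope, never sees them.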

How to disable request header logging?

Currently, I'm using beeline-python with Django and so far it's working great. However, despite disabling Django logging and setting beeline.init(debug=False), beeline is spitting out way too many log statements.

I'm using beeline like this,

import os

import beeline
from django.apps import AppConfig

class OrdersConfig(AppConfig):
    name = "orders"

    if os.environ.get("HONEYCOMB_LOG") == "True":
        def ready(self):
            beeline.init(writekey=os.environ["HONEYCOMB_API_KEY"], dataset="lis-production",
                              service_name="lis", debug=False)

My Django setting has DEBUG=False. However, I'm seeing an excessive amount of request log statements on the console.

reply: 'HTTP/1.1 200 OK\r\n'
header: Date: Thu, 21 Jan 2021 20:05:31 GMT
header: Content-Type: application/json
header: Content-Length: 64
header: Connection: keep-alive
header: Access-Control-Allow-Origin: *
header: Content-Encoding: gzip
header: Vary: Accept-Encoding
2021-01-21 20:05:31,910 DEBUG https://api.honeycomb.io:443 "POST /1/batch/lis-production HTTP/1.1" 200 64
2021-01-21 20:05:31,910 DEBUG https://api.honeycomb.io:443 "POST /1/batch/lis-production HTTP/1.1" 200 64
send: b'POST /1/batch/lis-production HTTP/1.1\r\nHost: api.honeycomb.io\r\nUser-Agent: libhoney-py/1.10.0 beeline-python/2.16.0\r\nAccept-Encoding: gzip, deflate\r\nAccept: */*\r\nConnection: keep-alive\r\nContent-Encoding: gzip\r\nX-Honeycomb-Team: 632a7b8862ccc4cd53092d6e10c2fd58\r\nContent-Type: application/json\r\nContent-Length: 11606\r\n\r\n'
send: b'\x1f\x8b\x08\x00\xc8\xde\t`\x04\xff\xed}\x89\x8e\xe56v\xe8\xaf\x14\x1a\x18 \x06\xec\x82\xf6e\x80\x04\x98\xd8m\xc4\x03Ow\x9e\x17\x0c\x92qp\xa1\x85\xaa\x92\xfbn\xbeK/\x0e\xe6\xdf\xdf!\xc5\xe5P:\x92x\xed\xfb*\x83\x17\x96\x8dF\x15\xcf!E\x89\xd2\xd9\x97\xbf\xfd\xf7\xabK\xbfc\xaf\xfe\xf8\xf0*\n\xa2\xf0\x8b \xfc"\n\x7f\x88\x82?\x06\xd9\x1f\xe3\xe81\x0c\xd2(\xcc\xfe\xf3\xd5\xe7\x0f\xaf\xce\xd5\xee\xb8e\xa7\xea\xc2\x91C\x18h\xabK\x05\xbf\xfe\xf7\xab3;\xbd\xef\x1b\xb6\xd9W\xc3B\xdb\xfe\xcc\'

How can I get rid of these request logs in production?
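The timestamped DEBUG lines come from urllib3 (the transport libhoney uses), and the raw send:/reply:/header: dumps look like http.client's debuglevel being enabled somewhere. A hedged sketch of silencing both without touching your own loggers:

```python
import http.client
import logging

# Raise the threshold for urllib3's per-request DEBUG records.
logging.getLogger("urllib3").setLevel(logging.WARNING)

# The raw "send:"/"reply:"/"header:" lines are printed by http.client
# whenever debuglevel is non-zero; force it back off.
http.client.HTTPConnection.debuglevel = 0
```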

Beeline support for CherryPy

Suggestion for supporting CherryPy here:

beeline.init(writekey=honeycombApiToken, dataset='xyz', service_name='something.or.other')

def reportBeeline(func):
    def wrapper(*args, **kwargs):
        trace = beeline.start_trace()
        beeline.add_context({'method': cherrypy.request.method, 'endpoint': cherrypy.request.path_info})
        try:
            return func(*args, **kwargs)
        finally:
            beeline.finish_trace(trace)
    return wrapper

Then, for each function that deals with an HTTP request, simply decorate it with the reportBeeline decorator:

@cherrypy.tools.json_out(handler=dumper)
@reportBeeline
def servePage():
    return 'test'

Automatically decorate Flask routes

Currently, Flask routes come in with a boring old root function of flask_http_get, and that's all you get. There's a request.path, but that doesn't really specify a route (especially if I pattern-match things internally -- as I would in https://github.com/honeycombio/examples/blob/39c8732285c5f9cffaa728b7d55840c601039b8e/python-gatekeeper/app.py#L49 ). It would be much better for the span name to be the function name for the route, or at least for the function name for the route to be part of it.

WSGIMiddleware has bad or confusing imports

We can see these imports

from beeline.propagation import Request   <---- Request
from flask import current_app, signals
# needed to build a request object from environ in the middleware
from werkzeug.wrappers import Request   <---- Request again, overriding it

Then a request is instantiated here

req = Request(environ, shallow=True)

It looks like the import from the propagation module is wrong, because beeline.propagation.Request doesn't take these parameters; the instance has to be coming from the werkzeug one. But the built instance is not really used, so can it be removed, along with the imports, to avoid confusion?
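As a self-contained illustration of why the second import silently wins (stdlib modules stand in for the beeline and werkzeug ones):

```python
from os.path import join   # join(path, *paths) -> str
from shlex import join     # rebinds the name: join(iterable) -> str

# The name now refers to shlex.join; os.path.join is unreachable here.
print(join(["a", "b"]))  # → a b
```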

TypeError('Object of type datetime is not JSON serializable')

I just tried to set up honeycomb in my relatively simple (admin site only) Django 2.1.5 app with a Postgres 11.1 DB running locally, configured with the middleware added, and:

from django.apps import AppConfig  # type: ignore
from django.conf import settings  # type: ignore
import beeline  # type: ignore


class LettersConfig(AppConfig):
    name = "letters"

    def ready(self):
        beeline.init(
            writekey=settings.HONEYCOMB_WRITE_KEY,
            dataset="artandtybie",
            service_name="my-app-name",
            debug=True,
        )

I have so far not received any events in the Honeycomb UI, and see a lot of these messages in the logs:

2019-02-09 04:06:31,892 - honeycomb-sdk-xmit - DEBUG - enqueuing response = {'status_code': 0, 'body': '', 'error': TypeError('Object of type datetime is not JSON serializable'), 'duration': 5.233049392700195, 'metadata': None}

Maybe they're coming from trying to serialize query args for a query against a model with a date field?

2019-02-09 04:06:31,633 - honeycomb-sdk - DEBUG - send enqueuing event ev = {'service_name': 'my-app-name', 'meta.beeline_version': '2.4.6', 'meta.local_hostname': 'localhost.localdomain', 'name': 'django_postgresql_query', 'trace.trace_id': '58a65be3-31a9-4c70-a9cd-8068459d30ef', 'trace.parent_id': '474479f5-e10b-474c-94f3-4830afc4e116', 'trace.span_id': '2abbba28-578c-4b0e-b27c-f9844c93df85', 'type': 'db', 'db.query': 'SELECT "django_session"."session_key", "django_session"."session_data", "django_session"."expire_date" FROM "django_session" WHERE ("django_session"."expire_date" > %s AND "django_session"."session_key" = %s)', 'db.query_args': (datetime.datetime(2019, 2, 9, 4, 6, 31, 626689, tzinfo=<UTC>), 'xxxxxxxx'), 'db.duration': 2.083, 'db.last_insert_id': 0, 'db.rows_affected': 1, 'duration_ms': 2.4789999999999996}

Am I configuring this Beeline correctly?
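The db.query_args tuple contains a datetime, which the stdlib json encoder rejects. A generic way to make such values serializable is a default hook that stringifies them (whether libhoney exposes such a hook is a separate question; a presend hook that converts db.query_args to strings would also work):

```python
import datetime
import json

def json_default(obj):
    # Serialize date/datetime values as ISO 8601 strings.
    if isinstance(obj, (datetime.date, datetime.datetime)):
        return obj.isoformat()
    raise TypeError(f"Object of type {type(obj).__name__} is not JSON serializable")

payload = {"db.query_args": (datetime.datetime(2019, 2, 9, 4, 6, 31), "xxxxxxxx")}
print(json.dumps(payload, default=json_default))
# → {"db.query_args": ["2019-02-09T04:06:31", "xxxxxxxx"]}
```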
