mercadona / rele
Easy to use Google Pub/Sub
Home Page: https://mercadonarele.readthedocs.io/en/latest/index.html
License: Apache License 2.0
We currently have flake8 in our linting command. We do not have isort in the project style guidelines.
Shall we add it?
Or try out something new like black?
I think #36 is somewhat similar. I have a use case where I'd like to use the message.id in the subscription handler. Currently, only data and attributes are passed.
I could probably get around this by using the preprocess_message hook to mutate message.data or message.attributes, but I'd prefer not to go that route.
We have a use case where the same logic is used while listening to two separate topics. It would be nice if we could do something like:
@sub(topics=('topic1', 'topic2'))
def my_sub(payload, **kwargs):
    print('do the same thing')
This would result in two subs created from the same function.
I would like to be able to configure the timeout when publishing a message in a blocking fashion. Right now, it is hard-coded to 3.0 seconds.
I propose adding a configuration setting so I can declare any number of seconds for the publisher.
Ex.
RELE = {
    ...
    'PUBLISHER_TIMEOUT': 5.0
}
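As a sketch of the idea, the setting could fall back to the current default when absent (the helper and constant names here are illustrative, not the real implementation):

```python
# Hypothetical helper showing how a configurable blocking-publish timeout
# could be resolved; DEFAULT_PUBLISHER_TIMEOUT mirrors the current
# hard-coded 3.0 seconds.
DEFAULT_PUBLISHER_TIMEOUT = 3.0

def resolve_publisher_timeout(settings):
    """Return the configured timeout, falling back to the current default."""
    return float(settings.get('PUBLISHER_TIMEOUT', DEFAULT_PUBLISHER_TIMEOUT))
```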
In the 1.0 release, I had to pin pubsub to <2.0. Otherwise, our tests would break. It seems there were major changes, and we need to update some usages in our code.
To reproduce:
Since we are using the credentials object in the project, we can extract the project ID from the credentials. This would eliminate the need for another settings configuration.
For any message for a topic (rele's subscription), use a custom ack deadline instead of the default 10 seconds, by passing the ack_deadline_seconds parameter to SubscriberClient.create_subscription() when creating a subscription.
It has been seen that, in some cases, consumers need more than the default 10 seconds to consume (late-acknowledge) a message.
https://cloud.google.com/pubsub/docs/reference/rest/v1/projects.subscriptions/modifyAckDeadline
When #38 is finished, we should have a documentation page with all the new settings.
One consistent piece of feedback that I have received is to simplify the Worker class. Right now, you must initialize it with the subs and each individual config attribute, then run setup, run start, and then sleep.
Like this:
worker = Worker(
    [photo_uploaded],
    config.gc_project_id,
    config.credentials,
    config.ack_deadline,
)
worker.setup()
worker.start()
sleep(120)
I propose we simplify the API to run a worker to look something like this instead:
worker = Worker([photo_uploaded], config)
worker.run(sleep=120)
run would call both setup and start, and we could add the standard sleep method. In addition, we consolidate the configuration into one attribute.
This would be backwards compatible since we would be creating a new method. The change to Worker initialization could also fall back to the declared attributes if defined, and otherwise use the config object.
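A minimal sketch of the proposed API; stub bodies stand in for the real Pub/Sub wiring, and the calls list exists only so the sketch is observable:

```python
import time

class Worker:
    """Sketch of the proposed simplified Worker API (not the real class)."""

    def __init__(self, subs, config):
        self._subs = subs
        self._config = config  # consolidated configuration object
        self.calls = []        # records method order, for illustration only

    def setup(self):
        self.calls.append('setup')

    def start(self):
        self.calls.append('start')

    def run(self, sleep=None):
        """Proposed convenience method: setup, start, then optionally sleep."""
        self.setup()
        self.start()
        if sleep is not None:
            time.sleep(sleep)
```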
The sub decorator can get some ease-of-use improvements, such as preserving the __name__ and __doc__ of the original function.
Slightly related to the second point: @sub("topic", filter_by=42) will raise a TypeError at runtime, because filters are used as callables but they are not checked.
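One way to address the filter_by=42 case would be to validate the filter at decoration time; a sketch (validate_filter_by is a hypothetical helper, not rele API):

```python
def validate_filter_by(filter_by):
    """Reject non-callable filters at decoration time instead of at runtime.

    Hypothetical helper: the sub decorator could call this before creating
    the Subscription, so @sub("topic", filter_by=42) fails immediately.
    """
    if filter_by is not None and not callable(filter_by):
        raise TypeError(
            f'filter_by must be callable, got {type(filter_by).__name__}'
        )
    return filter_by
```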
As a Relé client I would like to be able to log the message in hooks, but right now it is not part of hooks parameters. So I think that by adding them as a parameter, then clients may add a new middleware and do whatever they want with it. Unless you think it should be a part of the current messages, and want it to be added as an extra
Our current way of publishing while blocking, or waiting, for a future could be a bit dangerous, because Google can raise errors like a TimeoutError.
I would suggest catching the error and adding another hook called post_publish_failure.
In addition, we could add another parameter to publishing, something like raise_exception: bool. If True, we wait and re-raise any exception. If False, we silently fail and return a bool.
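A sketch of how such a raise_exception flag could behave, assuming True means re-raise (matching the parameter name) and that the proposed post_publish_failure hook would run at the marked point:

```python
def wait_for_publish(future, timeout=3.0, raise_exception=True):
    """Sketch (not rele's code): wait on the Pub/Sub publish future and
    either re-raise failures or swallow them and report success as a bool."""
    try:
        future.result(timeout=timeout)
    except Exception:
        # this is where the proposed post_publish_failure hook would run
        if raise_exception:
            raise
        return False
    return True
```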
I'm adding this despite the fact that we have documentation stating that any Relé worker should have its MAX_CONN_AGE set to 0.
With that being said, I wonder if it would be worth raising an error, or at least logging a warning, when the worker has the setting set to another value, so that it must be set explicitly.
I would avoid raising an error, as there could be a use case for increasing the connection age (I'm not currently aware of any, though).
Any thoughts?
We previously created an issue (#44 ) that introduces a new setting value which overrides the default ack deadline of a PubSub subscription. However, that configuration applies to all subscriptions in the same way and we might have certain subscriptions that need a different value.
In order to solve that problem, we propose adding a new parameter to the sub decorator (which is the one in charge of creating subscriptions) that tweaks the ack deadline. For instance:
# subs.py
@sub('new_blog_entry', ack_deadline=180)
def my_very_expensive_callback():
    ...
In the example above, ack_deadline is the number of seconds set as the ack deadline of the subscription.
In a first step to decouple Django and Relé, we need an agnostic place to put the Relé settings.
Right now, they are read directly from the project's settings.py file and then used like:
from django.conf import settings
We would also need to set up some default values, if applicable.
Suggestions:
To be honest, I'm not entirely sure how we should handle this, so this is up for discussion. It will most likely require quite a bit of experimentation and trial and error.
We would like our subscribers to filter messages from a particular topic based on message attribute(s). For instance, we have one publisher publishing to the topic foobar. The message is published with either the attribute location='backyard' or location='frontyard'.
If we have two subscribers listening to the same topic, one should only act on location='backyard'.
I don't want to publish to two separate topics, as they carry the same data type; only the location attribute changes from message to message.
Additionally, we would like to globally define the filter value in the app. If we have two workers running the same subscriber, one should be able to filter only backyard messages and the other only frontyard messages.
This is a proposal implementation:
# Scenario A (Local usage)
# subs.py
@sub('update_location', filter_by={'location': 'backyard'})
def update_location_backyard():
    pass

@sub('update_location', filter_by={'location': 'frontyard'})
def update_location_frontyard():
    pass

# Scenario B (Global Setting)
# settings.py
RELE_FILTER_BY = {
    'update_location': {
        'location': 'frontyard'
    }
}

# subs.py
@sub('update_location', filter_by=settings.RELE_FILTER_BY)
def update_location():
    pass
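Under the proposed dict-based filter_by, matching could work roughly like this sketch (matches is a hypothetical helper, not rele API):

```python
def matches(attributes, filter_by):
    """Sketch of the dict-based filter semantics proposed above: a message
    passes when every key/value pair in filter_by is present in its
    attributes."""
    return all(attributes.get(key) == value for key, value in filter_by.items())
```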
Right now, if you want to call rele.publish(), you must call config.setup() first, so that init_global_publisher is called and set up properly. Otherwise a ValueError is raised.
I propose that a user should be able to call rele.publish without the boilerplate of config.setup. Instead, I think we can call init_global_publisher when publishing if there is no global publisher yet. The tough part will be getting the settings so that the credentials can be configured properly.
But I do believe this can be solved, to make the UX more elegant and avoid boilerplate.
When the body of the message is not valid JSON, the callback crashes. Instead, the callback should catch the JSONDecodeError and discard (ack) the message so it is not retried.
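A sketch of the proposed behaviour, using a stand-in for the Pub/Sub message object (assumed to expose .data as bytes and an .ack() method):

```python
import json

def decode_or_ack(message):
    """Sketch: if the payload is not valid JSON, ack the message so it is
    not redelivered, and return None instead of crashing the callback."""
    try:
        return json.loads(message.data.decode('utf-8'))
    except json.JSONDecodeError:
        message.ack()  # discard: retrying cannot fix a malformed payload
        return None
```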
I'm getting the following error when trying to run rele.config.setup using default credentials:
rele.config.setup({
    "GC_CREDENTIALS_PATH": None,
    "MIDDLEWARE": [
        "rele.contrib.LoggingMiddleware",
        "rele.contrib.FlaskMiddleware",
    ],
    "APP_NAME": "smart_comms_planner",
}, flask_app=app)
output:
File "/Users/matthewbridges/repos/smart-comms-planner/src/__init__.py", line 66, in <module>
rele.config.setup(settings["rele"], flask_app=app)
File "/Users/matthewbridges/.local/share/virtualenvs/smart-comms-planner-yrxYHqso/lib/python3.8/site-packages/rele/config.py", line 69, in setup
init_global_publisher(config)
File "/Users/matthewbridges/.local/share/virtualenvs/smart-comms-planner-yrxYHqso/lib/python3.8/site-packages/rele/publishing.py", line 10, in init_global_publisher
gc_project_id=config.gc_project_id,
File "/Users/matthewbridges/.local/share/virtualenvs/smart-comms-planner-yrxYHqso/lib/python3.8/site-packages/rele/config.py", line 59, in gc_project_id
return self.credentials.project_id
AttributeError: 'Credentials' object has no attribute 'project_id'
The code is incorrectly attempting to read project_id off the credentials object, when in fact it is returned as part of a tuple from get_google_defaults.
I've added a proposed fix here: #195
Legacy code lived under the pubsub directory, and the make lint command still checks that instead of the main rele directory.
diff --git a/Makefile b/Makefile
index 1bf3b22..2755981 100644
--- a/Makefile
+++ b/Makefile
@@ -18,7 +18,7 @@ clean-pyc: ## remove Python file artifacts
 	find . -name '*~' -exec rm -f {} +
 
 lint: ## check style with flake8
-	flake8 pubsub tests
+	flake8 rele tests
 
 test: ## run tests quickly with the default Python
 	python runtests.py tests
After such change, some linting errors must be fixed.
In the publisher class, there is the use of DEFAULT_ACK_DEADLINE.
I propose that this should be moved to the config object so it can be set via the settings dictionary.
I'm adding Relé to a Flask app, and the subscription callbacks need the Flask app_context. I've done this by way of middleware, e.g.:
class FlaskMiddleware(BaseMiddleware):
    def pre_process_message(self, subscription, message):
        from server import app
        self.ctx = app.app_context()
        self.ctx.push()

    def post_process_message(self):
        self.ctx.pop()
But to make this reusable across our services and other Flask apps, I'd need a way to add arbitrary data to the config that is passed to the middleware.setup method, or an easy way to call custom middleware functions. E.g.:
class FlaskMiddleware(BaseMiddleware):
    def setup(self, config):
        self.app = config["FLASK_APP"]

    def pre_process_message(self, subscription, message):
        self.ctx = self.app.app_context()
        self.ctx.push()

    def post_process_message(self):
        self.ctx.pop()
To track message propagation.
Additionally, if we want to spawn new messages from failed messages, we can reassign this id to the new message that will go to a dead letter queue(?).
The logging middleware logs the following properties under metrics: agent, topic, status, and subscription.
For example, a succeeded message (contrib.logging_middleware.LoggingMiddleware.post_process_message_success):
"levelname": "INFO",
"message": "Successfully processed message for city-created - city-created-building-factory-spain",
"module": "logging_middleware",
"metrics": {
    "name": "subscriptions",
    "data": {
        "agent": "country-maker",
        "topic": "city-created",
        "status": "succeeded",
        "subscription": "city-created-building-factory-spain",
        "duration_seconds": 0.001
    }
}
My proposal is to add one more property, called attributes, which contains the message attributes.
For example, a succeeded message would appear as follows:
"levelname": "INFO",
"message": "Successfully processed message for city-created - city-created-building-factory-spain",
"module": "logging_middleware",
"metrics": {
    "name": "subscriptions",
    "data": {
        "agent": "country-maker",
        "topic": "city-created",
        "status": "succeeded",
        "subscription": "city-created-building-factory-spain",
        "duration_seconds": 0.001,
        "attributes": {
            "country": "spain"
        }
    }
}
It would apply to received, failed, and the previously mentioned succeeded messages.
As a real scenario, for a topic, several subscriptions can be registered which only process a message if a message attribute has a particular value.
For example, a topic like city-created can have at least four subscriptions:
city-created-building-factory-spain
city-created-roadway-factory-spain
city-created-building-factory-portugal
city-created-roadway-factory-portugal
city-created-building-factory-{country} and city-created-roadway-factory-{country} are generic for each country and only process messages when an attribute called country matches a defined value like spain or portugal.
The subscriptions run in different environments (spain and portugal), but logs are consumed in a centralized system.
When filtering logs for the spain environment, there will be logs for messages for all subscriptions (both countries). However, subscriptions for the unrelated country will have a succeeded log message with a very short duration (skipped message), while subscriptions for the related country will take a bit longer to execute.
If the log message metrics contained attributes, it would be easy to implement a filter that skips logs from a different country, avoiding noise during log analysis.
The Subscription class accepts filter_by as a function that filters the messages to be processed by the sub according to their attributes.
However, on more than one occasion, when a message contains more than one attribute, it would be nice to use a list of filters, one per attribute, for example.
I would like to suggest that the filter_by parameter accepts a list of functions as well.
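A sketch of how filter_by could accept a list of callables, combining them so a message must pass all of them (normalize_filter_by is a hypothetical helper, not rele API):

```python
def normalize_filter_by(filter_by):
    """Sketch: accept either a single callable or a list/tuple of callables
    and return one predicate that requires all of them to pass."""
    filters = filter_by if isinstance(filter_by, (list, tuple)) else [filter_by]

    def combined(**attributes):
        return all(f(**attributes) for f in filters)

    return combined
```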
Right now Google's library defaults to a ThreadPoolExecutor with a maximum of 10 threads, which may not be what you want.
There could be a num_threads argument when instantiating subscriptions that passes a scheduler with an explicit number of threads.
https://googleapis.dev/python/pubsub/latest/subscriber/index.html
https://googleapis.dev/python/pubsub/latest/subscriber/api/scheduler.html
I've seen the use of class-based subscriptions become popular.
Unfortunately, the way we autodiscover subscriptions is by using isinstance(attribute, Subscription), which does not pick up class-based subscriptions, since they are not instantiated at import time the way @sub subscriptions are.
One solution that I have been thinking about is using isinstance(attribute, Subscription) or issubclass(attribute, Subscription).
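One caveat with that solution: issubclass raises a TypeError when its first argument is not a class, so the check needs a guard. A sketch with a stand-in Subscription class:

```python
class Subscription:
    """Stand-in for rele's Subscription class, for illustration only."""

def is_subscription(attribute):
    """Sketch of an autodiscovery check that picks up both instances
    (from @sub) and class-based subscriptions. issubclass() raises a
    TypeError on non-classes, so guard with isinstance(attribute, type)."""
    if isinstance(attribute, Subscription):
        return True
    return isinstance(attribute, type) and issubclass(attribute, Subscription)
```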
To make builds, as well as errors, more reproducible, please specify the Python versions you are supporting and lock the versions in your requirements files.
By the way, you're following a not very traditional approach to the requirements. It should be something like:
requirements.txt -> the requirements needed to have the application running.
requirements_dev.txt -> developer-specific requirements, like test suites and the like.
requirements_prod.txt -> production-specific requirements, if you have any.
Otherwise, it will be really hard to tackle issues like #184
Publishing breaks when trying to pass in any type other than strings.
This will break:
import rele
rele.publish(topic='foo-bar', data={'baz': 'boo'}, some_flag=True)
Returning:
File "/usr/local/lib/python3.7/site-packages/rele/publishing.py", line 42, in publish
_publisher.publish(topic, data, **kwargs)
File "/usr/local/lib/python3.7/site-packages/rele/client.py", line 137, in publish
future = self._client.publish(topic_path, payload, **attrs)
File "/usr/local/lib/python3.7/site-packages/google/cloud/pubsub_v1/publisher/client.py", line 261, in publish
"All attributes being published to Pub/Sub must "
TypeError: All attributes being published to Pub/Sub must be sent as text strings.
I propose either:
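One possible approach (a sketch of my own, not necessarily one of the options the issue goes on to list) is to coerce attribute values to strings before handing them to the client:

```python
def stringify_attributes(attrs):
    """Pub/Sub only accepts text-string attribute values, so coerce
    everything else (bools, ints, ...) with str() before publishing."""
    return {key: value if isinstance(value, str) else str(value)
            for key, value in attrs.items()}
```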
Sphinx? Read-the-docs?
Any suggestions on documentation frameworks?
According to the Google Pub/Sub documentation, there is a publishTime attribute added to the message.
However, the caveat is that it is "the time at which the message was published, populated by the server".
IMHO, a UNIX timestamp could be added as a message attribute when the message is published from the client. This would prevent race conditions between clients, as the timestamp would more accurately reflect the state of the system when publishing.
We can follow some standard examples of:
And use time.time()
safely.
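A sketch of the idea, stamping a client-side UNIX timestamp onto the attributes ('published_at' is a hypothetical attribute name, and Pub/Sub attributes must be strings):

```python
import time

def with_published_at(attrs):
    """Sketch: add a client-side UNIX timestamp to the message attributes
    at publish time. Pub/Sub attribute values must be text strings, so the
    float from time.time() is serialized with str()."""
    return {**attrs, 'published_at': str(time.time())}
```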
With the current implementation it is not possible to enable ordering, nor upcoming features.
Using kwargs in create_subscription would open up the possibility of enabling new features just by upgrading the Google client libraries.
Boy oh boy, that release process was a doozy. Anybody have experience with making the release process easier to compose?
To create 0.4.0, I had to:
- bump __version__
- run make release on my machine.
Needless to say, there was a lot of room for error and improvement.
What would be nice to have is:
- this for sphinx documentation: https://releases.readthedocs.io/en/latest/
- this for versioning: https://github.com/c4urself/bump2version
- this for changelog generation: https://github.com/vaab/gitchangelog
Any other sources? Experiences?
Currently, filter_by only receives the message attributes sent when publishing to a topic, but in some cases we need access to the message body as well, for example for logging.
I propose passing the message body alongside the message attributes as a parameter to the filter function, for when it is needed.
When a worker starts, it tries to create subscriptions for all topics. If a topic does not exist, the worker will crash.
Proposed change:
Catch the Rendezvous exception (404 Resource not found), log an error, but gracefully create and consume the rest of the defined subscriptions.
Hey 👋 first of all thanks for this great library!
This might not be a strictly Relé-related issue, but when I tried to pause execution to debug an issue (i.e. dropping a __import__("pdb").set_trace()) inside the subscription message handler function, it did not stop, but just kept on processing messages.
This might be a trivial problem to overcome, but my search engine fu failed me. I'm guessing the problem is that the subscriber handles the message in a thread, so it would somehow need to let the parent know to stop, but not really sure how to go about it.
If two subs subscribe to the same topic and don't declare a suffix, message stealing will happen: approximately half the messages will be processed by one subscription and half by the other. That's extremely difficult to test against and to debug in production, so a protection would be very useful.
I'd suggest raising an error straight away so the worker doesn't start. A warning log wouldn't be as effective, because it could go unnoticed.
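A sketch of such a protection, assuming each sub exposes a resolved subscription name (topic plus optional suffix); the helper name is hypothetical:

```python
def check_duplicate_subscriptions(subs):
    """Sketch: fail fast when two subs resolve to the same subscription
    name (same topic, no distinguishing suffix), since that would split
    messages between them."""
    seen = set()
    for sub in subs:
        if sub.name in seen:
            raise RuntimeError(
                f'Duplicate subscription {sub.name!r}: messages would be '
                f'split between workers.'
            )
        seen.add(sub.name)
```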
On Relé 0.9.0 I get the following error when starting a worker:
Starting Rele
Traceback (most recent call last):
File "worker.py", line 12, in <module>
worker.run_forever()
File "/usr/local/lib/python3.7/site-packages/rele/worker.py", line 77, in run_forever
self.start()
File "/usr/local/lib/python3.7/site-packages/rele/worker.py", line 66, in start
scheduler=scheduler,
File "/usr/local/lib/python3.7/site-packages/rele/client.py", line 88, in consume
subscription_path, callback=callback, scheduler=scheduler
File "/usr/local/lib/python3.7/site-packages/google/cloud/pubsub_v1/subscriber/client.py", line 228, in subscribe
manager.open(callback=callback, on_callback_error=future.set_exception)
File "/usr/local/lib/python3.7/site-packages/google/cloud/pubsub_v1/subscriber/_protocol/streaming_pull_manager.py", line 429, in open
self._dispatcher = dispatcher.Dispatcher(self, self._scheduler.queue)
AttributeError: 'ThreadPoolExecutor' object has no attribute 'queue'
Sentry is attempting to send 0 pending error messages
Waiting up to 2 seconds
Press Ctrl-C to quit
The issue is that we are passing an executor to the consume API, when in fact it should be an instance of a Scheduler.
Publishing serialization is done by DRF's rest_framework.renderers.JSONRenderer. It should be configurable and default to json.dumps.
Add some service like coveralls.io? Any other suggestions? Does travis have this integration?
I was thinking that it would be useful to be able to create a topic through a function or a command.
Is it a good feature to have?
When running Relé as a standalone worker (i.e., no Django, Flask, etc.), I expect the subscribers to be prefixed with the SUB_PREFIX value.
However, this does not happen, since we are manually registering the subs in the worker.
This can result in conflicts when multiple workers are subscribed to the same topic, leading to race conditions when consuming a message.
As the number of topics/subscribers increases in a project, it becomes harder and harder to figure out where and what is being subscribed to.
It would be super cool to have a command like rele document, which would output to the console a list of the topics and their subscribers along with the sub name. That way you could get a quick glance at what the project's subscribers are doing.
Something like:
| Topic | Subscriber(s) | Sub |
|---|---|---|
| do-something | lets-do-something | my_lets_do_something_subscriber |
| write-something-else | lets-write-something-else | my_lets_write_something_else_subscriber |
| write-something-else | lets-write-something-else-suffix | other_lets_write_something_else_subscriber |
Using Pub/Sub out of the box, when an exception is raised in a subscription, it is caught and a nack() is executed, so the message is sent again, over and over, until something is done.
I was trying Relé locally and realized that when an exception was raised, I wasn't getting my message again, so I started digging in and saw this part of the code:
try:
    res = self._subscription(data, **dict(message.attributes))
except Exception as e:
    run_middleware_hook('post_process_message_failure',
                        self._subscription, e, start_time)
else:
    message.ack()
    run_middleware_hook('post_process_message_success',
                        self._subscription, start_time)
    return res
finally:
    run_middleware_hook('post_process_message')
So instead of the exception reaching Pub/Sub's client, it is being caught by Relé and not raised again. Also, neither a nack() nor an ack() is done explicitly on failure. The final result, because an ack() is never executed, is that you eventually get the message again, but the why is quite strange.
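For comparison, a sketch of what an explicit nack-on-failure version could look like (stubs stand in for rele's real callback wiring; this is not the library's code):

```python
import time

def callback(message, subscription, run_middleware_hook):
    """Sketch: ack on success, nack explicitly on failure so Pub/Sub
    redelivers promptly instead of waiting for the ack deadline."""
    start_time = time.time()
    try:
        res = subscription(message.data, **dict(message.attributes))
    except Exception as e:
        message.nack()  # explicit: hand the message back for redelivery
        run_middleware_hook('post_process_message_failure',
                            subscription, e, start_time)
    else:
        message.ack()
        run_middleware_hook('post_process_message_success',
                            subscription, start_time)
        return res
    finally:
        run_middleware_hook('post_process_message')
```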
Now that Django is removed as a requirement, the standalone use of Relé should be documented properly.
I propose:
Integrate APM tracing using hooks.
This will break backwards compatibility because the default will become the standard json encoder, but there is already a migration path.
As a user, I would like to be able to filter all messages in my app. The implementation would allow me to avoid declaring the same filter_by attribute in every subscriber.
In the case where both a global filter_by and a subscriber filter_by are defined, I propose that the subscriber filter_by takes priority.
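The proposed precedence could be as simple as this sketch (resolve_filter_by is a hypothetical helper, not rele API):

```python
def resolve_filter_by(global_filter, subscriber_filter):
    """Sketch of the proposed precedence: the per-subscriber filter_by
    wins over the app-wide one whenever both are defined."""
    return subscriber_filter if subscriber_filter is not None else global_filter
```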
I couldn't find how to proceed and which information to provide in the issue when closing it, after merging the related PR.
We want to be more explicit with this middleware hook, just like we do with post_process_message_failure and post_process_message_success.
For backwards compatibility, we'll initially allow both post_publish and post_publish_success, and eventually we'll deprecate post_publish.