mercadona / rele
Easy to use Google Pub/Sub
Home Page: https://mercadonarele.readthedocs.io/en/latest/index.html
License: Apache License 2.0
We currently have flake8 in our linting command. We do not have isort in the project style guidelines.
Shall we add it?
Or try out something new like black?
I think #36 is somewhat similar. I have a use case where I'd like to use the message.id in the subscription handler. Currently, only data and attributes are passed.
I could probably get around this by using the preprocess_message hook to mutate message.data or message.attributes, but I'd prefer not to go that route.
We have a use case where the same logic is used while listening to two separate topics. It would be nice if we could do something like:
@sub(topics=('topic1', 'topic2'))
def my_sub(payload, **kwargs):
    print('do the same thing')
This would result in two subs created from the same function.
I would like to be able to configure the timeout when publishing a message in a blocking fashion. Right now, it is hard-coded to 3.0 seconds.
I propose adding a configuration setting so I can declare any number of seconds for the publisher.
Ex.
RELE = {
    ...
    'PUBLISHER_TIMEOUT': 5.0
}
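As a sketch of the idea, the setting could fall back to the current default when absent (the helper and constant names here are illustrative, not the real implementation):

```python
# Hypothetical helper showing how a configurable blocking-publish timeout
# could be resolved; DEFAULT_PUBLISHER_TIMEOUT mirrors the current
# hard-coded 3.0 seconds.
DEFAULT_PUBLISHER_TIMEOUT = 3.0

def resolve_publisher_timeout(settings):
    """Return the configured timeout, falling back to the current default."""
    return float(settings.get('PUBLISHER_TIMEOUT', DEFAULT_PUBLISHER_TIMEOUT))
```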
In the 1.0 release, I had to pin pubsub to <2.0. Otherwise, our tests would break. It seems there were major changes, and we need to update some usages in our code.
To reproduce:
Since we are using the credentials object in the project, we can extract the project ID from the credentials. This would eliminate the need for another settings configuration.
For any message for a topic (rele's subscription), use a custom ack deadline instead of the default 10 seconds, by passing the ack_deadline_seconds parameter to SubscriberClient.create_subscription() when creating a subscription.
It has been seen that, in some cases, consumers need more than the default 10 seconds to consume (late-acknowledge) a message.
https://cloud.google.com/pubsub/docs/reference/rest/v1/projects.subscriptions/modifyAckDeadline
When #38 is finished, we should have a documentation page with all the new settings.
One consistent piece of feedback that I have received is to simplify the Worker class. Right now, you must initialize it with the subs and each individual config attribute, then run setup, run start, and then sleep.
Like this:
worker = Worker(
    [photo_uploaded],
    config.gc_project_id,
    config.credentials,
    config.ack_deadline,
)
worker.setup()
worker.start()
sleep(120)
I propose we simplify the API to run a worker to look something like this instead:
worker = Worker([photo_uploaded], config)
worker.run(sleep=120)
run would call both setup and start, and we could add the standard sleep method. In addition, we consolidate the configuration into one attribute.
This would be backwards compatible since we would be creating a new method. The change to Worker initialization could also fall back to the declared attributes if defined, and otherwise use the config object.
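A minimal sketch of the proposed API; stub bodies stand in for the real Pub/Sub wiring, and the calls list exists only so the sketch is observable:

```python
import time

class Worker:
    """Sketch of the proposed simplified Worker API (not the real class)."""

    def __init__(self, subs, config):
        self._subs = subs
        self._config = config  # consolidated configuration object
        self.calls = []        # records method order, for illustration only

    def setup(self):
        self.calls.append('setup')

    def start(self):
        self.calls.append('start')

    def run(self, sleep=None):
        """Proposed convenience method: setup, start, then optionally sleep."""
        self.setup()
        self.start()
        if sleep is not None:
            time.sleep(sleep)
```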
The sub decorator can get some ease-of-use improvements, such as preserving the __name__ and __doc__ of the original function.
Slightly related to the second point: @sub("topic", filter_by=42) will raise a TypeError at runtime, because filters are used as callables but they are not checked.
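One way to address the filter_by=42 case would be to validate the filter at decoration time; a sketch (validate_filter_by is a hypothetical helper, not rele API):

```python
def validate_filter_by(filter_by):
    """Reject non-callable filters at decoration time instead of at runtime.

    Hypothetical helper: the sub decorator could call this before creating
    the Subscription, so @sub("topic", filter_by=42) fails immediately.
    """
    if filter_by is not None and not callable(filter_by):
        raise TypeError(
            f'filter_by must be callable, got {type(filter_by).__name__}'
        )
    return filter_by
```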
As a Relé client I would like to be able to log the message in hooks, but right now it is not part of hooks parameters. So I think that by adding them as a parameter, then clients may add a new middleware and do whatever they want with it. Unless you think it should be a part of the current messages, and want it to be added as an extra
Our current way of publishing while blocking, or waiting, for a future could be a bit dangerous, because Google can raise errors like a TimeoutError.
I would suggest catching the error and adding another hook called post_publish_failure.
In addition, we could add another parameter to publishing, something like raise_exception: bool. If True, we wait and re-raise any exception. If False, we silently fail and return a bool.
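A sketch of how such a raise_exception flag could behave, assuming True means re-raise (matching the parameter name) and that the proposed post_publish_failure hook would run at the marked point:

```python
def wait_for_publish(future, timeout=3.0, raise_exception=True):
    """Sketch (not rele's code): wait on the Pub/Sub publish future and
    either re-raise failures or swallow them and report success as a bool."""
    try:
        future.result(timeout=timeout)
    except Exception:
        # this is where the proposed post_publish_failure hook would run
        if raise_exception:
            raise
        return False
    return True
```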
I'm adding this despite the fact that we have documentation stating that any Relé worker should have its MAX_CONN_AGE set to 0.
With that being said, I wonder if it would be worth raising an error, or at least logging a warning, when the worker has the setting set to another value, so that it must be set explicitly.
I would avoid raising an error, as there could be a use case for increasing the connection age (I'm not currently aware of any, though).
Any thoughts?
We previously created an issue (#44 ) that introduces a new setting value which overrides the default ack deadline of a PubSub subscription. However, that configuration applies to all subscriptions in the same way and we might have certain subscriptions that need a different value.
In order to solve that problem, we propose adding a new parameter to the sub decorator (which is the one in charge of creating subscriptions) that tweaks the ack deadline. For instance:
# subs.py
@sub('new_blog_entry', ack_deadline=180)
def my_very_expensive_callback():
    ...
In the example above, ack_deadline is the number of seconds set as the ack deadline of the subscription.
In a first step to decouple Django and Relé, we need an agnostic place to put the Relé settings.
Right now, they are read directly from the project's settings.py file and then used like:
from django.conf import settings
We would also need to set up some default values, if applicable.
Suggestions:
To be honest, I'm not entirely sure how we should handle this, so this is up for discussion. It will most likely require quite a bit of experimentation and trial and error.
We would like our subscribers to filter messages from a particular topic based on message attribute(s). For instance, we have one publisher publishing to the topic foobar. The message is published with either the attribute location='backyard' or location='frontyard'.
If we have two subscribers listening to the same topic, one should only act on location='backyard'.
I don't want to publish to two separate topics, as they carry the same data type; only the location attribute changes from message to message.
Additionally, we would like to globally define the filter value in the app. If we have two workers running the same subscriber, one should be able to filter only backyard messages and the other only frontyard messages.
This is a proposal implementation:
# Scenario A (Local usage)
# subs.py
@sub('update_location', filter_by={'location': 'backyard'})
def update_location_backyard():
    pass

@sub('update_location', filter_by={'location': 'frontyard'})
def update_location_frontyard():
    pass

# Scenario B (Global Setting)
# settings.py
RELE_FILTER_BY = {
    'update_location': {
        'location': 'frontyard'
    }
}

# subs.py
@sub('update_location', filter_by=settings.RELE_FILTER_BY)
def update_location():
    pass
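Under the proposed dict-based filter_by, matching could work roughly like this sketch (matches is a hypothetical helper, not rele API):

```python
def matches(attributes, filter_by):
    """Sketch of the dict-based filter semantics proposed above: a message
    passes when every key/value pair in filter_by is present in its
    attributes."""
    return all(attributes.get(key) == value for key, value in filter_by.items())
```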
Right now, if you want to call rele.publish(), you must call config.setup() first, so that init_global_publisher is called and set up properly. Otherwise a ValueError is raised.
I propose that a user should be able to call rele.publish without the boilerplate of config.setup. Instead, I think we can call init_global_publisher when publishing if there is no global publisher yet. The tough part will be getting the settings so that the credentials can be configured properly.
But I do believe this can be solved, to make the UX more elegant and avoid boilerplate.
When the body of the message is not valid JSON, the callback crashes. Instead, the callback should catch the JSONDecodeError and discard (ack) the message so it is not retried.
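A sketch of the proposed behaviour, using a stand-in for the Pub/Sub message object (assumed to expose .data as bytes and an .ack() method):

```python
import json

def decode_or_ack(message):
    """Sketch: if the payload is not valid JSON, ack the message so it is
    not redelivered, and return None instead of crashing the callback."""
    try:
        return json.loads(message.data.decode('utf-8'))
    except json.JSONDecodeError:
        message.ack()  # discard: retrying cannot fix a malformed payload
        return None
```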
I'm getting the following error when trying to run rele.config.setup using default credentials:
rele.config.setup({
    "GC_CREDENTIALS_PATH": None,
    "MIDDLEWARE": [
        "rele.contrib.LoggingMiddleware",
        "rele.contrib.FlaskMiddleware",
    ],
    "APP_NAME": "smart_comms_planner",
}, flask_app=app)
output:
File "/Users/matthewbridges/repos/smart-comms-planner/src/__init__.py", line 66, in <module>
rele.config.setup(settings["rele"], flask_app=app)
File "/Users/matthewbridges/.local/share/virtualenvs/smart-comms-planner-yrxYHqso/lib/python3.8/site-packages/rele/config.py", line 69, in setup
init_global_publisher(config)
File "/Users/matthewbridges/.local/share/virtualenvs/smart-comms-planner-yrxYHqso/lib/python3.8/site-packages/rele/publishing.py", line 10, in init_global_publisher
gc_project_id=config.gc_project_id,
File "/Users/matthewbridges/.local/share/virtualenvs/smart-comms-planner-yrxYHqso/lib/python3.8/site-packages/rele/config.py", line 59, in gc_project_id
return self.credentials.project_id
AttributeError: 'Credentials' object has no attribute 'project_id'
The code is incorrectly attempting to read project_id off the credentials object, when in fact it is returned as part of a tuple from get_google_defaults.
I've added a proposed fix here: #195
Legacy code lived under the pubsub directory, and the make lint command still checks that instead of the main rele directory.
diff --git a/Makefile b/Makefile
index 1bf3b22..2755981 100644
--- a/Makefile
+++ b/Makefile
@@ -18,7 +18,7 @@ clean-pyc: ## remove Python file artifacts
 	find . -name '*~' -exec rm -f {} +
 
 lint: ## check style with flake8
-	flake8 pubsub tests
+	flake8 rele tests
 
 test: ## run tests quickly with the default Python
 	python runtests.py tests
After such change, some linting errors must be fixed.
In the publisher class, there is the use of DEFAULT_ACK_DEADLINE.
I propose that this should be moved to the config object so it can be set via the settings dictionary.
I'm adding Relé to a Flask app, and the subscription callbacks need the Flask app_context. I've done this by way of middleware, e.g.:
class FlaskMiddleware(BaseMiddleware):
    def pre_process_message(self, subscription, message):
        from server import app
        self.ctx = app.app_context()
        self.ctx.push()

    def post_process_message(self):
        self.ctx.pop()
But to make this reusable across our services and other Flask apps, I'd need a way to add arbitrary data to the config that is passed to the middleware.setup method, or an easy way to call custom middleware functions. E.g.:
class FlaskMiddleware(BaseMiddleware):
    def setup(self, config):
        self.app = config["FLASK_APP"]

    def pre_process_message(self, subscription, message):
        self.ctx = self.app.app_context()
        self.ctx.push()

    def post_process_message(self):
        self.ctx.pop()
To track message propagation.
Additionally, if we want to spawn new messages from failed messages, we can reassign this id to the new message that will go to a dead letter queue(?).
The logging middleware logs the following properties under metrics: agent, topic, status, and subscription.
For example, a succeeded message (contrib.logging_middleware.LoggingMiddleware.post_process_message_success):
"levelname": "INFO",
"message": "Successfully processed message for city-created - city-created-building-factory-spain",
"module": "logging_middleware",
"metrics": {
    "name": "subscriptions",
    "data": {
        "agent": "country-maker",
        "topic": "city-created",
        "status": "succeeded",
        "subscription": "city-created-building-factory-spain",
        "duration_seconds": 0.001
    }
}
My proposal is to add one more property, called attributes, which contains the message attributes.
For example, a succeeded message would appear as follows:
"levelname": "INFO",
"message": "Successfully processed message for city-created - city-created-building-factory-spain",
"module": "logging_middleware",
"metrics": {
    "name": "subscriptions",
    "data": {
        "agent": "country-maker",
        "topic": "city-created",
        "status": "succeeded",
        "subscription": "city-created-building-factory-spain",
        "duration_seconds": 0.001,
        "attributes": {
            "country": "spain"
        }
    }
}
It would apply to received, failed, and the previously mentioned succeeded messages.
As a real scenario, for a topic, several subscriptions can be registered which only process a message if a message attribute has a particular value.
For example, a topic like city-created can have at least four subscriptions:
city-created-building-factory-spain
city-created-roadway-factory-spain
city-created-building-factory-portugal
city-created-roadway-factory-portugal
city-created-building-factory-{country} and city-created-roadway-factory-{country} are generic for each country and only process messages when an attribute called country matches a defined value like spain or portugal.
The subscriptions run in different environments (spain and portugal), but logs are consumed in a centralized system.
When filtering logs for the spain environment, there will be logs for messages for all subscriptions (both countries). However, subscriptions for the unrelated country will have a succeeded log message with a very short duration (skipped message), while subscriptions for the related country will take a bit longer to execute.
If the log message metrics contained attributes, it would be easy to implement a filter that skips logs from a different country, avoiding noise during log analysis.
The Subscription class accepts filter_by as a function that filters the messages to be processed by the sub according to their attributes.
However, on more than one occasion, when a message contains more than one attribute, it would be nice to use a list of filters, one per attribute, for example.
I would like to suggest that the filter_by parameter accepts a list of functions as well.
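A sketch of how filter_by could accept a list of callables, combining them so a message must pass all of them (normalize_filter_by is a hypothetical helper, not rele API):

```python
def normalize_filter_by(filter_by):
    """Sketch: accept either a single callable or a list/tuple of callables
    and return one predicate that requires all of them to pass."""
    filters = filter_by if isinstance(filter_by, (list, tuple)) else [filter_by]

    def combined(**attributes):
        return all(f(**attributes) for f in filters)

    return combined
```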
Right now Google's library defaults to a ThreadPoolExecutor with a maximum of 10 threads, which may not be what you want.
There could be a num_threads argument when instantiating subscriptions that passes a scheduler with an explicit number of threads.
https://googleapis.dev/python/pubsub/latest/subscriber/index.html
https://googleapis.dev/python/pubsub/latest/subscriber/api/scheduler.html
I've seen the use of class-based subscriptions become popular.
Unfortunately, the way we autodiscover subscriptions is by using isinstance(attribute, Subscription), which does not pick up class-based subscriptions, since they are not instantiated at import time the way @sub subscriptions are.
One solution that I have been thinking about is using isinstance(attribute, Subscription) or issubclass(attribute, Subscription).
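One caveat with that solution: issubclass raises a TypeError when its first argument is not a class, so the check needs a guard. A sketch with a stand-in Subscription class:

```python
class Subscription:
    """Stand-in for rele's Subscription class, for illustration only."""

def is_subscription(attribute):
    """Sketch of an autodiscovery check that picks up both instances
    (from @sub) and class-based subscriptions. issubclass() raises a
    TypeError on non-classes, so guard with isinstance(attribute, type)."""
    if isinstance(attribute, Subscription):
        return True
    return isinstance(attribute, type) and issubclass(attribute, Subscription)
```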
To make builds, as well as errors, more reproducible, please specify the Python versions you are supporting and lock the versions in your requirements files.
By the way, you're following a not very traditional approach to the requirements. It should be something like:
requirements.txt -> the requirements needed to have the application running.
requirements_dev.txt -> developer-specific requirements, like test suites and the like.
requirements_prod.txt -> production-specific requirements, if you have any.
Otherwise, it will be really hard to tackle issues like #184
Publishing breaks when trying to pass in any type other than strings.
This will break:
import rele
rele.publish(topic='foo-bar', data={'baz': 'boo'}, some_flag=True)
Returning:
File "/usr/local/lib/python3.7/site-packages/rele/publishing.py", line 42, in publish
_publisher.publish(topic, data, **kwargs)
File "/usr/local/lib/python3.7/site-packages/rele/client.py", line 137, in publish
future = self._client.publish(topic_path, payload, **attrs)
File "/usr/local/lib/python3.7/site-packages/google/cloud/pubsub_v1/publisher/client.py", line 261, in publish
"All attributes being published to Pub/Sub must "
TypeError: All attributes being published to Pub/Sub must be sent as text strings.
I propose either:
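One possible approach (a sketch of my own, not necessarily one of the options the issue goes on to list) is to coerce attribute values to strings before handing them to the client:

```python
def stringify_attributes(attrs):
    """Pub/Sub only accepts text-string attribute values, so coerce
    everything else (bools, ints, ...) with str() before publishing."""
    return {key: value if isinstance(value, str) else str(value)
            for key, value in attrs.items()}
```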
Sphinx? Read-the-docs?
Any suggestions on documentation frameworks?
According to the Google Pub/Sub documentation, there is a publishTime attribute added to the message.
However, the caveat is that it is "the time at which the message was published, populated by the server".
IMHO, a UNIX timestamp could be added as a message attribute when the message is published from the client. This would prevent race conditions between clients, as the timestamp would more accurately reflect the state of the system when publishing.
We can follow some standard examples of:
And use time.time()
safely.
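A sketch of the idea, stamping a client-side UNIX timestamp onto the attributes ('published_at' is a hypothetical attribute name, and Pub/Sub attributes must be strings):

```python
import time

def with_published_at(attrs):
    """Sketch: add a client-side UNIX timestamp to the message attributes
    at publish time. Pub/Sub attribute values must be text strings, so the
    float from time.time() is serialized with str()."""
    return {**attrs, 'published_at': str(time.time())}
```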
With the current implementation it is not possible to enable ordering, nor upcoming features.
Using kwargs in create_subscription would open up the possibility of enabling new features just by upgrading the Google client libraries.
Boy oh boy, that release process was a doozy. Anybody have experience with making the release process easier to compose?
To create 0.4.0, I had to:
- bump __version__
- run make release on my machine.
Needless to say, there was a lot of room for error and improvement.
What would be nice to have is:
- this for sphinx documentation: https://releases.readthedocs.io/en/latest/
- this for versioning: https://github.com/c4urself/bump2version
- this for changelog generation: https://github.com/vaab/gitchangelog
Any other sources? Experiences?
Currently, filter_by only receives the message attributes sent when publishing to a topic, but in some cases we need access to the message body as well, for example for logging.
I propose passing the message body alongside the message attributes as a parameter to the filter function, for when it is needed.
When a worker starts, it tries to create subscriptions for all topics. If a topic does not exist, the worker will crash.
Proposed change:
Catch the Rendezvous exception (404 Resource not found), log an error, but gracefully create and consume the rest of the defined subscriptions.
Hey 👋 first of all thanks for this great library!
This might not be a strictly Relé-related issue, but when I tried to pause execution to debug an issue (i.e. dropping a __import__("pdb").set_trace()) inside the subscription message handler function, it did not stop, but just kept on processing messages.
This might be a trivial problem to overcome, but my search engine fu failed me. I'm guessing the problem is that the subscriber handles the message in a thread, so it would somehow need to let the parent know to stop, but not really sure how to go about it.
If two subs subscribe to the same topic and don't declare a suffix, message stealing will happen: approximately half the messages will be processed by one subscription and half by the other. That's extremely difficult to test against and to debug in production, so a protection would be very useful.
I'd suggest raising an error straight away so the worker doesn't start. A warning log wouldn't be as effective, because it could go unnoticed.
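A sketch of such a protection, assuming each sub exposes a resolved subscription name (topic plus optional suffix); the helper name is hypothetical:

```python
def check_duplicate_subscriptions(subs):
    """Sketch: fail fast when two subs resolve to the same subscription
    name (same topic, no distinguishing suffix), since that would split
    messages between them."""
    seen = set()
    for sub in subs:
        if sub.name in seen:
            raise RuntimeError(
                f'Duplicate subscription {sub.name!r}: messages would be '
                f'split between workers.'
            )
        seen.add(sub.name)
```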
On Relé 0.9.0 I get the following error when starting a worker:
Starting Rele
Traceback (most recent call last):
File "worker.py", line 12, in <module>
worker.run_forever()
File "/usr/local/lib/python3.7/site-packages/rele/worker.py", line 77, in run_forever
self.start()
File "/usr/local/lib/python3.7/site-packages/rele/worker.py", line 66, in start
scheduler=scheduler,
File "/usr/local/lib/python3.7/site-packages/rele/client.py", line 88, in consume
subscription_path, callback=callback, scheduler=scheduler
File "/usr/local/lib/python3.7/site-packages/google/cloud/pubsub_v1/subscriber/client.py", line 228, in subscribe
manager.open(callback=callback, on_callback_error=future.set_exception)
File "/usr/local/lib/python3.7/site-packages/google/cloud/pubsub_v1/subscriber/_protocol/streaming_pull_manager.py", line 429, in open
self._dispatcher = dispatcher.Dispatcher(self, self._scheduler.queue)
AttributeError: 'ThreadPoolExecutor' object has no attribute 'queue'
Sentry is attempting to send 0 pending error messages
Waiting up to 2 seconds
Press Ctrl-C to quit
The issue is that we are passing an executor to the consume API, when in fact it should be an instance of a Scheduler.
Publishing serialization is done by DRF's rest_framework.renderers.JSONRenderer. It should be configurable and default to json.dumps.
Add some service like coveralls.io? Any other suggestions? Does travis have this integration?
I was thinking that it would be useful to be able to create a topic through a function or a command.
Is it a good feature to have?
When running Relé as a standalone worker (i.e., no Django, Flask, etc.), I expect the subscribers to be prefixed with the SUB_PREFIX value.
However, this does not happen, since we are manually registering the subs in the worker.
This can result in conflicts when multiple workers are subscribed to the same topic, leading to race conditions when consuming a message.
As the number of topics/subscribers increases in a project, it becomes harder and harder to figure out where and what is being subscribed to.
It would be super cool to have a command like rele document, which would output to the console a list of the topics and their subscribers along with the sub name. That way you could get a quick glance at what the project's subscribers are doing.
Something like:
| Topic | Subscriber(s) | Sub |
|---|---|---|
| do-something | lets-do-something | my_lets_do_something_subscriber |
| write-something-else | lets-write-something-else | my_lets_write_something_else_subscriber |
| write-something-else | lets-write-something-else-suffix | other_lets_write_something_else_subscriber |
Using Pub/Sub out of the box, when an exception is raised in a subscription, it is caught and a nack() is executed, so the message is sent again, over and over, until something is done.
I was trying Relé locally and realized that when an exception was raised, I wasn't getting my message again, so I started digging in and saw this part of the code:
try:
    res = self._subscription(data, **dict(message.attributes))
except Exception as e:
    run_middleware_hook('post_process_message_failure',
                        self._subscription, e, start_time)
else:
    message.ack()
    run_middleware_hook('post_process_message_success',
                        self._subscription, start_time)
    return res
finally:
    run_middleware_hook('post_process_message')
So instead of the exception reaching Pub/Sub's client, it is being caught by Relé and not raised again. Also, neither a nack() nor an ack() is done explicitly on failure. The final result, because an ack() is never executed, is that you eventually get the message again, but the why is quite strange.
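For comparison, a sketch of what an explicit nack-on-failure version could look like (stubs stand in for rele's real callback wiring; this is not the library's code):

```python
import time

def callback(message, subscription, run_middleware_hook):
    """Sketch: ack on success, nack explicitly on failure so Pub/Sub
    redelivers promptly instead of waiting for the ack deadline."""
    start_time = time.time()
    try:
        res = subscription(message.data, **dict(message.attributes))
    except Exception as e:
        message.nack()  # explicit: hand the message back for redelivery
        run_middleware_hook('post_process_message_failure',
                            subscription, e, start_time)
    else:
        message.ack()
        run_middleware_hook('post_process_message_success',
                            subscription, start_time)
        return res
    finally:
        run_middleware_hook('post_process_message')
```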
Now that Django is removed as a requirement, the standalone use of Relé should be documented properly.
I propose:
Integrate APM tracing using hooks.
This will break backwards compatibility because the default will become the standard json encoder, but there is already a migration path.
As a user, I would like to be able to filter all messages in my app. The implementation would allow me to avoid declaring the same filter_by attribute in every subscriber.
In the case where both a global filter_by and a subscriber filter_by are defined, I propose that the subscriber filter_by takes priority.
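The proposed precedence could be as simple as this sketch (resolve_filter_by is a hypothetical helper, not rele API):

```python
def resolve_filter_by(global_filter, subscriber_filter):
    """Sketch of the proposed precedence: the per-subscriber filter_by
    wins over the app-wide one whenever both are defined."""
    return subscriber_filter if subscriber_filter is not None else global_filter
```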
I couldn't find how to proceed and which information to provide in the issue when closing it, after merging the related PR.
We want to be more explicit with this middleware hook, just like we do with post_process_message_failure and post_process_message_success.
For backwards compatibility, we'll initially allow both post_publish and post_publish_success, and eventually we'll deprecate post_publish.