alexhayes / django-cereal

License: MIT License

django-cereal

Efficient serialization of Django models for use in Celery that ensures tasks work with the current state of the world.

It supports Django 1.7, 1.8 and 1.9 on Python 2.7, 3.3, 3.4, 3.5 and PyPy (where Django supports the Python version).

Scenario

If you're using Django and Celery, you're most likely either passing model instances back and forth between tasks or, as the Celery docs suggest, passing just the primary key to a task and then retrieving the model instance with that primary key.

If you're doing the former, it's potentially inefficient and certainly dangerous, as the model's data could have changed by the time the task executes!

If you're doing the latter, you're probably wondering to yourself: surely there is a better way?! While it's efficient and certainly readable, it's not exactly much fun continually fetching the model at the start of each task...

You may also be using model methods as tasks, but unless you're using something similar to this refresh decorator, you'll potentially have stale model data.

django-cereal to the rescue...

How It Works

django-cereal works by applying an alternative serializer before the task is sent to the message bus and then retrieving a fresh instance of the model during deserialization. Currently only pickle is supported (feel free to fork and implement for JSON or YAML).

Essentially, when the model is serialized, only the primary key and the model's class are pickled. This is obviously not quite as efficient as pickling just the model's primary key, but it's certainly better than serializing the entire model!

When the task is picked up by a Celery worker and deserialized, an instance of the model is retrieved using YourModel.objects.get(pk=xxx). This approach is therefore also safe, as you're not using stale model data in your task.
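The idea can be illustrated without Django or Celery. The following is a minimal sketch, not django-cereal's actual code: `Article`, `DATABASE` and `load_fresh` are hypothetical stand-ins for a model, its table and the `YourModel.objects.get(pk=xxx)` lookup. It only shows how a `__reduce__` method can make pickle store the class and primary key and rebuild the object from current data on load.

```python
import pickle

# Hypothetical in-memory "table" standing in for the database.
DATABASE = {}


def load_fresh(model_class, pk):
    """Rebuild an instance from current data, like YourModel.objects.get(pk=...)."""
    return model_class(pk, **DATABASE[(model_class.__name__, pk)])


class Article:
    """Stand-in for a Django model; not django-cereal's real code."""

    def __init__(self, pk, title, body=""):
        self.pk = pk
        self.title = title
        self.body = body

    def save(self):
        key = (type(self).__name__, self.pk)
        DATABASE[key] = {"title": self.title, "body": self.body}

    def __reduce__(self):
        # Pickle only the class and the primary key, never the field values.
        return (load_fresh, (type(self), self.pk))


article = Article(1, "Draft", body="x" * 10000)
article.save()
message = pickle.dumps(article)      # small: no field data inside

# The row changes after the message was "sent".
article.title = "Published"
article.save()

received = pickle.loads(message)     # rebuilt from the current row
print(len(message), received.title)
```

The pickled message stays small regardless of how large the instance is, and the receiving side sees whatever the row contains at load time rather than at send time.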

The serializer is registered with kombu and safely patches django.db.models.Model.__reduce__. The patch only operates inside the scope of kombu and thus doesn't interfere with a model's pickling outside of kombu.
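One way to keep such a patch scoped is to apply it only for the duration of the serialization call and restore the original afterwards. The sketch below assumes a context manager; `Model`, `fetch` and `pk_only_pickling` are hypothetical names, and how django-cereal actually limits the patch to kombu may differ.

```python
import pickle
from contextlib import contextmanager


class Model:
    """Hypothetical stand-in for django.db.models.Model."""

    registry = {}  # pk -> live instance, standing in for the database

    def __init__(self, pk, name):
        self.pk = pk
        self.name = name
        Model.registry[pk] = self


def fetch(model_class, pk):
    """Look the instance up again at load time."""
    return model_class.registry[pk]


@contextmanager
def pk_only_pickling(model_class):
    """Swap in a pk-only __reduce__ for the block, then restore the original."""
    original = model_class.__reduce__
    model_class.__reduce__ = lambda self: (fetch, (type(self), self.pk))
    try:
        yield
    finally:
        model_class.__reduce__ = original


user = Model(pk=7, name="before")

with pk_only_pickling(Model):
    scoped = pickle.dumps(user)      # contains only the class and the pk
plain = pickle.dumps(user)           # ordinary pickle with full state

user.name = "after"

scoped_copy = pickle.loads(scoped)   # looked up again, sees the new name
plain_copy = pickle.loads(plain)     # frozen at pickling time
print(scoped_copy.name, plain_copy.name)
```

Pickling outside the `with` block behaves exactly as it did before the patch, which is the property the paragraph above describes.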

Installation

You can install django-cereal either via the Python Package Index (PyPI) or from GitHub.

To install using pip:

$ pip install django-cereal

From GitHub:

$ pip install git+https://github.com/alexhayes/django-cereal.git

Usage

All that is required is that you specify the kwarg serializer when defining a task.

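from django_cereal.pickle import DJANGO_CEREAL_PICKLE
from yourproject.celery import app  # your Celery application instance

@app.task(serializer=DJANGO_CEREAL_PICKLE)
def my_task(my_model):
    # my_model is fetched fresh from the database during deserialization
    ...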

There is also a helper task that you can use which defines the serializer if it's not set.

from django_cereal.pickle import task

@task
def my_task(my_model):
    ...

Another approach is to set CELERY_TASK_SERIALIZER to django-cereal-pickle.
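As a rough sketch, the setting might look like this in a Django settings module. The `CELERY_ACCEPT_CONTENT` line and the import are assumptions that are not stated above: workers generally need to be told which content they may accept, and the serializer presumably has to be registered (by importing `django_cereal.pickle`) before Celery can resolve it by name. Verify both against your installed versions.

```python
# settings.py (sketch, Celery 3.x style setting names)

# Assumption: importing the module registers the serializer with kombu.
import django_cereal.pickle  # noqa: F401

# Use django-cereal for every task that does not set its own serializer.
CELERY_TASK_SERIALIZER = 'django-cereal-pickle'

# Assumption: the worker must also be allowed to accept this serializer.
CELERY_ACCEPT_CONTENT = ['django-cereal-pickle', 'pickle', 'json']
```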

Model Task Methods

You can also use task methods on your Django models, so you don't have to define them in a tasks.py. For example:

from celery.contrib.methods import task_method
from django.db import models
from django_cereal.pickle import DJANGO_CEREAL_PICKLE
from yourproject.celery import app


# Shared options for every model method exposed as a task.
task_method_kwargs = dict(filter=task_method,
                          serializer=DJANGO_CEREAL_PICKLE)


class MyModel(models.Model):

    @app.task(name='MyModel.foo', **task_method_kwargs)
    def foo(self):
        # self is an instance of MyModel
        ...

Then, you can call your task as follows:

bar = MyModel.objects.get(...)
bar.foo.delay()

This works just like a normal task, but you can stop defining tasks that simply orchestrate calls on a model and call the model method directly instead.

Chaining Task Methods

While not directly related to serialization of Django models, if you are using Django Model methods as tasks, or any class methods as tasks for that matter, and you are chaining these tasks you may be interested in the @ensure_self decorator (see Celery issue #2137 for more details).

Database Connections

Note that if you use the --maxtasksperchild flag in Celery, or in other similar situations, the connection to a database in Django could become unusable, with errors such as the following thrown:

OperationalError(2006, 'MySQL server has gone away')

This is now handled during unpickling: the database connection is closed before the model is fetched, which forces a new connection to be created.

A nicer approach may emerge in the future, for instance creating a new connection each time a worker is created, but for now the fix in place works, even if it's not ideal.
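The close-then-reconnect behaviour can be simulated with the standard library. This is a sketch under stated assumptions, not django-cereal's code: `LazyConnection` and `load_model` are hypothetical, and sqlite3 stands in for MySQL. In Django the corresponding call would be `django.db.connection.close()`, after which the next query opens a new connection.

```python
import sqlite3


class LazyConnection:
    """Hypothetical stand-in for Django's lazily connecting wrapper."""

    def __init__(self, path):
        self.path = path
        self._conn = None
        self.opened = 0          # how many real connections were created

    def cursor(self):
        if self._conn is None:   # connect lazily, on first use
            self._conn = sqlite3.connect(self.path)
            self.opened += 1
        return self._conn.cursor()

    def close(self):
        if self._conn is not None:
            self._conn.close()
            self._conn = None


connection = LazyConnection(":memory:")


def load_model(pk):
    """Deserialization hook: drop the old connection, then query."""
    connection.close()           # a possibly stale handle is discarded here
    cur = connection.cursor()    # this transparently reconnects
    cur.execute("SELECT ?", (pk,))
    return cur.fetchone()[0]


first = load_model(5)
second = load_model(6)
print(first, second, connection.opened)
```

The cost of this design is one reconnect per deserialized model, which is the trade-off the paragraph above calls "not ideal".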

License

This software is licensed under the MIT License. See the LICENSE file in the top distribution directory for the full license text.

Author

Alex Hayes

django-cereal's Issues

'MySQL server has gone away' with --maxtasksperchild

When using MySQL and starting workers with --maxtasksperchild=n, the following error occurs once a worker has performed n tasks and is re-spawned:

2015-06-26 08:42:56,326 CRITICAL celery.worker.consumer: Can't decode message body: DecodeError(OperationalError(OperationalError(2006, 'MySQL server has gone away'), <function _model_unpickle at 0x7f4906cca140>, (<class 'xxx.models.MyModel'>, {'pk': 16749L})),) [type:'application/x-python-serialize' encoding:'binary' headers:{}]
...
...

Removing the --maxtasksperchild argument from the worker fixes the problem.
