
peopledoc / django-chunkator


Chunk large QuerySets into small chunks, and iterate over them without killing your RAM.

Home Page: https://pypi.python.org/pypi/django-chunkator

License: MIT License


django-chunkator's Introduction

django-chunkator



Tested with all the combinations of:

  • Python: 3.5, 3.6, 3.7, 3.8
  • Django: 2, 2.1, 2.2, 3.0, master

Note

Django 3.0 is incompatible with Python 3.5, see <https://docs.djangoproject.com/en/3.0/releases/3.0/#python-compatibility>

Usage

from chunkator import chunkator
for item in chunkator(LargeModel.objects.all(), 200):
    do_something(item)

This tool is intended to work on Django querysets.

Your model must define a pk field (Django does this by default, but the default can be overridden), and this pk must be unique. django-chunkator has been tested with PostgreSQL and SQLite, using regular PKs and UUIDs as primary keys.

You can also use values():

from chunkator import chunkator
for item in chunkator(LargeModel.objects.values('pk', 'name'), 200):
    do_something(item)

Important

If you're using values(), you must include at least the "pk" field in the values; otherwise, chunkator will raise a MissingPkFieldException.

Warning

This will not speed up your process. Instead of one big query, you'll run several small ones. What you save is RAM, because you no longer load a huge queryset result before looping over it.

If you want to manipulate the pages directly, you can use `chunkator_page`:

from chunkator import chunkator_page
queryset = LargeModel.objects.all().values('pk')
for page in chunkator_page(queryset, 200):
    launch_some_task([item['pk'] for item in page])
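
To make the shape of a page concrete, here is a small sketch that groups any iterable into batches. The `pages` helper is hypothetical and is not part of chunkator; it only imitates, with plain lists, the way the loop above consumes one page at a time.

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def pages(items: Iterable[T], page_size: int) -> Iterator[List[T]]:
    """Group any iterable into lists of at most `page_size` items."""
    iterator = iter(items)
    while True:
        page = list(islice(iterator, page_size))
        if not page:
            return
        yield page


# Hypothetical rows shaped like the result of `.values('pk')`.
rows = ({"pk": pk} for pk in range(1, 8))
batches = [[item["pk"] for item in page] for page in pages(rows, 3)]
print(batches)  # [[1, 2, 3], [4, 5, 6], [7]]
```

Handing a whole batch of pks to a task, as in the example above, keeps the number of task launches proportional to the number of pages rather than the number of rows.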

FAQ

  • How is django-chunkator different from Django's iterator?

If you have server-side cursors (using PostgreSQL or Oracle and not setting DISABLE_SERVER_SIDE_CURSORS), the main difference is that the cursor is in the hands of the application instead of the server. It really depends on your constraints, but server-side cursors can sometimes put too much strain on your DB.

If you don't have server-side cursors, then chunkator will allow you to iterate over your queryset in batches, without relying on LIMIT/OFFSET. The problem with LIMIT/OFFSET is that computing a large offset (when you're near the end of your queryset) requires the DB to go through all the previous entries. With large tables this can be a huge issue.
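
The idea of resuming after the last seen pk, instead of skipping rows with an offset, can be sketched without a database. This is a minimal illustration and not django-chunkator's actual implementation: `fetch_after` is a hypothetical stand-in for a query of the form `WHERE pk > last_pk ORDER BY pk LIMIT n`, and the in-memory `ROWS` list plays the role of the table.

```python
from typing import Dict, Iterator, List, Optional

# Hypothetical in-memory "table"; a real queryset would hit the database.
ROWS: List[Dict] = [{"pk": pk, "name": f"row-{pk}"} for pk in range(1, 11)]


def fetch_after(last_pk: Optional[int], limit: int) -> List[Dict]:
    """Stand-in for: SELECT ... WHERE pk > last_pk ORDER BY pk LIMIT limit."""
    candidates = sorted(ROWS, key=lambda row: row["pk"])
    if last_pk is not None:
        candidates = [row for row in candidates if row["pk"] > last_pk]
    return candidates[:limit]


def keyset_chunks(chunk_size: int) -> Iterator[Dict]:
    """Yield every row, one small query at a time, without using OFFSET."""
    last_pk = None
    while True:
        page = fetch_after(last_pk, chunk_size)
        if not page:
            return
        yield from page
        # Remember where this page ended; the next query starts after it.
        last_pk = page[-1]["pk"]


seen = [row["pk"] for row in keyset_chunks(chunk_size=3)]
print(seen)  # [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
```

Each query only needs the last pk of the previous page, so with an index on the pk the cost of a page does not grow with how far into the table you are.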

  • Will django-chunkator preserve the ordering on my querysets?

No, it orders the queryset by pk. However, you could do the same thing as chunkator with another field, provided that it's unique and not nullable, see here for more details.
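
The uniqueness requirement can be seen in a small, database-free sketch. `keyset_iter` is a hypothetical helper written for this illustration, not chunkator code: it resumes each chunk with `field > last`, so when a chunk boundary falls inside a run of equal values, the remaining rows of that run are never visited.

```python
from typing import Dict, Iterator, List


def keyset_iter(rows: List[Dict], field: str, chunk_size: int) -> Iterator[Dict]:
    """Iterate `rows` in chunks ordered by `field`, resuming with `field > last`."""
    ordered = sorted(rows, key=lambda row: row[field])
    last = None
    while True:
        page = [r for r in ordered if last is None or r[field] > last][:chunk_size]
        if not page:
            return
        yield from page
        last = page[-1][field]


rows = [
    {"pk": 1, "slug": "a", "score": 10},
    {"pk": 2, "slug": "b", "score": 10},
    {"pk": 3, "slug": "c", "score": 10},
    {"pk": 4, "slug": "d", "score": 20},
]

# Unique field: every row is visited exactly once.
by_slug = [r["pk"] for r in keyset_iter(rows, "slug", chunk_size=2)]
# Non-unique field: the chunk boundary falls inside the run of score == 10,
# so the row with pk 3 is skipped.
by_score = [r["pk"] for r in keyset_iter(rows, "score", chunk_size=2)]

print(by_slug)   # [1, 2, 3, 4]
print(by_score)  # [1, 2, 4]
```

A nullable field has a similar problem: a missing value cannot be compared against the last seen value, so the position to resume from is undefined.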

License

MIT License.

django-chunkator's People

Contributors

boblefrag, brunobord, eliotberriot, hsmett, joehybird, k4nar, mike-perdide, pmourlanne, scythargon, wo0dyn, zebuline


django-chunkator's Issues

FYI: works with Django 2.0.5

I've been using it with Django 2.0.5 without an issue. Just wanted to give a heads up that you can probably add this to the supported versions list.

Alternate sort

Give an opportunity to sort by several criteria while still looping over the records without losing any.

Feature request: support keeping the original ordering of the queryset

In most cases, we want to sort a queryset by update_datetime and then iterate over it.

E.g. send a message when a new video is published or updated.

from chunkator import chunkator
from django.core.cache import cache

last_update_datetime = cache.get("last-update-datetime")
# Desired behaviour: iterate in update_datetime order rather than pk order.
queryset = Video.objects.filter(update_datetime__gte=last_update_datetime).order_by("update_datetime")
for video in chunkator(queryset, 200):
    send_notify(video)
    cache.set("last-update-datetime", video.update_datetime)
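
One possible way to get this behaviour, sketched here outside of Django and not something chunkator provides, is to resume on the pair (update_datetime, pk) instead of on pk alone. The timestamp gives the desired order and the pk breaks ties, so the combined key is unique even when several rows share a timestamp. The helper names and the sample rows are hypothetical.

```python
from datetime import datetime
from typing import Dict, Iterator, List, Tuple


def sort_key(row: Dict) -> Tuple[datetime, int]:
    """Composite key: order by timestamp, use the pk to break ties."""
    return (row["update_datetime"], row["pk"])


def iter_in_update_order(rows: List[Dict], chunk_size: int) -> Iterator[Dict]:
    """Iterate in chunks, resuming after the last (update_datetime, pk) seen."""
    ordered = sorted(rows, key=sort_key)
    last = None
    while True:
        page = [r for r in ordered if last is None or sort_key(r) > last][:chunk_size]
        if not page:
            return
        yield from page
        last = sort_key(page[-1])


noon = datetime(2020, 1, 1, 12, 0)
later = datetime(2020, 1, 1, 13, 0)
videos = [
    {"pk": 1, "update_datetime": noon},
    {"pk": 2, "update_datetime": noon},
    {"pk": 3, "update_datetime": noon},
    {"pk": 4, "update_datetime": later},
]

visited = [v["pk"] for v in iter_in_update_order(videos, chunk_size=2)]
print(visited)  # [1, 2, 3, 4]
```

With a chunk size of 2 the first chunk ends in the middle of the three rows sharing the noon timestamp, yet the row with pk 3 is still visited because the tie-breaking pk makes the resume point unambiguous.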

Python 3

Using tox and travis.yml, let's make it Python 3 compatible.

Fix chunkator when the primary key is a OneToOneField

django-chunkator orders by pk.

But sometimes, you want to order by an explicit field.

For example:

from django.db import models


class ModelA(models.Model):
    name = models.CharField(max_length=20)

    class Meta:
        ordering = ['name']


class ModelB(models.Model):
    # on_delete is required since Django 2.0
    modela = models.OneToOneField(ModelA, on_delete=models.CASCADE, primary_key=True)

(With Django 1.8?) When we write the queryset ModelB.objects.order_by('pk'), it is actually ordered by modela__name behind the scenes.
We need to order the queryset by modela_id explicitly:

from chunkator import chunkator
for item in chunkator(ModelB.objects.all(), 200, order_by='modela_id'):
    do_something(item)

Edit: it is possible to fix this case without changing the API. See PR #20
