aykut / django-bulk-update

Bulk update using one query over Django ORM

License: MIT License

Python 100.00%
django-orm django-bulk bulk django update

django-bulk-update's Introduction

django-bulk-update

Simple bulk update over Django ORM or with helper function.

This project aims to bulk update given objects using one query over Django ORM.
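
To illustrate the one-query idea, here is a minimal sketch. It is not the library's actual implementation. It assumes a hypothetical person table in an in-memory SQLite database and a made-up helper, build_case_update, and it uses a CASE expression keyed on the primary key, similar in shape to the SQL quoted in the issues further down this page.

```python
import sqlite3


def build_case_update(table, pk_column, column, new_values):
    """Build one UPDATE that sets `column` per primary key via CASE/WHEN.

    `new_values` maps primary key -> new value. Table and column names are
    assumed to be trusted; only the values are passed as parameters.
    """
    if not new_values:
        raise ValueError("new_values must not be empty")
    when_clauses = " ".join("WHEN ? THEN ?" for _ in new_values)
    in_list = ", ".join("?" for _ in new_values)
    sql = (
        f'UPDATE "{table}" SET "{column}" = '
        f'CASE "{pk_column}" {when_clauses} ELSE "{column}" END '
        f'WHERE "{pk_column}" IN ({in_list})'
    )
    params = []
    for pk, value in new_values.items():
        params.extend([pk, value])      # one (pk, value) pair per WHEN/THEN
    params.extend(new_values.keys())    # primary keys again for the IN list
    return sql, params


# Hypothetical table used only for this demonstration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO person VALUES (?, ?)", [(1, "a"), (2, "b"), (3, "c")])

sql, params = build_case_update("person", "id", "name", {1: "Walter", 3: "Donny"})
cursor = conn.execute(sql, params)      # a single statement updates both rows
rows = conn.execute("SELECT id, name FROM person ORDER BY id").fetchall()
```

Rows that are not listed keep their old value, because the WHERE clause restricts the statement to the given primary keys.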

Installation

pip install django-bulk-update

Usage

With manager:

import random
from django_bulk_update.manager import BulkUpdateManager
from django.db import models

class Person(models.Model):
    ...
    objects = BulkUpdateManager()

random_names = ['Walter', 'The Dude', 'Donny', 'Jesus']
people = Person.objects.all()
for person in people:
  person.name = random.choice(random_names)

Person.objects.bulk_update(people, update_fields=['name'])  # updates only name column
Person.objects.bulk_update(people, exclude_fields=['username'])  # updates all columns except username
Person.objects.bulk_update(people)  # updates all columns
Person.objects.bulk_update(people, batch_size=50000)  # updates all columns by 50000 sized chunks

With helper:

import random
from django_bulk_update.helper import bulk_update
from tests.models import Person

random_names = ['Walter', 'The Dude', 'Donny', 'Jesus']
people = Person.objects.all()
for person in people:
  person.name = random.choice(random_names)

bulk_update(people, update_fields=['name'])  # updates only name column
bulk_update(people, exclude_fields=['username'])  # updates all columns except username
bulk_update(people, using='someotherdb')  # updates all columns using the given db
bulk_update(people)  # updates all columns using the default db
bulk_update(people, batch_size=50000)  # updates all columns by 50000 sized chunks using the default db

Note: Consider using .only('name') when you only want to update name, so that Django retrieves only the name data from the db.

Likewise, consider using .defer('username') when you don't want to update username, so that Django won't retrieve username from the db. These optimizations can improve performance even more.

Performance Tests:

Here we test the performance of the bulk_update function against simply calling .save() on every updated object (dmmy_update). The interesting metric is the speedup from using bulk_update rather than the raw times.

# Note: SQLite is unable to run the `timeit` tests
# due to the max number of sql variables
In [1]: import os
In [2]: import timeit
In [3]: import django

In [4]: os.environ['DJANGO_SETTINGS_MODULE'] = 'tests.test_settings'
In [5]: django.setup()

In [6]: from tests.fixtures import create_fixtures

In [7]: django.db.connection.creation.create_test_db()
In [8]: create_fixtures(1000)

In [9]: setup='''
import random
from django_bulk_update import helper
from tests.models import Person
random_names = ['Walter', 'The Dude', 'Donny', 'Jesus']
ids = list(Person.objects.values_list('id', flat=True)[:1000])
people = Person.objects.filter(id__in=ids)
for p in people:
    name = random.choice(random_names)
    p.name = name
    p.email = '%s@example.com' % name
bu_update = lambda: helper.bulk_update(people, update_fields=['name', 'email'])
'''

In [10]: bu_perf = min(timeit.Timer('bu_update()', setup=setup).repeat(7, 100))

In [11]: setup='''
import random
from tests.models import Person
from django.db.models import F
random_names = ['Walter', 'The Dude', 'Donny', 'Jesus']
ids = list(Person.objects.values_list('id', flat=True)[:1000])
people = Person.objects.filter(id__in=ids)
def dmmy_update():
    for p in people:
        name = random.choice(random_names)
        p.name = name
        p.email = '%s@example.com' % name
        p.save(update_fields=['name', 'email'])
'''

In [12]: dmmy_perf = min(timeit.Timer('dmmy_update()', setup=setup).repeat(7, 100))
In [13]: print('Bulk update performance: %.2f. Dummy update performance: %.2f. Speedup: %.2f.' % (bu_perf, dmmy_perf, dmmy_perf / bu_perf))
Bulk update performance: 7.05. Dummy update performance: 373.12. Speedup: 52.90.

Requirements

  • Django 1.8+

Contributors

TODO

  • Geometry Fields support

License

django-bulk-update is released under the MIT License. See the LICENSE file for more details.

django-bulk-update's People

Contributors

arnau126, aykut, benoss, daleobrien, gabriel-laet, hoverhell, joshblum, luzfcb, solumos, sruon, tatterdemalion, torchingloom, wetneb

django-bulk-update's Issues

License?

What would the license of this be categorized under?

update_fields for foreign key

Hello. I found a problem with update_fields and foreign keys. In Django, update_fields works as instance.save(update_fields=['foreignkey']).
In bulk_update it works only in one case: instance.save(update_fields=['foreignkey_id'])

Incorrect SQL generated if passed > 1 object on Python 3.x

Hi there,

In Python 3, the built-in function filter returns an iterator (as opposed to a list on python 2.x). Unfortunately, iterators can only be iterated over once. Subsequent attempts to use an iterator return immediately, as though the iterator is empty.

This results in the case_clauses not getting updated when you pass in more than a single object to update.

The fix is to cast the filter results to lists. This should work on python 2 and 3 (though I've only tested it on 3):

fields = list(filter(
    lambda f: (not isinstance(f, models.AutoField)) and
              (f.attname in update_fields),
    meta.fields))
fields = list(filter(lambda f: f.attname not in exclude_fields, fields))
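
The behaviour described in this issue can be reproduced without Django. The snippet below is only an illustration with made-up data: it shows that a Python 3 filter object is empty on a second pass, while a list built from it can be reused.

```python
numbers = [1, 2, 3, 4]

# On Python 3, filter() returns a lazy iterator rather than a list.
evens = filter(lambda n: n % 2 == 0, numbers)
first_pass = list(evens)    # consumes the iterator
second_pass = list(evens)   # nothing left to yield

# Casting to a list once makes the result safe to iterate repeatedly.
evens_list = list(filter(lambda n: n % 2 == 0, numbers))
first_again = list(evens_list)
second_again = list(evens_list)
```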

tests/models.py breaks everything.

It pollutes the namespace of an already existing tests/models.py in a project. Strange things start happening as Django and Celery get confused.

Add support for JSONField

ProgrammingError: column "some_json_field" is of type json but expression is of type text
LINE 1: UPDATE "table" SET "some_json_field" = CAST(CASE "id" W...

when bulk_update(objs, update_fields=['some_json_field'])

Raise error when model with empty pk field is passed

Automated tests in my project failed because it turned out that bulk_update wasn't creating the new model instances I had in the list. I think bulk_update should raise a ValueError saying that it can operate on existing rows only.
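
A check along the lines requested here could look like the sketch below. ensure_saved is a hypothetical helper, not part of django-bulk-update, and SimpleNamespace objects stand in for model instances; the only assumption is that an unsaved instance has pk set to None.

```python
from types import SimpleNamespace


def ensure_saved(objs, pk_attr="pk"):
    """Hypothetical pre-check: reject objects that have no primary key yet."""
    objs = list(objs)
    unsaved = [obj for obj in objs if getattr(obj, pk_attr, None) is None]
    if unsaved:
        raise ValueError(
            "bulk_update can only operate on existing rows; "
            "%d object(s) have no primary key" % len(unsaved)
        )
    return objs


saved = SimpleNamespace(pk=1, name="Walter")
unsaved = SimpleNamespace(pk=None, name="Donny")

checked = ensure_saved([saved])          # passes through unchanged
try:
    ensure_saved([saved, unsaved])
    error_message = None
except ValueError as exc:
    error_message = str(exc)
```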

Many to many field

Hi there,

Bulk updating a many-to-many field gives me the following error:

These fields are not present in current meta: my_many_to_many_field

I see the package only checks the _meta of the model and validates if the fields to be updated are present in there. However, many_to_many field references are stored in _meta.many_to_many. Any suggestions how I should approach this? Is this a bug or as designed? I'm not extremely experienced with many-to-many fields.

Thanks in advance!

Bennie

Batch size is not supported with BulkUpdateManager

The batch_size argument, which slices the data to update into chunks, exists in the helper module, but the BulkUpdateManager method doesn't accept this kwarg and hence never passes it to the helper function.

Combine with use of .only() / .defer()

I like the bulk update solution; thank you for that.

One thing you could mention in the usage is to use a queryset that does not pull all data from the db; i.e. use only() / defer(). This improves performance even further.

Paul

Django REST API: KeyError at /api/update

I was trying to perform a bulk update using django-bulk-update:

https://github.com/aykut/django-bulk-update

But it throws KeyError at /api/update: 'id'

views

class CartUpdatesView(ListBulkCreateUpdateDestroyAPIView):
    queryset=models.Cart.objects.all()
    serializer_class=serializers.CartUpdatesSerializer

serializers

class CartUpdatesSerializer(BulkSerializerMixin,serializers.ModelSerializer):
       
        class Meta(object):
             model  = models.Cart 
             fields = '__all__'
             list_serializer_class=BulkListSerializer

models

class Cart(models.Model): 
    cart_id=models.AutoField(primary_key=True)
    product_qty=models.IntegerField()
    customer_id=models.IntegerField()    
    product_id=models.ForeignKey('Products',db_column='product_id',on_delete=models.CASCADE)

Does this happen because of two primary keys when joining tables? I don't have much experience with bulk updates. From searching, I found that the official docs provide very little information on bulk updates, and everyone recommends this package.

GET

HTTP 200 OK
Allow: GET, POST, PUT, PATCH, DELETE, HEAD, OPTIONS
Content-Type: application/json
Vary: Accept

[
    {
        "cart_id": 1,
        "product_qty": 4,
        "customer_id": 1,
        "product_id": 1
    }
]

PUT

[
    {
        "cart_id": 1,
        "product_qty": 9,
        "customer_id": 1,
        "product_id": 1
    }
]

Traceback

Environment:


Request Method: PUT
Request URL: http://localhost:8000/api/update

Django Version: 2.2.5
Python Version: 3.6.4
Installed Applications:
['django.contrib.admin',
 'django.contrib.auth',
 'django.contrib.contenttypes',
 'django.contrib.sessions',
 'django.contrib.messages',
 'django.contrib.staticfiles',
 'shoppingcart',
 'rest_framework',
 'rest_framework.authtoken']
Installed Middleware:
['django.middleware.security.SecurityMiddleware',
 'django.contrib.sessions.middleware.SessionMiddleware',
 'django.middleware.common.CommonMiddleware',
 'django.middleware.csrf.CsrfViewMiddleware',
 'django.contrib.auth.middleware.AuthenticationMiddleware',
 'django.contrib.messages.middleware.MessageMiddleware',
 'django.middleware.clickjacking.XFrameOptionsMiddleware']



Traceback:

File "C:\Users\user\AppData\Local\Programs\Python\Python36\lib\site-packages\django\core\handlers\exception.py" in inner
  34.             response = get_response(request)

File "C:\Users\user\AppData\Local\Programs\Python\Python36\lib\site-packages\django\core\handlers\base.py" in _get_response
  115.                 response = self.process_exception_by_middleware(e, request)

File "C:\Users\user\AppData\Local\Programs\Python\Python36\lib\site-packages\django\core\handlers\base.py" in _get_response
  113.                 response = wrapped_callback(request, *callback_args, **callback_kwargs)

File "C:\Users\user\AppData\Local\Programs\Python\Python36\lib\site-packages\django\views\decorators\csrf.py" in wrapped_view
  54.         return view_func(*args, **kwargs)

File "C:\Users\user\AppData\Local\Programs\Python\Python36\lib\site-packages\django\views\generic\base.py" in view
  71.             return self.dispatch(request, *args, **kwargs)

File "C:\Users\user\AppData\Local\Programs\Python\Python36\lib\site-packages\rest_framework\views.py" in dispatch
  505.             response = self.handle_exception(exc)

File "C:\Users\user\AppData\Local\Programs\Python\Python36\lib\site-packages\rest_framework\views.py" in handle_exception
  465.             self.raise_uncaught_exception(exc)

File "C:\Users\user\AppData\Local\Programs\Python\Python36\lib\site-packages\rest_framework\views.py" in raise_uncaught_exception
  476.         raise exc

File "C:\Users\user\AppData\Local\Programs\Python\Python36\lib\site-packages\rest_framework\views.py" in dispatch
  502.             response = handler(request, *args, **kwargs)

File "C:\Users\user\AppData\Local\Programs\Python\Python36\lib\site-packages\rest_framework_bulk\generics.py" in put
  140.         return self.bulk_update(request, *args, **kwargs)

File "C:\Users\user\AppData\Local\Programs\Python\Python36\lib\site-packages\rest_framework_bulk\drf3\mixins.py" in bulk_update
  73.         serializer.is_valid(raise_exception=True)

File "C:\Users\user\AppData\Local\Programs\Python\Python36\lib\site-packages\rest_framework\serializers.py" in is_valid
  737.                 self._validated_data = self.run_validation(self.initial_data)

File "C:\Users\user\AppData\Local\Programs\Python\Python36\lib\site-packages\rest_framework\serializers.py" in run_validation
  618.         value = self.to_internal_value(data)

File "C:\Users\user\AppData\Local\Programs\Python\Python36\lib\site-packages\rest_framework\serializers.py" in to_internal_value
  654.                 validated = self.child.run_validation(item)

File "C:\Users\user\AppData\Local\Programs\Python\Python36\lib\site-packages\rest_framework\serializers.py" in run_validation
  430.         value = self.to_internal_value(data)

File "C:\Users\user\AppData\Local\Programs\Python\Python36\lib\site-packages\rest_framework_bulk\drf3\serializers.py" in to_internal_value
  27.             id_field = self.fields[id_attr]

File "C:\Users\user\AppData\Local\Programs\Python\Python36\lib\site-packages\rest_framework\utils\serializer_helpers.py" in __getitem__
  148.         return self.fields[key]

Exception Type: KeyError at /api/update
Exception Value: 'id'

bulk_create

I'm looking at how django implemented the bulk_create option. The main logic from that exists here: https://github.com/django/django/blob/master/django/db/models/query.py#L1049-L1069 From what I understand the bulk_create will be making many N queries to the database. So they are using a rather naive approach from my understanding.

Whereas django-bulk-update is just making a single query with all updates. I'm bringing this up because it seems like it would be ideal to be using bulk_create to make bulk_update or create in #49 per comment there.

Bulk Update Progress Logs

It would be nice to be kept up-to-date with the progress that the bulk update is doing using logs to the console. For example:

  • Which page is currently being worked on.
  • Total number of rows updated.
  • Timings (if you're updating 100 batches of 10,000 rows getting a time estimate of the first page gives you a reference for the whole job).
  • Log and display any errors (this could probably be a separate issue).

Just a couple of suggestions for a great library, cheers!

Use pythonic sequence ducktyping instead of hardcoding to list type

The list of objects and the update_fields and exclude_fields arguments break when passed a set, tuple, iterable, or any other sequence type that is not specifically a list.

This is not very Pythonic; most APIs accept whatever is passed and convert it if their internals require a specific type of sequence.

This is annoying if we pass things like dict.keys() as update_fields, or a set of deduped objects etc.
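
One way to get the duck-typed behaviour asked for here is to normalise the arguments once at the top of the function. The sketch below uses a hypothetical normalize_args helper (not the library's code) and plain strings in place of model instances.

```python
def normalize_args(objs, update_fields=None, exclude_fields=None):
    """Hypothetical helper: accept any iterable and convert it once."""
    objs = list(objs)                                   # generator, set, tuple, queryset...
    if update_fields is not None:
        update_fields = set(update_fields)              # e.g. dict.keys()
    exclude_fields = set(exclude_fields or ())          # None means "exclude nothing"
    return objs, update_fields, exclude_fields


changes = {"name": "Walter", "email": "walter@example.com"}
objs, update_fields, exclude_fields = normalize_args(
    (item for item in ["obj1", "obj2"]),    # a generator instead of a list
    update_fields=changes.keys(),           # a dict view instead of a list
    exclude_fields=("username",),           # a tuple instead of a list
)
```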

django-bulk-update 1.1.10 not working for basic example for Python 2.7.9, Django 1.9.0, sqlite

I have not worked with this package before, so I was trying to get the basic example in the documentation to work. I get the error shown below.

Python 2.7.9 Enthought Canopy
Django: 1.9.0
Database: sqlite
django-bulk-update: 1.1.10

In models file:

from bulk_update.manager import BulkUpdateManager

class PlantInfo(models.Model):

    processed_plant_id = models.CharField(max_length=50, primary_key=True)
    plant_name = models.CharField(max_length=100, blank=True)
    original_plant_id = models.CharField(max_length=100)
    street_addr = models.CharField(max_length=250, blank=True)
    city = models.CharField(max_length=100, blank=True)
    state = models.CharField(max_length=2, blank=True)
    zip_code = models.CharField(max_length=9, blank=True)
    status = models.CharField(max_length=2, blank=True)
    date_created = models.DateTimeField(blank=True)
    date_modified = models.DateTimeField(blank=True)

    objects = BulkUpdateManager()

    def unique_states(self):
        states = PlantLocation.objects.order_by('state').values('state').distinct()
        return states

    def __unicode__(self):
        return self.processed_plant_id

The data in the model looks something like the following. There are currently 921 records in the database.

In [199]: PlantInfo.objects.all().values()
Out[199]: [{'status': u'2', 
'city': u'City name', 
'date_modified': datetime.datetime(2016, 2, 10, 15, 5, 43, 666000, tzinfo=<UTC>), 
'street_addr': u'11175 Street Address', 
'original_plant_id': u'M10034',
'state': u'SC', 
'plant_name': u'Plant Name', 
'processed_plant_id': u'M10034', 
'date_created': datetime.datetime(2016, 2, 10, 15, 5, 43, 666000, tzinfo=<UTC>), 
'zip_code': u'12345'
},... 

I'm running the following code from Django shell to test.

from myapp.models import PlantInfo
update_recs = PlantInfo.objects.all()
for rec in update_recs:
    rec.plant_name = "It worked"

PlantInfo.objects.bulk_update(update_recs)

Error message:

In [196]:                 PlantInfo.objects.bulk_update(update_recs)
---------------------------------------------------------------------------
OperationalError                          Traceback (most recent call last)
<ipython-input-196-6a75800cb104> in <module>()
----> 1 PlantInfo.objects.bulk_update(update_recs)

C:\Users\slgardner\AppData\Local\Enthought\Canopy\User\lib\site-packages\bulk_up
date\manager.pyc in bulk_update(self, objs, update_fields, exclude_fields, batch
_size)
      9             objs, update_fields=update_fields,
     10             exclude_fields=exclude_fields, using=self.db,
---> 11             batch_size=batch_size)

C:\Users\slgardner\AppData\Local\Enthought\Canopy\User\lib\site-packages\bulk_up
date\helper.pyc in bulk_update(objs, meta, update_fields, exclude_fields, using,
 batch_size, pk_field)
    172             del values, pks
    173
--> 174             connection.cursor().execute(sql, parameters)
    175     return lenpks

C:\Users\slgardner\AppData\Local\Enthought\Canopy\User\lib\site-packages\django\
db\backends\utils.pyc in execute(self, sql, params)
     81             stop = time()
     82             duration = stop - start
---> 83             sql = self.db.ops.last_executed_query(self.cursor, sql, para
ms)
     84             self.db.queries_log.append({
     85                 'sql': sql,

C:\Users\slgardner\AppData\Local\Enthought\Canopy\User\lib\site-packages\django\
db\backends\sqlite3\operations.pyc in last_executed_query(self, cursor, sql, par
ams)
    125         if params:
    126             if isinstance(params, (list, tuple)):
--> 127                 params = self._quote_params_for_last_executed_query(para
ms)
    128             else:
    129                 keys = params.keys()

C:\Users\slgardner\AppData\Local\Enthought\Canopy\User\lib\site-packages\django\
db\backends\sqlite3\operations.pyc in _quote_params_for_last_executed_query(self
, params)
    114         # Native sqlite3 cursors cannot be used as context managers.
    115         try:
--> 116             return cursor.execute(sql, params).fetchone()
    117         finally:
    118             cursor.close()

OperationalError: too many SQL variables
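
A rough estimate shows why this statement exceeds SQLite's limit. The sketch assumes the query shape quoted in the ArrayField issue on this page (two parameters per object per field, plus one per object for the WHERE clause), nine updated fields for PlantInfo, and the default limit of 999 bound variables used by SQLite builds older than 3.32.0. max_objects_per_query is a hypothetical helper.

```python
def max_objects_per_query(n_fields, variable_limit=999):
    """Largest batch that stays within the bound-variable limit.

    Assumes each object contributes 2 parameters per field (WHEN pk THEN value)
    plus 1 parameter for its primary key in the WHERE ... IN list.
    """
    params_per_object = 2 * n_fields + 1
    return max(1, variable_limit // params_per_object)


n_fields = 9          # assumed: the non-pk columns of PlantInfo
n_objects = 921       # record count mentioned in the report

params_needed = n_objects * (2 * n_fields + 1)
safe_batch_size = max_objects_per_query(n_fields)
```

Under these assumptions, passing a batch_size no larger than safe_batch_size would keep each statement below the limit.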

Unnecessary variable loaded_fields

If no fields exist, we return None and exit bulk_update.

However, later we try to get loaded_fields here. But since fields will never be None or empty at that point, the get_fields function is never evaluated again, so the variable loaded_fields should be unnecessary.

ProgrammingError: can't adapt type 'ImageFieldFile'

When casting file types, postgres throws an error.

ProgrammingError                          Traceback (most recent call last)
<ipython-input-4-e56590351e17> in <module>()
----> 1 bulk_update(profiles)

/Users/aykutozat/.virtualenvs/gezi/lib/python2.7/site-packages/bulk_update/helper.py in bulk_update(objs, update_fields, exclude_fields, using, batch_size)
     63             _batched_update(objs[batch_size:], fields, batch_size, connection)
     64
---> 65     _batched_update(objs, fields, batch_size, connection)

/Users/aykutozat/.virtualenvs/gezi/lib/python2.7/site-packages/bulk_update/helper.py in _batched_update(objs, fields, batch_size, connection)
     59             del values, pks
     60
---> 61             connection.cursor().execute(sql, paramaters)
     62
     63             _batched_update(objs[batch_size:], fields, batch_size, connection)

/Users/aykutozat/.virtualenvs/gezi/lib/python2.7/site-packages/django/db/backends/util.pyc in execute(self, sql, params)
     67         start = time()
     68         try:
---> 69             return super(CursorDebugWrapper, self).execute(sql, params)
     70         finally:
     71             stop = time()

/Users/aykutozat/.virtualenvs/gezi/lib/python2.7/site-packages/django/db/backends/util.pyc in execute(self, sql, params)
     51                 return self.cursor.execute(sql)
     52             else:
---> 53                 return self.cursor.execute(sql, params)
     54
     55     def executemany(self, sql, param_list):

/Users/aykutozat/.virtualenvs/gezi/lib/python2.7/site-packages/django/db/utils.pyc in __exit__(self, exc_type, exc_value, traceback)
     97                 if dj_exc_type not in (DataError, IntegrityError):
     98                     self.wrapper.errors_occurred = True
---> 99                 six.reraise(dj_exc_type, dj_exc_value, traceback)
    100
    101     def __call__(self, func):

/Users/aykutozat/.virtualenvs/gezi/lib/python2.7/site-packages/django/db/backends/util.pyc in execute(self, sql, params)
     51                 return self.cursor.execute(sql)
     52             else:
---> 53                 return self.cursor.execute(sql, params)
     54
     55     def executemany(self, sql, param_list):

ProgrammingError: can't adapt type 'ImageFieldFile'

TypeError if updating ArrayField

Model class

from django.contrib.postgres.fields import ArrayField
from django.db import models


class Brand(models.Model):
    name = models.CharField(max_length=128, unique=True, db_index=True)
    codes = ArrayField(models.CharField(max_length=64), default=['code_1'])

when I try to add something to field "codes"

from django_bulk_update.helper import bulk_update


brands = Brand.objects.all()[:1]

need_to_update_brands = []

for brand in brands:
    if 'code_2' not in brand.codes:
        brand.codes.append('code_2')
        brand.codes.append('code_3')
        need_to_update_brands.append(brand)

bulk_update(need_to_update_brands)

I get Error

<ipython-input-1-4dd5c7b95886> in <module>()
     72         need_to_update_brands.append(brand)
     73 
---> 74 bulk_update(need_to_update_brands)

python3.6/site-packages/django_bulk_update/helper.py in bulk_update(objs, meta, update_fields, exclude_fields, using, batch_size, pk_field)
    218         lenpks += n_pks
    219 
--> 220         connection.cursor().execute(sql, parameters)
    221 
    222     return lenpks

python3.6/site-packages/django/db/backends/utils.py in execute(self, sql, params)
     79         start = time()
     80         try:
---> 81             return super(CursorDebugWrapper, self).execute(sql, params)
     82         finally:
     83             stop = time()

python3.6/site-packages/django/db/backends/utils.py in execute(self, sql, params)
     64             else:
     65                 from IPython.core.debugger import Tracer; Tracer()()
---> 66                 return self.cursor.execute(sql, params)
     67 
     68     def executemany(self, sql, param_list):

TypeError: not all arguments converted during string formatting

this is because here "return self.cursor.execute(sql, params)"
sql is

'UPDATE "brand" SET "name" = CAST(CASE "id" WHEN %s THEN %s ELSE "name" END AS varchar(128)), "codes" = CAST(CASE "id" WHEN %s THEN %s ELSE "codes" END AS varchar(64)[]) WHERE "id" in (%s)'

and params

[1, 'Test Brand_1', 1, 'code_1', 'code_2', 'code_3', 1]

so in string "sql" that has 5 string insertions (%s) we try to insert 7 params
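
The mismatch can be checked with plain Python. This sketch only reproduces the arithmetic from the report and is not taken from the library: spreading the list value into the parameter list gives seven parameters for five placeholders, while keeping it as a single parameter gives five. psycopg2 adapts a Python list passed as one parameter to a PostgreSQL array.

```python
sql = (
    'UPDATE "brand" SET "name" = CAST(CASE "id" WHEN %s THEN %s ELSE "name" END AS varchar(128)), '
    '"codes" = CAST(CASE "id" WHEN %s THEN %s ELSE "codes" END AS varchar(64)[]) '
    'WHERE "id" in (%s)'
)

pk, name, codes = 1, 'Test Brand_1', ['code_1', 'code_2', 'code_3']

# Flattening the array value reproduces the parameters from the report.
flattened = [pk, name, pk]
flattened.extend(codes)       # the list is spread into three parameters
flattened.append(pk)

# Keeping the array value whole yields one parameter per placeholder.
kept_whole = [pk, name, pk]
kept_whole.append(codes)      # the list stays a single parameter
kept_whole.append(pk)

placeholders = sql.count('%s')
```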

ProgrammingError: syntax error at or near "CHECK" LINE 1: ...t, someint = (CASE id WHEN 1 THEN 3 END)::integer CHECK ("so...

When casting integer types, postgres throws an error.

ProgrammingError                          Traceback (most recent call last)
<ipython-input-4-53aaaa1ea66f> in <module>()
----> 1 bulk_update(news)

/Users/aykutozat/.virtualenvs/gezi/lib/python2.7/site-packages/bulk_update/helper.py in bulk_update(objs, update_fields, exclude_fields, using, batch_size)
     63             _batched_update(objs[batch_size:], fields, batch_size, connection)
     64
---> 65     _batched_update(objs, fields, batch_size, connection)

/Users/aykutozat/.virtualenvs/gezi/lib/python2.7/site-packages/bulk_update/helper.py in _batched_update(objs, fields, batch_size, connection)
     59             del values, pks
     60
---> 61             connection.cursor().execute(sql, paramaters)
     62
     63             _batched_update(objs[batch_size:], fields, batch_size, connection)

/Users/aykutozat/.virtualenvs/gezi/lib/python2.7/site-packages/django/db/backends/util.pyc in execute(self, sql, params)
     67         start = time()
     68         try:
---> 69             return super(CursorDebugWrapper, self).execute(sql, params)
     70         finally:
     71             stop = time()

/Users/aykutozat/.virtualenvs/gezi/lib/python2.7/site-packages/django/db/backends/util.pyc in execute(self, sql, params)
     51                 return self.cursor.execute(sql)
     52             else:
---> 53                 return self.cursor.execute(sql, params)
     54
     55     def executemany(self, sql, param_list):

/Users/aykutozat/.virtualenvs/gezi/lib/python2.7/site-packages/django/db/utils.pyc in __exit__(self, exc_type, exc_value, traceback)
     97                 if dj_exc_type not in (DataError, IntegrityError):
     98                     self.wrapper.errors_occurred = True
---> 99                 six.reraise(dj_exc_type, dj_exc_value, traceback)
    100
    101     def __call__(self, func):

/Users/aykutozat/.virtualenvs/gezi/lib/python2.7/site-packages/django/db/backends/util.pyc in execute(self, sql, params)
     51                 return self.cursor.execute(sql)
     52             else:
---> 53                 return self.cursor.execute(sql, params)
     54
     55     def executemany(self, sql, param_list):

ProgrammingError: syntax error at or near "CHECK"
LINE 1: ...t, deneme = (CASE id WHEN 1 THEN 3  END)::integer CHECK ("de...

Data truncated for column "rate"

I have something like:

currencies = Currency.objects.all()

for c in currencies:
    c.rate = 1.9923462

Currency.objects.bulk_update(currencies)

This should be handled.

Specifying `batch_size` on bulk_update would only process the first batch

The remaining data is not processed when specifying batch_size when using bulk_update.helper.bulk_update function. This is because lenpks is returned after processing the first batch. Ideally, lenpks should be returned after all data are processed right?

connection.cursor().execute(sql, parameters)
return lenpks
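
The effect of the return placement can be shown with a stripped-down loop. Both functions below are hypothetical stand-ins rather than the helper's real code; counting the keys in a batch replaces executing an UPDATE.

```python
def update_first_batch_only(pks, batch_size):
    """Mirrors the reported behaviour: the return sits inside the loop."""
    lenpks = 0
    for start in range(0, len(pks), batch_size):
        batch = pks[start:start + batch_size]
        lenpks += len(batch)      # stand-in for executing one UPDATE
        return lenpks             # exits after the first batch
    return lenpks


def update_all_batches(pks, batch_size):
    """Returns only after every batch has been processed."""
    lenpks = 0
    for start in range(0, len(pks), batch_size):
        batch = pks[start:start + batch_size]
        lenpks += len(batch)      # stand-in for executing one UPDATE
    return lenpks


pks = list(range(10))
buggy_count = update_first_batch_only(pks, batch_size=4)
fixed_count = update_all_batches(pks, batch_size=4)
```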

bulk_update doesn't work if the pk of the table is a UUID

bulk_update doesn't work if the pk of the table is a UUID.
That's because the Python representation of a UUID contains dashes, while the MySQL representation doesn't.

Example:
python repr: 'be7e79a3-c43e-4b21-a0d1-ef17affad1f0'
mysql repr: 'be7e79a3c43e4b21a0d1ef17affad1f0'

As a result, no rows are updated at all.
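
The two representations from the example can be produced with the standard uuid module. This is only an illustration of the mismatch, not a patch for the library; comparing against the hex form is one possible way to match values stored in a 32-character column.

```python
import uuid

pk = uuid.UUID('be7e79a3-c43e-4b21-a0d1-ef17affad1f0')

dashed = str(pk)     # canonical form with dashes, 36 characters
compact = pk.hex     # 32 hex digits, no dashes

# A comparison using the dashed form never matches the compact stored value.
matches_dashed = dashed == 'be7e79a3c43e4b21a0d1ef17affad1f0'
matches_compact = compact == 'be7e79a3c43e4b21a0d1ef17affad1f0'
```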

Document that bulk update is available in Django 2.2

Hello @aykut! Thank you for this library and your work on it!

I just wanted to let you know that coming in Django 2.2 there will be a bulk_update() function available on any Model (just like bulk_create) that uses the same method as this library (and is heavily inspired by it!) 🎉 🎉

Would documenting this be possible? We could perhaps add a fallback to the built-in bulk_create if we are running in Django 2.2, to ease upgrading? We are also looking at adding support for specialized db-specific SQL to Django to speed this up, I'd love any input you may have!

Here are the docs for the new method

Support for auto_now?

Hey,
does django-bulk-update support DateTimeFields with auto_now?
I tested it and django-bulk-update doesn't update the "updated" Field in my model which is a models.DateTimeField(auto_now=True) field.
Do you know how to fix this issue?

bulk_update_or_create(model_instances) or bulk_update(model_instances, upsert=True)?

For my current job we need bulk upsert of records, and I'm thinking of forking your package and implementing bulk_upsert myself. If/when I do that, I'd like to do it in the manner that's most likely to be accepted into your project, so as not to maintain an independent fork.

Which syntax do you prefer?

  • bulk_update_or_create(model_instances)
  • bulk_update(model_instances, upsert=True)
  • bulk_upsert(model_instances)

For now I'd only make my changes compatible with Postgres 9.5+, because that's what we're using and because I'm relatively new at this niche.

Any other advice/comment?

Invalid utf8mb4 character string

An "Invalid utf8mb4 character string" warning pops up when storing compressed text data in a BinaryField. It does not appear when using Django's create() or bulk_create() functions.

Exact error:
mysql/base.py:101: Warning: (1300, "Invalid utf8mb4 character string: 'D44F6F'")
return self.cursor.execute(query, args)

Drop batch_size option

Drop the batch_size and add an example usage of chunking querysets.

At the moment, batch_size is a bit confusing. In almost all cases bulk_update takes an already iterated queryset or a list of objects and builds a SQL query over it. The batch_size option misleads people into thinking that bulk_update already chunks querysets (it does not). In this case batch_size is only useful for slicing the SQL queries.

So let's drop it and leave it to other functions to slice the queryset and send chunks to bulk_update.

Thanks to @joshblum for the idea.
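
Caller-side chunking of the kind proposed here might look like the following sketch. chunked is a hypothetical helper built on itertools.islice; the commented loop shows how it could feed bulk_update, assuming the helper signature from the usage section.

```python
from itertools import islice


def chunked(iterable, size):
    """Yield successive lists of at most `size` items from any iterable."""
    if size < 1:
        raise ValueError("size must be at least 1")
    iterator = iter(iterable)
    while True:
        chunk = list(islice(iterator, size))
        if not chunk:
            return
        yield chunk


# Possible use with the helper (not executed here):
#     for chunk in chunked(Person.objects.iterator(), 1000):
#         bulk_update(chunk, update_fields=['name'])

chunks = list(chunked(range(7), 3))
```

Because chunked consumes the iterable lazily, only one chunk of objects needs to be held in memory at a time.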

helper.bulk_update seems to fail when only one pk

At ./helper.pyL#63 paramaters contains a tuple of ids for the WHERE condition (WHERE pk in (val1, val2)).
When there is only one item to update, the resulting SQL ends in WHERE pk in (val1,), so I get
(1064, "You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near ')' at line 1")
I know that if I'm going to update only one object I should not use bulk_update, but shouldn't it also be capable of handling a list of only one item? Or at least handle the error :-)
Code like this at ./helper.pyL#57 fixed the issue for me:

if len(pks) > 1:
    paramaters.extend([tuple(pks)])
    sql = 'UPDATE {dbtable} SET {values} WHERE {pkcolumn} in %s'\
        .format(dbtable=dbtable, values=values, pkcolumn=pkcolumn)
else:
    paramaters.extend(pks)
    sql = 'UPDATE {dbtable} SET {values} WHERE {pkcolumn} in (%s)'\
        .format(dbtable=dbtable, values=values, pkcolumn=pkcolumn)

I don't know if you like the idea. If you do, then I could do a PR for you
Thanks for this nice tool
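
An alternative that avoids the special case is to emit one placeholder per primary key, so one key and many keys go through the same code path. in_clause below is a hypothetical helper, not the library's code. The first line only shows that Python's own rendering of a one-item tuple carries the trailing comma seen in the failing SQL; how the MySQL driver renders tuples was not verified.

```python
def in_clause(pk_column, pks):
    """Return a WHERE fragment with one %s placeholder per key, plus the params."""
    pks = list(pks)
    if not pks:
        raise ValueError("at least one primary key is required")
    placeholders = ", ".join(["%s"] * len(pks))
    return "{} IN ({})".format(pk_column, placeholders), pks


one_item_tuple = str((1,))    # a one-item tuple renders with a trailing comma

single_sql, single_params = in_clause("id", [7])
multi_sql, multi_params = in_clause("id", [7, 8, 9])
```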

BulkUpdateQuerySet object is not an iterator

  File "/Users/jonathan/projects/generic_proj/venv/lib/python3.6/site-packages/django/views/generic/base.py", line 68, in view
    return self.dispatch(request, *args, **kwargs)
  File "/Users/jonathan/projects/generic_proj/venv/lib/python3.6/site-packages/django/contrib/auth/mixins.py", line 56, in dispatch
    return super(LoginRequiredMixin, self).dispatch(request, *args, **kwargs)
  File "/Users/jonathan/projects/generic_proj/venv/lib/python3.6/site-packages/django/views/generic/base.py", line 88, in dispatch
    return handler(request, *args, **kwargs)
  File "/Users/jonathan/projects/generic_proj/apps/orders/views.py", line 1104, in get
    except StopIteration:
TypeError: 'BulkUpdateQuerySet' object is not an iterator

I have a manager that inherits from BulkUpdateManager and when it tries to iterate on the queryset created by the manager this error occurs.

To further clarify, next was called on the queryset
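
The error is the usual one for calling next() on an object that is iterable but not an iterator; Django querysets define __iter__ but not __next__. The sketch uses a made-up Rows class as a stand-in for a queryset and shows that wrapping the object in iter() first avoids the TypeError.

```python
class Rows:
    """Stand-in for a queryset-like container: iterable, but not an iterator."""

    def __init__(self, items):
        self._items = list(items)

    def __iter__(self):
        return iter(self._items)


rows = Rows(["a", "b"])

try:
    next(rows)                    # Rows has no __next__ method
    error = None
except TypeError as exc:
    error = str(exc)

first = next(iter(rows))          # obtain an iterator first, then call next()
missing = next(iter(Rows([])), None)   # a default avoids StopIteration on empty input
```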
