Comments (8)

philipn commented on August 27, 2024

I'm not entirely sure what the specific issue is here, so please let me know! I believe that the iteration over self.filters is necessary to allow the end user to filter against any of the fields on the related filterset. I believe that this initialization happens basically just once, and not every time someone runs a query against the API?

Aside: I believe the behavior is O(c^n) where c is the number of allowed underscore-filter-things (probably something like 20, always the same constant) and n is the number of relations you have, e.g.:

  Model A:
    -> Model B
       -> Model C
  = c^3

n is probably not ever going to be very big unless you're doing something really strange, so I don't think this is necessarily a complexity problem?

from django-rest-framework-filters.

JockeTF commented on August 27, 2024

I work on the same project as Andy and asked him to create an issue about the large number of iterations we were seeing in the constructor. Sorry that it took so long, but I'll try to explain my findings in more detail now.

Note that this was tested with django-rest-framework-chain 0.1.3.

I was running a profiler on our project and saw that around 5% of the cumulative time was spent in the constructor of ChainedFilterSet. After investigating further I saw that the constructor was called once for each call to our API, with up to 250 iterations of the loop per request. The number of iterations was the same regardless of whether the user requested any filtering.
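For readers who want to reproduce this kind of measurement, here is a minimal sketch using the standard library's cProfile and pstats modules. The names `ExpensiveFilterSet`, `handle_request` and `profile_requests` are hypothetical stand-ins, not part of django-rest-framework-chain; the sketch only assumes that each request builds a new object with a costly constructor.

```python
import cProfile
import io
import pstats


class ExpensiveFilterSet:
    """Hypothetical stand-in for a filterset with a costly constructor."""

    def __init__(self):
        # Simulate a per-instance expansion loop of 250 iterations.
        self.filters = {"field_%d" % i: i for i in range(250)}


def handle_request():
    """Hypothetical stand-in for one API call; builds a new filterset."""
    return ExpensiveFilterSet()


def profile_requests(count):
    """Profile `count` simulated requests and return the report text."""
    profiler = cProfile.Profile()
    profiler.enable()
    for _ in range(count):
        handle_request()
    profiler.disable()

    # Write the report into a string, sorted by cumulative time.
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    stats.sort_stats("cumulative").print_stats(10)
    return buffer.getvalue()


report = profile_requests(100)
print(report)
```

In the report, the `ncalls` column for `__init__` shows how often the constructor ran, and `cumtime` shows how much time it accounts for in total.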

Here is how we use chained filters: https://gist.github.com/JockeTF/b48b3385eaa7098eb3f5

To see the number of iterations I did the following:

class ChainedFilterSet(django_filters.FilterSet):
    def __init__(self, *args, **kwargs):
        super(ChainedFilterSet, self).__init__(*args, **kwargs)

        before = len(self.filters)
        iterations = 0

        for name, filter_ in six.iteritems(self.filters):
            iterations += 1

            if isinstance(filter_, RelatedFilter): ...

        after = len(self.filters)
        print("{}: {} --> {}".format(iterations, before, after))

I would see it print the number of iterations for each request to the API:

241: 8 --> 241
241: 8 --> 241
79:  4 --> 79
79:  4 --> 79
243: 8 --> 243
243: 8 --> 243

This is for three different views, each being called without any query parameters.

To investigate this further I made a hack for decreasing the number of iterations:

class ChainedFilterSet(django_filters.FilterSet):
    def __init__(self, *args, **kwargs):
        super(ChainedFilterSet, self).__init__(*args, **kwargs)

        query_params = args[0].keys()
        potential_names = list()
        actual_filters = dict()

        for query_param in query_params:
            pieces = []

            for piece in query_param.split(LOOKUP_SEP):
                pieces.append(piece)
                potential_names.append(LOOKUP_SEP.join(pieces))

        for name in potential_names:
            if name not in self.filters:
                continue

            filter_ = self.filters[name]
            actual_filters[name] = filter_

            if isinstance(filter_, RelatedFilter): ...

        self.filters = actual_filters

If the user does not request any filtering, then no filters will be created. Instead, filters will only be created when they are actually needed. To see the improvement I ran our project's test suite several times, both with and without the hack. Below you will find the average time it took to run the tests. This includes 428 tests, many of which test views filtered via django-rest-framework-chain.

Without the hack: 14.731s
   With the hack: 11.545s

In our case the test run takes about 22% less time (roughly a 1.28x speedup).
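The prefix expansion at the heart of the hack can be tried in isolation. The sketch below is only an illustration: `potential_filter_names` is a hypothetical helper name, and `LOOKUP_SEP` is defined locally with the same value Django uses ("__") so that the snippet runs without Django installed.

```python
LOOKUP_SEP = "__"  # same value as django.db.models.constants.LOOKUP_SEP


def potential_filter_names(query_params):
    """Return every prefix of each query parameter, split on LOOKUP_SEP.

    Each prefix is a candidate filter name that may exist on the filterset.
    """
    names = []
    for query_param in query_params:
        pieces = []
        for piece in query_param.split(LOOKUP_SEP):
            pieces.append(piece)
            names.append(LOOKUP_SEP.join(pieces))
    return names


names = potential_filter_names(["to_user__username__iexact", "name"])
print(names)
# ['to_user', 'to_user__username', 'to_user__username__iexact', 'name']
```

Only these candidates are looked up in self.filters, so the work scales with the length of the query string rather than with the total number of available filters.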

philipn commented on August 27, 2024

@JockeTF Thanks so much for taking the time to explain this!

That makes perfect sense, and I think I see the problem here. I believe the issue is that DRF actually does initialize a FilterSet object on each request, something I assumed it didn't do. Rather than try to make DRF not do this, I think the best thing is to move the filter construction logic into class creation (__new__) rather than object construction (__init__).

Could you take a look at these changes here:

https://github.com/philipn/django-rest-framework-chain/tree/attempted_fix_for_issue_8

and see if they fix your performance problem?

JockeTF commented on August 27, 2024

Nope!

That breaks filtering for us:

  File "rest_framework_chain/filterset.py", line 50, in __new__
    new_cls = super(ChainedFilterSet, cls).__new__(cls, *args, **kwargs)
TypeError: object() takes no parameters

After fixing that:

======================================================================
FAIL: test_contact_ordering_and_filtering (api.tests.test_contact.ListContactTestCase)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "tests/test_contact.py", line 47, in test_contact_ordering_and_filtering
    'to_user__username__iexact': users[0].username.swapcase(),
  File "tests/drivers.py", line 79, in _test
    self.assertEqual(len(objects), len(results))
AssertionError: 2 != 6

I don't think it would help though since __new__ is called each time an instance is created:

>>> class Camelid():
...   def __new__(cls):
...     if cls.__name__ == 'Alpaca':
...       print("Yay! Alpacas!")
...     return super(Camelid, cls).__new__(cls)
... 
>>> class Alpaca(Camelid):
...   pass
... 
>>> Alpaca()
Yay! Alpacas!
<__main__.Alpaca object at 0x7ffc32c00668>
>>> Alpaca()
Yay! Alpacas!
<__main__.Alpaca object at 0x7ffc32c006a0>

I think what you're looking for is metaclasses:

>>> import six
>>> 
>>> class CamelidMeta(type):
...   def __new__(cls, name, bases, attrs):
...     if name == 'Alpaca':
...       print("Yay! Alpacas!")
...     return super(CamelidMeta, cls).__new__(cls, name, bases, attrs)
... 
>>> class Alpaca(six.with_metaclass(CamelidMeta)):
...   pass
... 
Yay! Alpacas!
>>> Alpaca()
<__main__.Alpaca object at 0x7f10d0468f28>
>>> Alpaca()
<__main__.Alpaca object at 0x7f10ce3c78d0>
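Applied to the filterset problem, the same idea might look like the sketch below. It is an illustration under stated assumptions, not the library's code: `FilterSetMeta`, `BaseFilterSet`, `ContactFilterSet`, `declared`, `lookups` and `expanded` are hypothetical names, and it uses Python 3 metaclass syntax instead of six.with_metaclass. The expensive expansion runs once per class, and each instance only takes a shallow copy.

```python
class FilterSetMeta(type):
    """Hypothetical metaclass that expands filters once per class."""

    expansions = 0  # counts how often the expensive step runs

    def __new__(mcs, name, bases, attrs):
        new_cls = super().__new__(mcs, name, bases, attrs)
        declared = attrs.get("declared", ())
        lookups = attrs.get("lookups", ("exact",))
        # The expensive expansion happens here, at class creation time.
        new_cls.expanded = {
            "%s__%s" % (field, lookup): (field, lookup)
            for field in declared
            for lookup in lookups
        }
        FilterSetMeta.expansions += 1
        return new_cls


class BaseFilterSet(metaclass=FilterSetMeta):
    def __init__(self, data=None):
        self.data = data or {}
        # Instances copy the precomputed mapping instead of rebuilding it.
        self.filters = dict(self.expanded)


class ContactFilterSet(BaseFilterSet):
    declared = ("username", "email")
    lookups = ("exact", "iexact", "contains")


before = FilterSetMeta.expansions
instances = [ContactFilterSet({"username__iexact": "x"}) for _ in range(3)]
after = FilterSetMeta.expansions
print(before, after, len(instances[0].filters))
```

Creating three instances leaves the expansion counter unchanged, which is the property the per-request constructor lacks.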

philipn commented on August 27, 2024

I've merged some of the changes I've been making over the past little while, including a project rename ("django-rest-framework-chain" -> "django-rest-framework-filters"). I've also implemented a new DjangoFilterBackend which will cache repeated invocations of the same FilterSet, which is what I thought DRF was doing in the first place. This should, hopefully, resolve the performance issues you saw, as the filters' iteration will only happen the first time the FilterSet is called. This passes all the tests here, as well as all of the API tests in the @localwiki project.
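The caching idea described above can be sketched in plain Python. This is not the actual DjangoFilterBackend from django-rest-framework-filters; `CachingBackend`, `SlowFilterSet` and `get_filterset` are hypothetical names, and the sketch assumes that the cached object carries no per-request state and ignores thread safety.

```python
class SlowFilterSet:
    """Hypothetical filterset whose constructor is expensive."""

    constructions = 0  # counts how often the constructor runs

    def __init__(self):
        type(self).constructions += 1
        # Simulate the costly expansion of related filters.
        self.filters = {"field_%d" % i: i for i in range(250)}


class CachingBackend:
    """Hypothetical backend that builds each filterset class only once."""

    def __init__(self):
        self._cache = {}

    def get_filterset(self, filterset_class):
        # Build on first use, then hand out the same prepared object.
        if filterset_class not in self._cache:
            self._cache[filterset_class] = filterset_class()
        return self._cache[filterset_class]


backend = CachingBackend()
first = backend.get_filterset(SlowFilterSet)
second = backend.get_filterset(SlowFilterSet)
third = backend.get_filterset(SlowFilterSet)
print(SlowFilterSet.constructions)
```

Because the cache lives on the backend object, it only helps if the same backend instance survives between requests.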

Note: I tried a few different approaches to resolving this, and I couldn't get the metaclass-based approach working in a reasonable timeframe. If anyone would like to pick back up the metaclass-based fix for this issue, it can be found in https://github.com/philipn/django-rest-framework-chain/tree/metaclass_for_issue_8 (metaclass_for_issue_8 branch). The tests there pass, but the tests in our project failed (and in @JockeTF's project).

JockeTF commented on August 27, 2024

Nope!

The new changes to master actually made our tests around one second slower, requiring 15 instead of 14 seconds. I will look into solving this issue with metaclasses, but it might take a little while before I get started on this.

Thanks for the help so far!

philipn commented on August 27, 2024

@JockeTF Can you provide information on how your tests are being run? Are you using the new DjangoFilterBackend? Depending on how your tests are written, the filter backend may be re-generated between tests, or perhaps not used. On our setup (@localwiki), the new DjangoFilterBackend only calls the filterset iteration code on the first call, and subsequent requests (per thread, of course) use the same pre-generated filterset.

JockeTF commented on August 27, 2024

You are correct!

I wasn't using your new DjangoFilterBackend. This reduces our total testing time from 14 to 13 seconds. Thank you very much for that!

I managed to make some further improvements on master by adding the following to FilterSet:

    def get_requested_filters(self):
        """
        Returns the filters used in the current request.
        """
        requested_filters = OrderedDict()

        for name, filter_ in six.iteritems(self.filters):
            if name in self.data:
                requested_filters[name] = filter_

        return requested_filters

    @property
    def qs(self):
        available_filters = self.filters

        # Temporarily expose only the requested filters to the parent class.
        self.filters = self.get_requested_filters()
        try:
            qs = super(FilterSet, self).qs
        finally:
            # Restore the full set even if the parent property raises.
            self.filters = available_filters

        return qs

With these changes I reduced our total testing time from 13 to 11 seconds.

I haven't looked into the reason for this, but it seems the performance of the qs property depends on the number of items present in self.filters. Narrowing self.filters down to the requested filters before reading the property significantly reduces the time needed to do the filtering.

Perhaps this is something that should be fixed in another package.
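The selection step can be exercised without Django. In the sketch below, `select_requested` is a hypothetical free-function version of get_requested_filters, and plain strings stand in for filter objects.

```python
from collections import OrderedDict


def select_requested(filters, data):
    """Keep only the filters whose names appear in the request data."""
    return OrderedDict(
        (name, filter_) for name, filter_ in filters.items() if name in data
    )


# Plain strings stand in for real filter objects.
available = OrderedDict(
    [
        ("username", "filter-a"),
        ("username__iexact", "filter-b"),
        ("email", "filter-c"),
    ]
)
request_data = {"username__iexact": "ALICE", "page": "2"}

requested = select_requested(available, request_data)
print(list(requested))
# ['username__iexact']
```

Query parameters that do not match a filter name, such as "page" here, are simply ignored.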
