
christopherrabotin / bungiesearch


UNMAINTAINED CODE -- Elasticsearch-dsl-py django wrapper with mapping generator

License: BSD 3-Clause "New" or "Revised" License

Python 97.18% Shell 2.82%

bungiesearch's Introduction

This package is no longer maintained. You may want to check out elasticsearch-dsl-py or django-haystack instead.


Bungiesearch is a Django wrapper for elasticsearch-dsl-py. It inherits from elasticsearch-dsl-py's Search class, so all the fabulous features developed by the elasticsearch-dsl-py team are also available in Bungiesearch. In addition, just like Search, Bungiesearch is a lazy searching class (and iterable), meaning you can call functions in a row, or do something like the following.

lazy = Article.objects.search.query('match', _all='Description')
print(len(lazy))  # Prints the number of hits by only fetching the number of items.
for item in lazy[5:10]:
    print(item)
  • Core Python friendly
    • Iteration ([x for x in lazy_search])
    • Get items (lazy_search[10])
    • Number of hits via len (len(lazy_search))
  • Index management
    • Creating and deleting an index.
    • Creating, updating and deleting doctypes and their mappings.
    • Update index doctypes.
  • Django Model Mapping
    • Very easy mapping (no lies).
    • Automatic model mapping (and supports undefined models by returning a Result instance of elasticsearch-dsl-py).
    • Efficient database fetching:
      • One fetch for all items of a given model.
      • Fetches only desired fields.
  • Django Manager
    • Easy model integration: MyModel.search.query("match", _all="something to search").
    • Search aliases (search shortcuts with as many parameters as wanted): Tweet.objects.bungie_title_search("bungie") or Article.objects.bungie_title_search("bungie"), where bungie_title_search is uniquely defined.
  • Django signals
    • Connect to post save and pre delete signals for the elasticsearch index to correctly reflect the database (almost) at all times.
  • Requirements
    • Django >= 1.8
    • Python 2.7, 3.4, 3.5

See the "Full example" section at the bottom of the page for the code needed to run the following examples.

Query a word (or list thereof) on a managed model.

Article.objects.search.query('match', _all='Description')

Use a search alias on a model's manager.

Article.objects.bsearch_title_search('title')

Use a search alias on a bungiesearch instance.

Article.objects.search.bsearch_title_search('title').bsearch_titlefilter('filter this title')

Iterate over search results

# Will print the Django model instance.
for result in Article.objects.search.query('match', _all='Description'):
    print(result)

Fetch a single item

Article.objects.search.query('match', _all='Description')[0]

Get the number of returned items

print(len(Article.objects.search.query('match', _all='Description')))

Deferred model instantiation

# Will print the Django model instance's primary key. Will only fetch the `pk` field from the database.
for result in Article.objects.search.query('match', _all='Description').only('pk'):
    print(result.pk)

Elasticsearch limited field fetching

# Will print the Django model instance. However, elasticsearch's response only has the `_id` field.
for result in Article.objects.search.query('match', _all='Description').fields('_id'):
    print(result)

Get a specific number of items with an offset.

This is actually elasticsearch-dsl-py functionality, but it's demonstrated here because we can iterate over the results via Bungiesearch.

for item in Article.objects.bsearch_title_search('title').only('pk').fields('_id')[5:7]:
    print(item)

Lazy objects

lazy = Article.objects.bsearch_title_search('title')
print(len(lazy))
for item in lazy.filter('range', effective_date={'lte': '2014-09-22'}):
    print(item)
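
The laziness shown above can be pictured with a small, library-free sketch. The LazySearch class below is a hypothetical stand-in, not bungiesearch's or elasticsearch-dsl-py's actual code: slicing only records a window, len answers without fetching documents, and results are fetched when the object is iterated or indexed. Nested slices are deliberately not handled.

```python
class LazySearch(object):
    """Hypothetical stand-in for a lazy search object; not bungiesearch's code."""

    def __init__(self, backend, start=0, stop=None):
        self._backend = backend  # any sequence standing in for the search server
        self._start = start
        self._stop = stop
        self.fetches = 0  # counts how often results were actually fetched

    def __getitem__(self, key):
        if isinstance(key, slice):
            # Slicing only records the window; nothing is fetched yet.
            return LazySearch(self._backend, key.start or 0, key.stop)
        return self._execute()[key]

    def __len__(self):
        # The total number of hits is known without fetching the documents.
        return len(self._backend)

    def __iter__(self):
        return iter(self._execute())

    def _execute(self):
        # The only place where documents are fetched.
        self.fetches += 1
        return list(self._backend[self._start:self._stop])


lazy = LazySearch(['doc{}'.format(i) for i in range(20)])
window = lazy[5:10]   # no fetch so far
items = list(window)  # iterating triggers the single fetch
```

Keeping the fetch in one private method is what allows calls to be chained cheaply: every intermediate object is just a description of the search.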

Unless noted otherwise, each step is required.

The easiest way is to install the package from PyPi:

pip install bungiesearch

Note: Check your version of Django after installing bungiesearch. It was reported to me directly that installing bungiesearch may upgrade your version of Django, although I haven't been able to confirm that myself. Bungiesearch depends on Django 1.7 and above.

Updating your Django models

Note: this part is only needed if you want to be able to use search aliases, which allow you to define shortcuts to complex queries, available directly from your Django models. I think it's extremely practical.

  1. Open your models.py file.
  2. Add the bungiesearch manager import: from bungiesearch.managers import BungiesearchManager
  3. Find the model, or models, you wish to index on Elasticsearch and set them to be managed by Bungiesearch by adding the objects field to them, as such: objects = BungiesearchManager(). You should now have a Django model similar to this.

Creating bungiesearch search indexes

The search indexes define how bungiesearch should serialize each of the model's objects. It effectively defines how your object is serialized and how the ES index should be structured. These are referred to as ModelIndexes.

A good practice here is to have all the bungiesearch stuff in its own package. For example, for the section of the Sparrho platform that uses Django, we have a package called search where we define the search indexes, and a subpackage called aliases which has the many aliases we use (more on that later).

  1. Create a subclass of ModelIndex, which you can import with from bungiesearch.indices import ModelIndex, preferably in a new module.
  2. In this class, define a class called Meta: it will hold meta information of this search index for bungiesearch's internal working.
  3. Import the Django model you want to index (from your models file) and, in the Meta class, define a field called model, which must be set to the model you want indexed.
  4. By default, bungiesearch will index every field of your model. This may not always be desired, so you can define which fields must be excluded in this Meta class, via the exclude field.
  5. There are plenty of options, so definitely have a read through the documentation for ModelIndex.

Here's an example of a search index. There can be many such definitions in a file.

Django settings

This is the final required step. Here's the full documentation of this step.

  1. Open your settings file and add a BUNGIESEARCH variable, which must be a dictionary.
  2. Define URLS as a list of URLs (which can contain only one) of your ES servers.
  3. Define the INDICES key as a dictionary where the key is the name of the index on ES that you want, and the value is the full Python path to the module which has all the ModelIndex classes to be indexed on that index name.
  4. Set ALIASES to an empty dictionary (until you define any search aliases).
  5. You can keep other values as their defaults.
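
Put together, the steps above could produce a settings fragment like the following. This is a minimal sketch: the index name my_index and the module path myproject.search.indices are placeholders, and localhost assumes a local Elasticsearch server.

```python
# settings.py (fragment). The index name and module path are placeholders.
BUNGIESEARCH = {
    # One or more Elasticsearch hosts.
    'URLS': ['localhost'],
    # Index name on ES -> module holding the ModelIndex classes for that index.
    'INDICES': {'my_index': 'myproject.search.indices'},
    # No search aliases defined yet.
    'ALIASES': {},
}
```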

Create the ES indexes

From your shell, in the Django environment, run the following:

python manage.py search_index --create

Then run the following, which takes each of the objects in your model, serializes them, and adds them to the elasticsearch index.

python manage.py search_index --update

Note: With additional parameters, you can limit the number of documents to be indexed, as well as set conditions on whether they should be indexed based on updated time for example.

You can now open your elasticsearch dashboard, such as Elastic HQ, and see that your index is created with the appropriate mapping and has items that are indexed.

This example is from the test folder. It may be partially out-dated, so please refer to the test folder for the latest version.

  1. In your models.py file (or your managers.py), import bungiesearch and use it as a model manager.
  2. Define one or more ModelIndex subclasses which define the mapping between your Django model and elasticsearch.
  3. (Optional) Define SearchAlias subclasses which make it trivial to call complex elasticsearch-dsl-py functions.
  4. Add a BUNGIESEARCH variable in your Django settings, which must contain the elasticsearch URL(s), the modules for the indices, the modules for the search aliases and the signal definitions.

Here's the code which is applicable to the previous examples.

Django Model

from django.db import models
from bungiesearch.managers import BungiesearchManager

class Article(models.Model):
    title = models.TextField(db_index=True)
    authors = models.TextField(blank=True)
    description = models.TextField(blank=True)
    link = models.URLField(max_length=510, unique=True, db_index=True)
    published = models.DateTimeField(null=True)
    created = models.DateTimeField(auto_now_add=True)
    updated = models.DateTimeField(null=True)
    tweet_count = models.IntegerField()
    raw = models.BinaryField(null=True)
    source_hash = models.BigIntegerField(null=True)
    missing_data = models.CharField(blank=True, max_length=255)
    positive_feedback = models.PositiveIntegerField(null=True, blank=True, default=0)
    negative_feedback = models.PositiveIntegerField(null=True, blank=True, default=0)
    popularity_index = models.IntegerField(default=0)

    objects = BungiesearchManager()

    class Meta:
        app_label = 'core'

ModelIndex

The following ModelIndex will generate a mapping containing all fields from Article, minus those defined in ArticleIndex.Meta.exclude. When the mapping is generated, each field will be mapped to the most appropriate elasticsearch core type, with default attributes (as defined in bungiesearch.fields).

These default attributes can be overwritten with ArticleIndex.Meta.hotfixes: each dictionary key must be a field defined either in the model or in the ModelIndex subclass (ArticleIndex in this case).

from core.models import Article
from bungiesearch.fields import DateField, StringField
from bungiesearch.indices import ModelIndex


class ArticleIndex(ModelIndex):
    effective_date = DateField(eval_as='obj.created if obj.created and obj.published > obj.created else obj.published')
    meta_data = StringField(eval_as='" ".join([fld for fld in [obj.link, str(obj.tweet_count), obj.raw] if fld])')

    class Meta:
        model = Article
        exclude = ('raw', 'missing_data', 'negative_feedback', 'positive_feedback', 'popularity_index', 'source_hash')
        hotfixes = {'updated': {'null_value': '2013-07-01'},
                    'title': {'boost': 1.75},
                    'description': {'boost': 1.35},
                    'full_text': {'boost': 1.125}}

SearchAlias

Defines a search alias for one or more models (in this case only for core.models.Article).

from core.models import Article
from bungiesearch.aliases import SearchAlias


class SearchTitle(SearchAlias):
    def alias_for(self, title):
        return self.search_instance.query('match', title=title)

    class Meta:
        models = (Article,)
        alias_name = 'title_search' # This is optional. If none is provided, the name will be the class name in lower case.

class InvalidAlias(SearchAlias):
    def alias_for_does_not_exist(self, title):
        return title

    class Meta:
        models = (Article,)

Django settings

BUNGIESEARCH = {
                'URLS': [os.getenv('ELASTIC_SEARCH_URL')],
                'INDICES': {'bungiesearch_demo': 'core.search_indices'},
                'ALIASES': {'bsearch': 'myproject.search_aliases'},
                'SIGNALS': {'BUFFER_SIZE': 1}  # uses BungieSignalProcessor
                }

A ModelIndex defines mapping and object extraction for indexing of a given Django model.

Any Django model to be managed by bungiesearch must have a defined ModelIndex subclass. This subclass must contain a subclass called Meta which must have a model attribute (sets the model which it represents).

Class attributes

As detailed below, the doc type mapping will contain fields from the model it relates to. However, one may often need to index fields which correspond to either a concatenation of fields of the model or some logical operation.

Bungiesearch makes this very easy: simply define a class attribute of whichever core type you need, and set the eval_as constructor parameter to a one-line Python expression. The object is referenced as obj (not self nor object, just obj).

Example

This is a partial example as the Meta subclass is not defined, yet mandatory (cf. below).

from bungiesearch.fields import DateField, StringField
from bungiesearch.indices import ModelIndex

class ArticleIndex(ModelIndex):
    effective_date = DateField(eval_as='obj.created if obj.created and obj.published > obj.created else obj.published')
    meta_data = StringField(eval_as='" ".join([fld for fld in [obj.link, str(obj.tweet_count), obj.raw] if fld])')

Here, both effective_date and meta_data will be part of the doc type mapping, but won't be reverse mapped since those fields do not exist in the model.

This can also be used to index foreign keys:

some_field_name = StringField(eval_as='",".join([item for item in obj.some_foreign_relation.values_list("some_field", flat=True)]) if obj.some_foreign_relation else ""')
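
How eval_as strings are evaluated is not spelled out here. One plausible reading is that the expression is evaluated with the model instance bound to the name obj. The sketch below illustrates that idea only: the evaluate helper and the Article namedtuple are hypothetical stand-ins, and bungiesearch's real implementation may differ.

```python
from collections import namedtuple
from datetime import datetime

# Hypothetical stand-in for a Django model instance.
Article = namedtuple('Article', ['created', 'published'])


def evaluate(eval_as, obj):
    # Evaluate the one-line expression with the instance available as `obj`.
    return eval(eval_as, {'obj': obj})


EXPRESSION = 'obj.created if obj.created and obj.published > obj.created else obj.published'

article = Article(created=datetime(2014, 9, 1), published=datetime(2014, 9, 22))
effective_date = evaluate(EXPRESSION, article)  # published is later, so created is used
```

Because the expression is executed as code, eval_as strings should only ever come from your own source files, never from user input.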

Class methods

matches_indexing_condition

Override this function to specify whether an item should be indexed or not. This is useful when defining multiple indices (and ModelIndex classes) for a given model. The method's signature and superclass implementation are as follows; by default, all items are indexed.

def matches_indexing_condition(self, item):
    return True

For example, if a given elasticsearch index should contain only items whose title starts with "Awesome", then this method can be overridden as follows.

def matches_indexing_condition(self, item):
    return item.title.startswith("Awesome")
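
To see the effect of such an override in isolation, here is a small sketch without Django or Elasticsearch. The classes EverythingIndex and AwesomeIndex, the Item namedtuple and the items_to_index helper are all hypothetical; they only mimic the way a per-index condition decides which items end up in which index.

```python
from collections import namedtuple

# Hypothetical stand-in for a Django model instance.
Item = namedtuple('Item', ['title'])


class EverythingIndex(object):
    # Mirrors the default behaviour: every item is indexed.
    def matches_indexing_condition(self, item):
        return True


class AwesomeIndex(EverythingIndex):
    # Only items whose title starts with "Awesome" belong to this index.
    def matches_indexing_condition(self, item):
        return item.title.startswith("Awesome")


def items_to_index(index, items):
    # Keep only the items the given index accepts.
    return [item for item in items if index.matches_indexing_condition(item)]


items = [Item("Awesome search"), Item("Plain title")]
everything = items_to_index(EverythingIndex(), items)
awesome = items_to_index(AwesomeIndex(), items)
```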

Meta subclass attributes

Note: in the following, any variable defined as being a list could also be a tuple.

model

Required: defines the Django model for which this ModelIndex is applicable.

fields

Optional: list of fields (or columns) which must be fetched when serializing the object for elasticsearch, or when reverse mapping the object from elasticsearch back to a Django Model instance. By default, all fields will be fetched. Setting this will restrict which fields can be fetched and may lead to errors when serializing the object. It is recommended to use the exclude attribute instead (cf. below).

exclude

Optional: list of fields (or columns) which must not be fetched when serializing or deserializing the object.

hotfixes

Optional: a dictionary whose keys are index fields and whose values are dictionaries which define core type attributes. By default, there aren't any special settings, apart from String fields, where the analyzer is set to snowball (http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/analysis-snowball-analyzer.html), i.e. {'analyzer': 'snowball'}.

additional_fields

Optional: additional fields to fetch for mapping, may it be for eval_as fields or when returning the object from the database.

id_field

Optional: the model field to use as a unique ID for elasticsearch's metadata _id. Defaults to id (also called pk, see https://docs.djangoproject.com/en/dev/topics/db/models/#automatic-primary-key-fields).

updated_field

Optional: set the model's field which can be filtered on dates in order to find when objects have been updated. Note, this is mandatory to use --start and/or --end when updating index (with search_index --update).

optimize_queries

Optional: set to True to make efficient queries when automatically mapping to database objects. This will always restrict fetching to the fields set in fields and in additional_fields. Note: You can also perform an optimal database query with .only('__model'), which will use the same fields as optimize_queries, or .only('__fields'), which will use the fields provided in the .fields() call.

indexing_query

Optional: set to a QuerySet instance to specify the query used when the search_index command is run to index. This does not affect how each piece of content is indexed.

default

Enables support for a given model to be indexed on several elasticsearch indices. Set to False on all but the default index. Note: if all managed models are set with default=False then Bungiesearch will fail to find and index that model.

Example

Indexes all objects of Article, as long as their updated datetime is less than 21 October 2015 04:29.

from core.models import Article
from bungiesearch.indices import ModelIndex
from datetime import datetime

class ArticleIndex(ModelIndex):

    def matches_indexing_condition(self, item):
        return item.updated < datetime(2015, 10, 21, 4, 29)

    class Meta:
        model = Article
        id_field = 'id' # That's actually the default value, so it's not really needed.
        exclude = ('raw', 'missing_data', 'negative_feedback', 'positive_feedback', 'popularity_index', 'source_hash')
        hotfixes = {'updated': {'null_value': '2013-07-01'},
                    'title': {'boost': 1.75},
                    'description': {'boost': 1.35},
                    'full_text': {'boost': 1.125}}
        optimize_queries = True
        indexing_query = Article.objects.defer(*exclude).select_related().all().prefetch_related('tags')

A SearchAlias defines search shortcuts (somewhat similar to Django managers). Oftentimes, a given search will be used in multiple parts of the code. SearchAliases allow you to define those queries, filters, or any bungiesearch/elasticsearch-dsl-py calls as an alias.

A search alias is either applicable to a list (or tuple) of managed models, or to any bungiesearch instance. It's very simple, so here's an example which is detailed right below.

Example

The simplest implementation of a SearchAlias is as follows. This search alias can be called via Article.objects.bungie_title (or Article.objects.search.bungie_title), supposing that the namespace is set to None in the settings (cf. below).

Definition

from bungiesearch.aliases import SearchAlias

class Title(SearchAlias):
    def alias_for(self, title):
        return self.search_instance.query('match', title=title)

Usage

Article.objects.bungie_title('title')

Method overwrite

Any implementation needs to inherit from bungiesearch.aliases.SearchAlias and overwrite alias_for. You can set as many or as few parameters as you want for that function (since bungiesearch only returns the pointer to that function without actually calling it).

Since each managed model has its own doc type, self.search_instance is a bungiesearch instance set to search the specific doctype.

Meta subclass attributes

Although not mandatory, the Meta subclass enables custom naming and model restrictions for a search alias.

models

Optional: list (or tuple) of Django models which are allowed to use this search alias. If a model which is not allowed to use this SearchAlias tries it, a ValueError will be raised.

alias_name

Optional: A string corresponding to the suffix name of this search alias. Defaults to the lower case class name.

WARNING: As explained in the "Settings" section below, all search aliases in a given module share the prefix (or namespace). This is to prevent aliases from accidentally overwriting Django manager functions (e.g. update or get). In other words, if you set alias_name to test, then it must be called as model_obj.objects.$prefix$_test where $prefix$ is the prefix defined in the settings. This prefix is also applicable to search aliases which are available via bungiesearch instances directly. Hence, one can define in one module search utilities (e.g. regex and range) and define model specific aliases (e.g. title) in another module, and use both in conjunction as such: Article.objects.search.bungie_title('search title').utils_range(field='created', gte='2014-05-20', as_query=True). These aliases can be concatenated ad vitam aeternam.

Sophisticated example

This example shows that we can have some fun with search aliases. In this case, we define a Range alias which is applicable to any field on any model.

import logging

from bungiesearch.aliases import SearchAlias


class Range(SearchAlias):
    def alias_for(self, field, gte=None, lte=None, boost=None, as_query=False):
        body = {field: {}}
        if gte:
            body[field]['gte'] = gte
        if lte:
            body[field]['lte'] = lte
        if boost:
            if not as_query:
                logging.warning('Boost is not applicable to search alias Range when not used as a query.')
            else:
                body[field]['boost'] = boost
        if as_query:
            return self.search_instance.query({'range': body})
        return self.search_instance.filter({'range': body})

We can use it as such: Article.objects.bungie_range(field='created', gte='2014-05-20', as_query=True).
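
To make the resulting clause visible without a running Elasticsearch server, the dictionary-building part of Range.alias_for can be isolated in a plain function. The build_range_clause name is hypothetical and the warning branch is left out; this is only a sketch of the same logic.

```python
def build_range_clause(field, gte=None, lte=None, boost=None, as_query=False):
    # Same dictionary-building logic as Range.alias_for, minus the search call.
    body = {field: {}}
    if gte:
        body[field]['gte'] = gte
    if lte:
        body[field]['lte'] = lte
    if boost and as_query:
        # Boost only applies when the range is used as a query.
        body[field]['boost'] = boost
    return {'range': body}


clause = build_range_clause(field='created', gte='2014-05-20', as_query=True)
```

Here clause holds the dictionary that the alias would hand to query (or to filter when as_query is False).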

Add 'bungiesearch' to INSTALLED_APPS.

You must define BUNGIESEARCH in your Django settings in order for bungiesearch to know elasticsearch URL(s) and which index name contains mappings for each ModelIndex.

BUNGIESEARCH = {
                'URLS': ['localhost'], # No leading http:// or the elasticsearch client will complain.
                'INDICES': {'main_index': 'myproject.myapp.myindices'},  # Must be a module path.
                'ALIASES': {'bsearch': 'myproject.search_aliases'},
                'SIGNALS': {'BUFFER_SIZE': 1},
                'TIMEOUT': 5
                }

URLS

Required: must be a list of URLs which host elasticsearch instance(s). This is directly sent to elasticsearch-dsl-py, so any issue with multiple URLs should be referred to them.

INDICES

Required: must be a dictionary where each key is the name of an elasticsearch index and each value is a path to a Python module containing classes which inherit from bungiesearch.indices.ModelIndex (cf. below).

ALIASES

Optional: a dictionary whose key is the alias namespace and whose value is the Python module containing classes which inherit from bungiesearch.aliases.SearchAlias. If the namespace is None, then the namespace defaults to bungie. If the namespace is an empty string, there will be no alias namespace. An underscore is appended to the provided namespace. In the example above, each search alias defined in myproject.search_aliases will be referenced as $ModelObj$.objects.bsearch_$alias$, where $ModelObj$ is a Django model and $alias$ is the name of the search alias.

The purpose is to not accidentally overwrite Django's default manager functions with search aliases.
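
The naming rules for this setting can be summarised in a few lines of Python. The full_alias_name function is a hypothetical helper written for illustration; it is not part of bungiesearch and only restates the rules given for the namespace.

```python
def full_alias_name(namespace, alias_name):
    # A namespace of None falls back to the default prefix "bungie".
    if namespace is None:
        namespace = 'bungie'
    # An empty namespace means the alias is exposed without any prefix.
    if namespace == '':
        return alias_name
    # Otherwise the namespace and the alias name are joined by an underscore.
    return '{}_{}'.format(namespace, alias_name)


with_namespace = full_alias_name('bsearch', 'title_search')
with_default = full_alias_name(None, 'title')
without_prefix = full_alias_name('', 'title')
```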

SIGNALS

Optional: if it exists, it must be a dictionary (even empty), and will connect to the post save and pre delete model functions of all models using bungiesearch.managers.BungiesearchManager as a manager. One may also define a signal processor class for more custom functionality by placing the string value of the module path under a key called SIGNAL_CLASS in the dictionary value of SIGNALS and defining setup and teardown methods, which take model as the only parameter. These methods connect and disconnect the signal processing class to django signals (signals are connected to each model which uses a BungiesearchManager).

If SIGNALS is not defined in the settings, none of the models managed by BungiesearchManager will automatically update the index when a new item is created or deleted.

BUFFER_SIZE

Optional: an integer representing the number of items to buffer before making a bulk index update, defaults to 100.

WARNING: if your application is shut down before the buffer is emptied, then any buffered instance will not be indexed on elasticsearch. Hence, a possibly better implementation is wrapping post_save_connector and pre_delete_connector from bungiesearch.signals in a celery task. It is not implemented as such here in order to not require celery.
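
The buffering behaviour and the risk described in the warning can be sketched as follows. IndexBuffer is a hypothetical class, not the signal processor shipped with bungiesearch: it collects items and hands them to a bulk indexing callable once buffer_size items are pending, which means anything still pending at shutdown is lost unless flush is called.

```python
class IndexBuffer(object):
    """Hypothetical buffer; not bungiesearch's signal processor."""

    def __init__(self, buffer_size, bulk_index):
        self.buffer_size = buffer_size
        self.bulk_index = bulk_index  # callable receiving a list of items
        self.pending = []

    def add(self, item):
        self.pending.append(item)
        if len(self.pending) >= self.buffer_size:
            self.flush()

    def flush(self):
        # Send everything that is pending in one bulk call.
        if self.pending:
            self.bulk_index(list(self.pending))
            self.pending = []


indexed = []
buf = IndexBuffer(buffer_size=3, bulk_index=indexed.extend)
for pk in range(1, 5):
    buf.add(pk)
# Three items were indexed in bulk; the fourth is still waiting in buf.pending.
```

A BUFFER_SIZE of 1, as in the settings example, amounts to indexing every item immediately.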

TIMEOUT

Optional: Elasticsearch connection timeout in seconds. Defaults to 5.

The easiest way to run the tests is to install all dev dependencies using ./setup.sh then run ./test.sh

All Bungiesearch tests are in tests/core/test_bungiesearch.py. You can run the tests by creating a Python virtual environment, installing the requirements from requirements.txt, installing the package (pip install .) and running python tests/manage.py test. Make sure to update tests/settings.py to use your own elasticsearch URLs, or update the ELASTIC_SEARCH_URL environment variable.

bungiesearch's People

Contributors

christopherrabotin, diwu1989, folcon, joshstegmaier, meninoebom, mgeist, moemen, nullsoldier, terite, vannitotaro


bungiesearch's Issues

Bug in signal processing - self.model is None

self.model is None here. It seems like this should not be the case, and may have slipped through the cracks. I'm going to try to make amends to this - Django seems to set self.model to None by default.

/Users/betterworks/bungiesearch/tests/core/models.py(5)<module>()
      3 
      4 
----> 5 class Article(models.Model):
      6     title = models.TextField(db_index=True)
      7     authors = models.TextField(blank=True)

/Users/betterworks/bungiesearch/tests/core/models.py(22)Article()
     20     popularity_index = models.IntegerField(default=0)
     21 
---> 22     objects = BungiesearchManager()
     23 
     24     class Meta:

> /Users/betterworks/bungiesearch/bungiesearch/managers.py(35)__init__()
     33         import ipdb; ipdb.set_trace()
     34         settings = Bungiesearch.BUNGIE
---> 35         if 'SIGNALS' in settings:
     36             self.signal_processor = get_signal_processor()
     37             self.signal_processor.setup(self.model)

ipdb> self.model
ipdb> type(self.model)
<type 'NoneType'>

How to write filtered queries

I am trying to filter a query by date range and related field values. I assume that the syntax for Bungiesearch filtered queries is derived from elasticsearch-dsl-py. However, from reading the elasticsearch-dsl-py docs I am still not able to figure out how to construct filtered queries. Any advice or nudge toward the right documentation is welcome. Thanks in advance.

Merge in elasticsearch-dsl persistence model

A couple of weeks ago, elasticsearch-dsl added a persistence model, which can be used to define indices and persist data. Bungiesearch also has this functionality, and so much more, including search aliases and django integration.

The idea of this issue is to grab ES-dsl's persistence, to stay true to ES-dsl but add django integration.

Python 3 support

Elasticsearch-dsl-py supports Python 3. However, testing bungiesearch on Python 3 fails with a Runtime Error.

The code in release 1.1.0 will have some initial steps towards Python 3 support, but as the build shows, the support isn't here yet.

Update index with a start and ending date

This will enable devs to rely on the bulk indexing yet still create a celery task for updating latest documents in case there was a server instance failure before buffer was full and indexed.

Allow search aliases to be applicable to all managed models

This will allow very broad search aliases to be created, such as the following:

class Range(SearchAlias):
    def alias_for(self, field, gte=None, lte=None, boost=None, as_query=False):
        body = {field: {}}
        if gte:
            body[field]['gte'] = gte
        if lte:
            body[field]['lte'] = lte
        if boost:
            if not as_query:
                logging.warning('Boost is not applicable to search alias Range when not used as a query.')
            else:
                body[field]['boost'] = boost
        if as_query:
            return self.search_instance.query({'range': body})
        return self.search_instance.filter({'range': body})

Signal connection

Connect models to the post_save signal. If possible, define a bulk size of items to index all at once.
Warning: if bulk updating, see if we can connect to a pre-shutdown signal (if one exists) in order to index buffered items.

Signals raise exceptions on unmanaged models

  File "/home/rof/.virtualenv/local/lib/python2.7/site-packages/django/db/models/base.py", line 664, in save_base
    update_fields=update_fields, raw=raw, using=using)
  File "/home/rof/.virtualenv/local/lib/python2.7/site-packages/django/dispatch/dispatcher.py", line 170, in send
    response = receiver(signal=self, sender=sender, **named)
  File "/home/rof/.virtualenv/src/bungiesearch/bungiesearch/signals.py", line 16, in post_save_connector
    update_index(__items_to_be_indexed__[sender], sender.__name__, buffer_size)
  File "/home/rof/.virtualenv/src/bungiesearch/bungiesearch/utils.py", line 19, in update_index
    index_name = src.get_index(model_name)
  File "/home/rof/.virtualenv/src/bungiesearch/bungiesearch/__init__.py", line 101, in get_index
    raise KeyError('Could not find any index defined for {}. Is the model in one of the model index modules of BUNGIESEARCH["INDICES"]?'.format(model))
KeyError: 'Could not find any index defined for Session. Is the model in one of the model index modules of BUNGIESEARCH["INDICES"]?'

Delay database fetching on demand

One might want to do multiple searches and concatenate the elastic search results, or process them in some way (e.g. sort them or filter them by score) and only then map them to database items.

A possible solution would be extracting the mapping from execute and moving it to a class method. It would accept a list (or tuple) of raw results, and then map them.

Hence, one could do the following (a better example is multiple must/should queries on the same data set):

items = []
items += Article.objects.search.query("match", title="Some title")[:20:True]
items += Article.objects.search.query("match", description="a description")[:20:True]
items = [item for item in items if item.score > 0.75]
Bungiesearch.map_raw_results(items)

Another solution would be subclassing list in Bungiesearch to use it as such:

items = Results()
items += Article.objects.search.query("match", title="Some title")[:20:True]
items += Article.objects.search.query("match", description="a description")[:20:True]
items = [item for item in items if item.score > 0.75]
items[:10] # Executes the mapping

Aliases should be able to work with Bungiesearch instances

Currently hook_alias is a class method. However, this limits search aliases to models managed by bungiesearch. It should be relatively trivial to allow aliases to work for bungiesearch instances. That would be useful when one wants to query several doc types at once. However, that could prevent automatic object mapping.

The major advantage is being able to combine queries and aliases (supposing managers still work), as such:
Article.objects.search.query('match', field='value').regex('date', gte='2014-09-23')

Will require the following:

  • Convert hook alias to an instance method
  • Remove model verification if used out of a model (self.model = None on instantiation may help).

Support elasticsearch-dsl-py 0.0.4

Version 0.0.4 of elasticsearch-dsl-py fails. It seems like the Result object has changed and no longer contains the _meta attribute.

Temporary solution: use elasticsearch-dsl-py version 0.0.3dev.

======================================================================
ERROR: test_concat_queries (tests.core.test_bungiesearch.ModelIndexTestCase)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/[redacted]/bungiesearch/tests/core/test_bungiesearch.py", line 234, in test_concat_queries
    items = Article.objects.bsearch_title_search('title')[::False] + NoUpdatedField.objects.search.query('match', title='My title')[::False]
  File "/[redacted]/bungiesearch/bungiesearch/__init__.py", line 341, in __getitem__
    results = super(Bungiesearch, self).__getitem__(key).execute()
  File "/[redacted]/bungiesearch/bungiesearch/__init__.py", line 286, in execute
    self.map_results()
  File "/[redacted]/bungiesearch/bungiesearch/__init__.py", line 295, in map_results
    self.results = Bungiesearch.map_raw_results(self.raw_results, self)
  File "/[redacted]/bungiesearch/bungiesearch/__init__.py", line 171, in map_raw_results
    model_name = result._meta.doc_type
  File "/home/chris/.virtualenvs/bungiesearch/src/elasticsearch-dsl-master/elasticsearch_dsl/utils.py", line 109, in __getattr__
    '%r object has no attribute %r' % (self.__class__.__name__, attr_name))
AttributeError: 'Result' object has no attribute '_meta'

======================================================================
ERROR: test_fetch_item (tests.core.test_bungiesearch.ModelIndexTestCase)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/[redacted]/bungiesearch/tests/core/test_bungiesearch.py", line 76, in test_fetch_item
    self.assertEqual(Article.objects.search.query('match', _all='Description')[0], Article.objects.get(title='Title one'), 'Searching for "Description" did not return just the first Article.')
  File "/[redacted]/bungiesearch/bungiesearch/__init__.py", line 341, in __getitem__
    results = super(Bungiesearch, self).__getitem__(key).execute()
  File "/[redacted]/bungiesearch/bungiesearch/__init__.py", line 286, in execute
    self.map_results()
  File "/[redacted]/bungiesearch/bungiesearch/__init__.py", line 295, in map_results
    self.results = Bungiesearch.map_raw_results(self.raw_results, self)
  File "/[redacted]/bungiesearch/bungiesearch/__init__.py", line 171, in map_raw_results
    model_name = result._meta.doc_type
  File "/home/chris/.virtualenvs/bungiesearch/src/elasticsearch-dsl-master/elasticsearch_dsl/utils.py", line 109, in __getattr__
    '%r object has no attribute %r' % (self.__class__.__name__, attr_name))
AttributeError: 'Result' object has no attribute '_meta'

======================================================================
ERROR: test_iteration (tests.core.test_bungiesearch.ModelIndexTestCase)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/[redacted]/bungiesearch/tests/core/test_bungiesearch.py", line 92, in test_iteration
    self.assertTrue(all([result in db_items for result in lazy_search]), 'Searching for title "title" did not return all articles.')
  File "/[redacted]/bungiesearch/bungiesearch/__init__.py", line 312, in __iter__
    self.execute()
  File "/[redacted]/bungiesearch/bungiesearch/__init__.py", line 286, in execute
    self.map_results()
  File "/[redacted]/bungiesearch/bungiesearch/__init__.py", line 295, in map_results
    self.results = Bungiesearch.map_raw_results(self.raw_results, self)
  File "/[redacted]/bungiesearch/bungiesearch/__init__.py", line 171, in map_raw_results
    model_name = result._meta.doc_type
  File "/home/chris/.virtualenvs/bungiesearch/src/elasticsearch-dsl-master/elasticsearch_dsl/utils.py", line 109, in __getattr__
    '%r object has no attribute %r' % (self.__class__.__name__, attr_name))
AttributeError: 'Result' object has no attribute '_meta'

======================================================================
ERROR: test_optimal_queries (tests.core.test_bungiesearch.ModelIndexTestCase)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/[redacted]/bungiesearch/tests/core/test_bungiesearch.py", line 228, in test_optimal_queries
    src_item = NoUpdatedField.objects.search.query('match', title='My title')[0]
  File "/[redacted]/bungiesearch/bungiesearch/__init__.py", line 341, in __getitem__
    results = super(Bungiesearch, self).__getitem__(key).execute()
  File "/[redacted]/bungiesearch/bungiesearch/__init__.py", line 286, in execute
    self.map_results()
  File "/[redacted]/bungiesearch/bungiesearch/__init__.py", line 295, in map_results
    self.results = Bungiesearch.map_raw_results(self.raw_results, self)
  File "/[redacted]/bungiesearch/bungiesearch/__init__.py", line 171, in map_raw_results
    model_name = result._meta.doc_type
  File "/home/chris/.virtualenvs/bungiesearch/src/elasticsearch-dsl-master/elasticsearch_dsl/utils.py", line 109, in __getattr__
    '%r object has no attribute %r' % (self.__class__.__name__, attr_name))
AttributeError: 'Result' object has no attribute '_meta'

======================================================================
ERROR: test_post_save (tests.core.test_bungiesearch.ModelIndexTestCase)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/[redacted]/bungiesearch/tests/core/test_bungiesearch.py", line 178, in test_post_save
    self.assertNotEqual(find_three[0:1:True]._meta.index, find_three[1:2:True]._meta.index, 'Searching for "three" did not return items from different indices.')
  File "/home/chris/.virtualenvs/bungiesearch/src/elasticsearch-dsl-master/elasticsearch_dsl/utils.py", line 109, in __getattr__
    '%r object has no attribute %r' % (self.__class__.__name__, attr_name))
AttributeError: 'Result' object has no attribute '_meta'

======================================================================
ERROR: test_search_aliases (tests.core.test_bungiesearch.ModelIndexTestCase)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/[redacted]/bungiesearch/tests/core/test_bungiesearch.py", line 144, in test_search_aliases
    self.assertTrue(all([result in db_items for result in title_alias]), 'Alias searching for title "title" did not return all articles.')
  File "/[redacted]/bungiesearch/bungiesearch/__init__.py", line 312, in __iter__
    self.execute()
  File "/[redacted]/bungiesearch/bungiesearch/__init__.py", line 286, in execute
    self.map_results()
  File "/[redacted]/bungiesearch/bungiesearch/__init__.py", line 295, in map_results
    self.results = Bungiesearch.map_raw_results(self.raw_results, self)
  File "/[redacted]/bungiesearch/bungiesearch/__init__.py", line 171, in map_raw_results
    model_name = result._meta.doc_type
  File "/home/chris/.virtualenvs/bungiesearch/src/elasticsearch-dsl-master/elasticsearch_dsl/utils.py", line 109, in __getattr__
    '%r object has no attribute %r' % (self.__class__.__name__, attr_name))
AttributeError: 'Result' object has no attribute '_meta'

======================================================================
FAIL: test_raw_fetch (tests.core.test_bungiesearch.ModelIndexTestCase)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/[redacted]/bungiesearch/tests/core/test_bungiesearch.py", line 84, in test_raw_fetch
    self.assertTrue(hasattr(item, '_meta'), 'Fetching first raw results did not return an object with a _meta attribute.')
AssertionError: Fetching first raw results did not return an object with a _meta attribute.

======================================================================
FAIL: test_specify_index (tests.core.test_bungiesearch.ModelIndexTestCase)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/[redacted]/bungiesearch/tests/core/test_bungiesearch.py", line 266, in test_specify_index
    self.assertEqual(Article.objects.count(), Article.objects.search_index('bungiesearch_demo').count(), 'Indexed items on bungiesearch_demo for Article does not match number in database.')
AssertionError: Indexed items on bungiesearch_demo for Article does not match number in database.

----------------------------------------------------------------------
Ran 22 tests in 10.372s

FAILED (failures=2, errors=6)
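Most of the errors above come from reading `result._meta`. One possible way to tolerate both versions is a small accessor that tries more than one attribute name. The sketch below assumes that the newer release exposes the same metadata under `meta`; that name is an assumption and should be checked against the installed version. `Meta`, `OldStyleResult` and `NewStyleResult` are stand-ins, not library classes.

```python
class Meta(object):
    """Hypothetical metadata holder."""

    def __init__(self, doc_type, index):
        self.doc_type = doc_type
        self.index = index


class OldStyleResult(object):
    def __init__(self, meta):
        self._meta = meta


class NewStyleResult(object):
    def __init__(self, meta):
        self.meta = meta  # assumed attribute name in the newer release


def get_meta(result):
    """Return the metadata object of a result, whichever attribute name it uses."""
    for name in ('_meta', 'meta'):
        meta = getattr(result, name, None)
        if meta is not None:
            return meta
    raise AttributeError('%r has neither _meta nor meta' % type(result).__name__)


old = OldStyleResult(Meta('Article', 'bungiesearch_demo'))
new = NewStyleResult(Meta('Article', 'bungiesearch_demo'))
model_names = [get_meta(r).doc_type for r in (old, new)]
```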

Cannot request to fetch only the _id of the document

Description

When fields is called on a Bungiesearch instance with _id as its only parameter, map_raw_results fails to fetch the corresponding items from the database, because it reads result.id and that field is not part of the response. This is a problem when asking elasticsearch to return only the IDs, and more generally whenever the id field (without an underscore) is not passed to fields.

Example

In [3]: some_content = RawArticle.objects.bungie_content()
In [4]: for item in some_content[5:10:True]:              
    print item
   ...:     
<Result(sparrho/RawArticle/2477511): {}>
<Result(sparrho/RawArticle/2477523): {}>
<Result(sparrho/RawArticle/2477528): {}>
<Result(sparrho/RawArticle/2477491): {}>
<Result(sparrho/RawArticle/2477530): {}>
In [5]: some_content = RawArticle.objects.search.fields(['_id'])
In [6]: for item in some_content[5:10:True]:                    
    print item
   ...:     
<Result(sparrho/RawArticle/2477484): {}>
<Result(sparrho/RawArticle/2477504): {}>
<Result(sparrho/RawArticle/2477509): {}>
<Result(sparrho/RawArticle/2477511): {}>
<Result(sparrho/RawArticle/2477523): {}>
In [7]: for item in some_content[5:10:False]:
    print item
   ...:     
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-7-7541076dcf2b> in <module>()
----> 1 for item in some_content[5:10:False]:
      2     print item
      3 
/home/chris/.virtualenvs/sparrho-dj17/src/bungiesearch/bungiesearch/__init__.pyc in __getitem__(self, key)
    339         else:
    340             single_item = True
--> 341         results = super(Bungiesearch, self).__getitem__(key).execute()
    342         if single_item:
    343             try:
/home/chris/.virtualenvs/sparrho-dj17/src/bungiesearch/bungiesearch/__init__.pyc in execute(self, return_results)
    284             self.results = self.raw_results
    285         else:
--> 286             self.map_results()
    287 
    288         if return_results:
/home/chris/.virtualenvs/sparrho-dj17/src/bungiesearch/bungiesearch/__init__.pyc in map_results(self)
    293         Maps raw results and store them.
    294         '''
--> 295         self.results = Bungiesearch.map_raw_results(self.raw_results, self)
    296 
    297     def only(self, *fields):
/home/chris/.virtualenvs/sparrho-dj17/src/bungiesearch/bungiesearch/__init__.pyc in map_raw_results(cls, raw_results, instance)
    174                 results[pos] = result
    175             else:
--> 176                 model_results['{}.{}'.format(result._meta.index, model_name)].append(result.id)
    177                 found_results['{1._meta.index}.{0}.{1.id}'.format(model_name, result)] = (pos, result._meta)
    178 
/home/chris/.virtualenvs/sparrho-dj17/lib/python2.7/site-packages/elasticsearch_dsl/utils.pyc in __getattr__(self, attr_name)
    104         except KeyError:
    105             raise AttributeError(
--> 106                 '%r object has no attribute %r' % (self.__class__.__name__, attr_name))
    107 
    108     def __getitem__(self, key):
AttributeError: 'Result' object has no attribute 'id'
In [8]: dir(item)
Out[8]: ['_meta']
In [9]: dir(item._meta)
Out[9]: ['doc_type', u'id', u'index', u'score']
In [10]: 
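The session above shows that the identifier is still available as `item._meta.id` when the `id` field is absent. A possible fix is therefore to fall back to the metadata. In this sketch, `FakeMeta`, `FakeResult` and `result_pk` are hypothetical names; `FakeResult` only imitates the behaviour seen in the traceback (an AttributeError for fields that were not returned).

```python
class FakeMeta(object):
    def __init__(self, doc_type, id, index):
        self.doc_type = doc_type
        self.id = id
        self.index = index


class FakeResult(object):
    """Stand-in raising AttributeError for fields that were not returned."""

    def __init__(self, meta, **fields):
        self._meta = meta
        self._fields = fields

    def __getattr__(self, name):
        # Only reached for names that are not regular instance attributes.
        try:
            return self.__dict__['_fields'][name]
        except KeyError:
            raise AttributeError(
                '%r object has no attribute %r' % (type(self).__name__, name))


def result_pk(result):
    """Prefer the `id` field; fall back to the identifier stored in `_meta`."""
    try:
        return result.id
    except AttributeError:
        # Elasticsearch returns _id as a string; cast it if the model key is numeric.
        return result._meta.id


only_meta = FakeResult(FakeMeta('RawArticle', '2477511', 'sparrho'))
with_field = FakeResult(FakeMeta('RawArticle', '2477523', 'sparrho'), id=2477523)
pks = [result_pk(only_meta), result_pk(with_field)]
```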

Time specific indexing

Date-based indexing lets one index items whose updated date is later or earlier than a provided date, but the bound is a date only: a time of day cannot be given. This issue is raised to support time as well.
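One way to accept a time as well is to try time-aware formats before the date-only one. The sketch below is independent of Bungiesearch; `parse_bound` and the accepted formats are assumptions, not the project's actual option parsing.

```python
from datetime import datetime

# Formats are tried in order; the time-aware ones are the proposed addition.
FORMATS = ('%Y-%m-%dT%H:%M:%S', '%Y-%m-%dT%H:%M', '%Y-%m-%d')


def parse_bound(value):
    """Parse a date or datetime string into a datetime (midnight for date-only input)."""
    for fmt in FORMATS:
        try:
            return datetime.strptime(value, fmt)
        except ValueError:
            continue
    raise ValueError('Unrecognised date or datetime: %r' % value)


date_only = parse_bound('2014-09-23')
with_time = parse_bound('2014-09-23T14:30')
```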

Indexing a None value may index it as a string

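At least in some instances, it seems that a None value is indexed as an actual value (possibly the string "None", as the title suggests) rather than being left out as a missing field: the missing filter does not match such documents.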

In [69]: Article.objects.bungie_allcontent().index('sparrho').filter('missing', field='created').sort('created').count()
Out[69]: 0
In [70]: article = Article.objects.bungie_allcontent().index('sparrho').sort('created')[:1:True]
In [71]: article.created is None
Out[71]: True
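The cause is not established in this report. If the problem does come from how None is serialized, one defensive option is to drop None-valued fields before sending the document, so that the field is simply absent. `serialize_document` is a hypothetical helper and is not part of Bungiesearch.

```python
import json


def serialize_document(fields):
    """Drop None-valued fields so they are absent from the indexed document."""
    return {name: value for name, value in fields.items() if value is not None}


doc = serialize_document({'title': 'Title one', 'created': None})
payload = json.dumps(doc, sort_keys=True)
```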

More explicit error messages

The following doesn't say which model and index failed. Check all exception messages and make them as explicit as possible.

ValueError: Cannot filter by date if the updated_field is not set in the index's Meta class.
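As an illustration, the message could name both the model and the index. `check_updated_field` and its parameters are hypothetical; only the wording of the original message is taken from the error above.

```python
def check_updated_field(model_name, index_name, updated_field):
    """Raise a ValueError naming the model and index when updated_field is unset."""
    if not updated_field:
        raise ValueError(
            'Cannot filter by date on model {!r} in index {!r}: the updated_field '
            "is not set in the index's Meta class.".format(model_name, index_name))
    return updated_field


try:
    check_updated_field('Article', 'bungiesearch_demo', None)
    message = None
except ValueError as exc:
    message = str(exc)
```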

How do you create indices for existing models

I am a relatively new Django developer, so forgive the question if it should be easily answered from docs or source.

Is there a management command to create ES indexes for models that already exist?

Query aliases via manager

Would it not be awesome if something like Article.objects.search.text_search(keywords) automatically translated to a user-provided query, e.g. Article.objects.search.filter("term", field="my_term").query("match", another_field=keywords)?

These definitions may be able to fit in the ModelIndex definition and rely solely on reflection within the manager (which is always fun).
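A sketch of the reflection idea, with stand-ins throughout: `RecordingSearch` imitates a chainable search object, `ArticleIndex` plays the role of an index definition holding the alias, and `SearchProxy` is a hypothetical object that a manager's `search` attribute might return.

```python
class RecordingSearch(object):
    """Stand-in for a search object; records chained filter/query calls."""

    def __init__(self, calls=()):
        self.calls = tuple(calls)

    def filter(self, kind, **params):
        return RecordingSearch(self.calls + (('filter', kind, params),))

    def query(self, kind, **params):
        return RecordingSearch(self.calls + (('query', kind, params),))


class ArticleIndex(object):
    """Hypothetical index definition holding a user-provided alias."""

    @staticmethod
    def text_search(search, keywords):
        return search.filter('term', field='my_term').query('match', another_field=keywords)


class SearchProxy(object):
    """Hypothetical object returned by a manager's `search` attribute."""

    def __init__(self, index_cls):
        self._index_cls = index_cls
        self._search = RecordingSearch()

    def __getattr__(self, name):
        if name.startswith('_'):
            raise AttributeError(name)  # avoid recursion on private attributes
        # Reflection: look the alias up on the index definition first.
        alias = getattr(self._index_cls, name, None)
        if callable(alias):
            return lambda *args, **kwargs: alias(self._search, *args, **kwargs)
        # Otherwise defer to the underlying search object.
        return getattr(self._search, name)


result = SearchProxy(ArticleIndex).text_search('bungie')
```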

Search alias name spaces

Instead of having a single set of search aliases, define aliases as a dictionary so that they can be separated into name spaces.
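For instance, a dictionary keyed by prefix could be flattened into prefixed alias names, so that two name spaces can define an alias with the same short name. The layout of the dictionary and the `build_alias_table` helper are assumptions made for this sketch.

```python
def build_alias_table(namespaces):
    """Flatten {prefix: {alias name: callable}} into {'prefix_alias': callable}."""
    table = {}
    for prefix, aliases in namespaces.items():
        for name, func in aliases.items():
            table['{}_{}'.format(prefix, name)] = func
    return table


# Two hypothetical name spaces that both define `title_search`.
namespaces = {
    'bungie': {'title_search': lambda title: ('match', {'title': title})},
    'admin': {'title_search': lambda title: ('term', {'title': title})},
}
alias_table = build_alias_table(namespaces)
alias_names = sorted(alias_table)
```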

Automatic mapping to return elasticsearch meta information as well

Would it be valuable to return elasticsearch's meta information in the automatically mapped result? This could be achieved using a wrapping class which would extend whichever model is to be mapped and add extra attributes. In that case, we must also take into consideration that devs may wish to get an item from a search with auto-mapping, change it, and save it. Attributes alien to the model may prevent this saving, so it may be necessary to override the save function and remove any additional attributes.

This would require some nice polymorphic code, and especially extensive testing.
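A rough sketch of the wrapping idea on a plain Python class. `FakeModel`, `with_search_meta` and the `search_meta` attribute are hypothetical. With a real Django model, creating a subclass goes through the model metaclass, so this approach would need more care there.

```python
class FakeModel(object):
    """Stand-in for a model; `save` records the attributes it would persist."""

    def __init__(self, title):
        self.title = title
        self.saved = None

    def save(self):
        self.saved = {k: v for k, v in vars(self).items() if k != 'saved'}


def with_search_meta(instance, meta):
    """Re-type `instance` as a subclass of its own class that carries search metadata."""
    base = type(instance)

    class Wrapped(base):
        def save(self, *args, **kwargs):
            # Remove the alien attribute before delegating to the model's save.
            self.__dict__.pop('search_meta', None)
            return super(Wrapped, self).save(*args, **kwargs)

    Wrapped.__name__ = base.__name__
    instance.__class__ = Wrapped
    instance.search_meta = meta
    return instance


article = with_search_meta(FakeModel('Title one'), {'score': 1.5})
score_before_save = article.search_meta['score']
article.save()
```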

Any thoughts @Folcon, @gtebbutt ?
