dcramer / django-sphinx

A transparent layer for full-text search using Sphinx and Django

Home Page: http://groups.google.com/group/django-sphinx

License: BSD 3-Clause "New" or "Revised" License

Python 95.42% Makefile 2.30% Shell 2.28%

django-sphinx's Introduction

This project is no longer maintained

This is a layer that functions much like the Django ORM does except it works on top of the Sphinx (http://www.sphinxsearch.com) full-text search engine.

Please Note: You will need to create your own sphinx indexes and install sphinx on your server to use this app.

There will no longer be release packages available. Please use SVN to checkout the latest trunk version, as it should always be stable and current.

Installation

To install the latest stable version:

sudo easy_install django-sphinx

To install the latest development version (updated quite often):

git clone git://github.com/dcramer/django-sphinx.git  
cd django-sphinx
sudo python setup.py install

Note: You will need to install the sphinxapi.py package into your Python Path or use one of the included versions. To use the included version, you must specify the following in your settings.py file:

# Sphinx 0.9.9
SPHINX_API_VERSION = 0x116

# Sphinx 0.9.8
SPHINX_API_VERSION = 0x113

# Sphinx 0.9.7
SPHINX_API_VERSION = 0x107
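
As a rough illustration of how these constants line up with Sphinx releases, the values above can be kept in one place and looked up by release string. The dictionary and the helper name api_version_for are hypothetical conveniences, not part of django-sphinx; only the three hex values come from the list above.

```python
# Hypothetical helper: only the three hex values are taken from the README.
SPHINX_API_VERSIONS = {
    "0.9.9": 0x116,
    "0.9.8": 0x113,
    "0.9.7": 0x107,
}


def api_version_for(release):
    """Return the SPHINX_API_VERSION constant for a Sphinx release string."""
    try:
        return SPHINX_API_VERSIONS[release]
    except KeyError:
        raise ValueError("No bundled sphinxapi listed for Sphinx %s" % release)


# In settings.py one might then write:
SPHINX_API_VERSION = api_version_for("0.9.9")
```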

Usage

The following is some example usage:

from django.db import models
from djangosphinx.models import SphinxSearch

class MyModel(models.Model):
    search = SphinxSearch() # optional: defaults to db_table
    # If your index name does not match MyModel._meta.db_table, pass it explicitly.
    # Note: You can only generate automatic configurations from the ./manage.py script
    # if your index name matches.
    search = SphinxSearch('index_name')

    # Or maybe we want to be more.. specific
    searchdelta = SphinxSearch(
        index='index_name delta_name',
        weights={
            'name': 100,
            'description': 10,
            'tags': 80,
        },
        mode='SPH_MATCH_ALL',
        rankmode='SPH_RANK_NONE',
    )

queryset = MyModel.search.query('query')
results1 = queryset.order_by('@weight', '@id', 'my_attribute')
results2 = queryset.filter(my_attribute=5)
results3 = queryset.filter(my_other_attribute=[5, 3, 4])
results4 = queryset.exclude(my_attribute=5)[0:10]
results5 = queryset.count()

# as of 2.0 you can now access an attribute to get the weight and similar arguments
for result in results1:
    print result, result._sphinx
# you can also access a similar set of meta data on the queryset itself (once it's been sliced or executed in any way)
print results1._sphinx

Some additional methods:

  • count()
  • extra() (passed to the queryset)
  • all() (does nothing)
  • select_related() (passed to the queryset)
  • group_by(field, field, field)
  • set_options(index='', weights={}, weights=[], mode='SPH_MODE', rankmode='SPH_MATCH_')

The django-sphinx layer also supports some basic querying over multiple indexes. To use this you first need to understand the rules of a UNION. Your indexes must contain exactly the same fields. These fields must also include a content_type selection which should be the content_type id associated with that table (model).

You can then do something like this:

from djangosphinx.models import SphinxSearch

SphinxSearch('index1 index2 index3').query('hello')

This will return a list of all matches, ordered by weight, from all indexes. This performs one SQL query per index with matches in it, as Django's ORM does not support SQL UNION.
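
To make the "one SQL query per index" point concrete, here is a minimal sketch of how matches from several indexes could be grouped by their content_type attribute before hitting the database. The function name and the shape of the match dictionaries are assumptions made for illustration; this is not django-sphinx's actual implementation.

```python
from collections import defaultdict


def group_ids_by_content_type(matches):
    """Group match ids by content_type id, keeping the order within each group.

    `matches` is assumed to be a list of dicts shaped like
    {'id': 7, 'attrs': {'content_type': 12}}.
    """
    grouped = defaultdict(list)
    for match in matches:
        grouped[match['attrs']['content_type']].append(match['id'])
    return dict(grouped)


# Made-up matches: two models (content types 12 and 15) across the indexes.
matches = [
    {'id': 7, 'attrs': {'content_type': 12}},
    {'id': 3, 'attrs': {'content_type': 15}},
    {'id': 9, 'attrs': {'content_type': 12}},
]
ids_per_model = group_ids_by_content_type(matches)
# Each key would then need its own ORM query, e.g. Model.objects.filter(pk__in=ids).
```

With two content types among the matches, two database queries would be needed, regardless of how many rows each one returns.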

Config Generation

django-sphinx now includes a tool to create a sample configuration for your models. It generates both a source and an index configuration for a model class. You will still need to tweak the output manually and insert it into your configuration, but it should help with initial setup.

To use it:

from djangosphinx.utils import *

from myproject.myapp.models import MyModel

output = generate_config_for_model(MyModel)

print output

If you have multiple models that you want to search together using the UNION approach:

model_classes = (ModelOne, ModelTwoWhichResemblesModelOne)

output = generate_config_for_models(model_classes)

You can also now output configuration from the command line:

./manage.py generate_sphinx_config <appname>

This will loop through all models in <appname> and attempt to find any with a SphinxSearch instance that is using the default index name (db_table).

Using the Config Generator

New in 2.2

django-sphinx now includes a simple Python script to generate a config using your default template renderer. By "default" we mean that if coffin is included in your INSTALLED_APPS, it is used; otherwise Django's renderer is used.

Two variables directly relate to the config generation:

# The base path for sphinx files. Sub directories will include data, log, and run.
SPHINX_ROOT = '/var/sphinx-search/'

# Optional, defaults to 'conf/sphinx.html'. This should be the configuration template.
# See the included templates/sphinx.conf for an example.
SPHINX_CONFIG_TEMPLATE = 'conf/sphinx.html'

Once done, your config can be passed via any sphinx command like so:

# Index your stuff
DJANGO_SETTINGS_MODULE=myproject.settings indexer --config /path/to/djangosphinx/config.py --all --rotate

# Start the daemon
DJANGO_SETTINGS_MODULE=myproject.settings searchd --config /path/to/djangosphinx/config.py

# Query the daemon
DJANGO_SETTINGS_MODULE=myproject.settings search --config /path/to/djangosphinx/config.py my query

# Kill the daemon
kill -9 $(cat /var/sphinx-search/run/searchd.pid)

For now, we recommend you set up some basic bash aliases or scripts to deal with this. This is just the first step in embedded config generation, so stay tuned!

  • Note: Make sure your PYTHONPATH is set up properly!
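
As an alternative to bash aliases, the same invocations can be assembled from a small Python wrapper. This is only a sketch: the function name sphinx_command is hypothetical, and the paths and settings module are the placeholder values used in the commands above.

```python
import os


def sphinx_command(tool, settings_module, config_path, *extra_args):
    """Return (argv, env) for running a Sphinx tool against config.py."""
    # Copy the current environment and add the Django settings module to it.
    env = dict(os.environ, DJANGO_SETTINGS_MODULE=settings_module)
    argv = [tool, '--config', config_path] + list(extra_args)
    return argv, env


argv, env = sphinx_command(
    'indexer', 'myproject.settings',
    '/path/to/djangosphinx/config.py', '--all', '--rotate',
)
# To actually run it: subprocess.call(argv, env=env)
```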

Using Sphinx in Admin

django-sphinx includes its own ModelAdmin class to allow you to use it with Django's built-in admin app.

To use it, see the following example:

from djangosphinx.admin import SphinxModelAdmin

class MyAdmin(SphinxModelAdmin):
    index = 'my_index_name' # defaults to Model._meta.db_table
    weights = {'field': 100}

Limitations? You know it.

  • Only shows your max sphinx results (defaults to 1000)
  • Filters currently don't work.
  • This is a huge hack, so it may or may not continue working when Django updates.

Frequent Questions

How do I run multiple copies of Sphinx using django-sphinx?

The easiest way is to use a different SPHINX_PORT setting in your settings.py. If you are using the config generation described above, just modify the port and start up the daemon.

django-sphinx's People

Contributors

dcramer, mwrshl, wojciechpolak

django-sphinx's Issues

django-sphinx installs django?

I am using the trunk version of django-sphinx on the development version of django, and looking through the install log for django-sphinx, there is some strange stuff.

I installed with:
sudo python setup.py install

Which I suppose handles dependencies. In my case, it installed django, even though I already had it installed:
Processing dependencies for django-sphinx==2.2.3
Searching for django
Reading http://pypi.python.org/simple/django/
Reading http://www.djangoproject.com/
Reading http://www.djangoproject.com/download/1.0.1-beta-1/tarball/
Best match: Django 1.1.1
Downloading http://media.djangoproject.com/releases/1.1.1/Django-1.1.1.tar.gz
Processing Django-1.1.1.tar.gz

This is quite frustrating, since I really don't care to learn how the install works, but now I have to look through it and see how I can uninstall Django.

No idea if this is the fault of django-sphinx, but it's very frustrating.

charset_type = utf-8

By default Django uses UTF-8, so maybe django-sphinx should use it too?
charset_type = utf-8
I run into problems without it when trying to search in Russian.

Assert Errors on Compile

When I compile the latest from easy_install, I always get this:

c:\documents and settings\tlwhit2\my documents\aptana studio workspace\podiobooks\eggs\django_sphinx-2.1.2-py2.6.egg\djangosphinx\models.py:313: SyntaxWarning: assertion is always true, perhaps remove parentheses?
assert(sphinxapi.VER_COMMAND_SEARCH >= 0x113, "You must upgrade sphinxapi to version 0.98 to use Geo Anchoring.")
c:\documents and settings\tlwhit2\my documents\aptana studio workspace\podiobooks\eggs\django_sphinx-2.1.2-py2.6.egg\djangosphinx\models.py:656: SyntaxWarning: assertion is always true, perhaps remove parentheses?
assert(sphinxapi.VER_COMMAND_SEARCH >= 0x113, "You must upgrade sphinxapi to version 0.98 to use UpdateAttributes.")

Not sure if there is a better way to do that to avoid the syntax warning?
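
The warning arises because assert(condition, message) asserts a two-element tuple, and a non-empty tuple is always truthy, so the check can never fail. A minimal sketch of the difference follows; check_api_version is a hypothetical helper, and the version numbers are the ones quoted in the warning above.

```python
MINIMUM_VERSION = 0x113

# What the parenthesized form really tests: a 2-tuple, which is always truthy
# even though the comparison inside it is False.
tuple_is_truthy = bool((0x107 >= MINIMUM_VERSION, "You must upgrade sphinxapi."))


def check_api_version(version):
    """Raise AssertionError when the API version is too old."""
    # No parentheses around the pair: condition first, message after the comma.
    assert version >= MINIMUM_VERSION, "You must upgrade sphinxapi to version 0.98."
    return True
```

Dropping the parentheses removes the SyntaxWarning and makes the assertion effective.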

Thanks for django-sphinx!

Incorrect handling of the unicode queries

When searching like this:
Body.search.query(u'привет')

There're always zero results, while command-line search returns hundreds. This is due to double (or even triple) encoding in utf-8 done somewhere in the guts of django-sphinx/sphinxapi.

There're instances of pointless code like unicode(string).encode('utf-8'). The problem is that if string is already a unicode object, this code will create a unicode object containing its utf-8 representation and encode it using utf-8 again thus creating garbage. I've fixed this place in code but the string is still double-encoded somewhere. :(

This code is pointless anyway because even if it would work - it would be a noop - take a bytestring, convert to unicode, convert to bytestring again. But instead of a useless noop it makes garbage of unicode input.
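
The effect of double encoding can be reproduced in isolation. The sketch below is written for Python 3 (where text is str and encoded data is bytes) rather than the Python 2 code the issue refers to, and to_utf8_bytes is a hypothetical helper, not a function from django-sphinx or sphinxapi.

```python
def to_utf8_bytes(value):
    """Encode text exactly once; pass already-encoded bytes through untouched."""
    if isinstance(value, bytes):
        return value
    return value.encode("utf-8")


query = "привет"
encoded_once = to_utf8_bytes(query)

# Simulate the bug: the UTF-8 bytes are mistaken for Latin-1 text and encoded again.
encoded_twice = encoded_once.decode("latin-1").encode("utf-8")

# The double-encoded form is twice as long and no longer matches the indexed text.
lengths = (len(encoded_once), len(encoded_twice))
```

Guarding the encode step with a type check makes it safe to call more than once on the same value.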

Typo in new mode kwargs include

Line 270 in models.py:

kwargs['mode'] = getattr(sphinxapi, kwargs.get('mode', 'SPH_MATCH_ALL'))

instead of

kwargs['mode'] = getattr(sphinxapi, kwargs.get('mode'), 'SPH_MATCH_ALL')

I was getting:

getattr(): attribute name must be string

because there was no default value being supplied...
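
The difference between the two lines comes down to where the default is applied: to the dictionary lookup or to getattr. A small sketch with a stand-in namespace in place of the real sphinxapi module (the constant values are placeholders) shows both behaviours:

```python
from types import SimpleNamespace

# Stand-in for the sphinxapi module; the numeric values are placeholders.
fake_sphinxapi = SimpleNamespace(SPH_MATCH_ALL=0, SPH_MATCH_ANY=1)


def resolve_mode_without_default(kwargs):
    # kwargs.get('mode') is None when the key is missing, so getattr()
    # receives a non-string attribute name and raises TypeError.
    return getattr(fake_sphinxapi, kwargs.get('mode'), 'SPH_MATCH_ALL')


def resolve_mode_with_default(kwargs):
    # The default belongs to the dict lookup, so getattr() always gets a string.
    return getattr(fake_sphinxapi, kwargs.get('mode', 'SPH_MATCH_ALL'))
```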

Cheers and thanks for all your work on this!

Tim

Svn repo on google code should be killed with a redirect

Almost all django-sphinx installation notes say to use svn and the google code repo to download django-sphinx, leading many people to download it that way, only later to realize git and github are the way to go.

This can be easily avoided, or at least mitigated, by creating a README.txt, putting it in the google code repo as the latest revision, and deleting everything else. The README can contain a simple snippet that says:

"This project has moved. Please seek out the new code at http://github.com/dcramer/django-sphinx"

From there, just:
svn rm *
svn add README.txt
svn ci -m "Nuking repo and redirecting people to github."

Something like that anyway. It would at least be better than having people install an old version of django-sphinx.

Django-sphinx returns no more than 20 results

The QuerySet returned by sphinx may report more than 20 items, but iterating over it returns only the first 20.

qset = Myclass.sphinx_search.query(q)
print qset.count() # prints more than 20

The enumeration loop stops after 20 iterations:

for e in qset:
    pass # do something

This question was also asked on stackoverflow:
http://stackoverflow.com/questions/2671459/why-does-django-sphinx-only-output-20-results-how-can-i-get-the-rest

A proposed solution is to evaluate the query set with an explicit slice:
for e in qset[0:qset.count()]:
    pass # do something

This works fine for small sets but could have a large memory overhead because all items might need to be loaded in memory. In contrast, iterating over a QuerySet will load objects only as you need them.

It would be better to make the first approach work.
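
Until plain iteration works, one workaround that keeps memory bounded is to walk the results in fixed-size slices. The generator below is a sketch, not part of django-sphinx; it assumes only that the search queryset supports slicing the way the README examples do (for instance [0:10]), and it is exercised here with a plain list.

```python
def iter_in_batches(sliceable, batch_size=20):
    """Yield items from any sliceable sequence, one slice at a time."""
    start = 0
    while True:
        batch = list(sliceable[start:start + batch_size])
        for item in batch:
            yield item
        # A short (or empty) batch means the end of the results was reached.
        if len(batch) < batch_size:
            return
        start += batch_size


# Stand-in data; with django-sphinx this would be the search queryset.
items = list(range(45))
collected = list(iter_in_batches(items, batch_size=20))
```

Only one slice is held in memory at a time, at the cost of one search request per slice.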

Passages are displayed improperly

Currently the words variable for BuildExcerpts is built from the words returned by the Query command. These words are already stemmed and come with statistical information such as docs and hits. As a result, highlighting fails for some queries.

For example, try the keyword recovery: it is not highlighted, since Query returns the stemmed version (recoveri) (I think this is the result of the soundex stemmer).

Anyway, when you use the query string as the words variable for BuildExcerpts, highlighting works fine. BuildExcerpts does the stemming itself, and we don't use the statistics for highlighting, so I don't see a problem with using the initial query string for words.

I think the fix is to change line 569 of djangosphinx/models.py:

words = ' '.join([w['word'] for w in results['words']])

Replace this with:

words = self._query
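
The difference between the two assignments can be seen with a made-up result dictionary. The docs and hits figures below are invented, and the substring check is only a naive stand-in for what BuildExcerpts does on the server.

```python
# Hypothetical Query result: the word comes back stemmed, with invented statistics.
results = {'words': [{'word': 'recoveri', 'docs': 12, 'hits': 30}]}
query = 'recovery'
text = 'disaster recovery plan'

# Current behaviour: words rebuilt from the stemmed results.
words_from_results = ' '.join([w['word'] for w in results['words']])

# Proposed behaviour: reuse the original query string.
words_from_query = query

stemmed_found = words_from_results in text
original_found = words_from_query in text
```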

Thoughts?

Accept strings in group_by

I think it should be possible to set the group method in group_by by passing a string.

Right now, I'm using:
import djangosphinx.apis.current as sphinxapi
query(q).group_by('ch_id', sphinxapi.SPH_GROUPBY_ATTR, '@weight desc')

Goal:
query(q).group_by('ch_id', 'SPH_GROUPBY_ATTR', '@weight desc')
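
One way this could be supported is to resolve string arguments against the API module before using them. The sketch below uses a stand-in namespace instead of the real sphinxapi module, with placeholder constant values, and resolve_constant is a hypothetical helper name.

```python
from types import SimpleNamespace

# Stand-in for the sphinxapi module; the numeric values are placeholders.
fake_sphinxapi = SimpleNamespace(SPH_GROUPBY_DAY=0, SPH_GROUPBY_ATTR=4)


def resolve_constant(api, value):
    """Accept either a constant's name or its value and return the value."""
    if isinstance(value, str):
        # Unknown names raise AttributeError instead of being silently ignored.
        return getattr(api, value)
    return value


by_name = resolve_constant(fake_sphinxapi, 'SPH_GROUPBY_ATTR')
by_value = resolve_constant(fake_sphinxapi, fake_sphinxapi.SPH_GROUPBY_ATTR)
```

Both calling styles then end up with the same value, so existing code that passes the constant directly keeps working.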

Sorting fails on date fields older than 1970.

Using the order_by() construct, I get properly ordered results until 1970. I assume this has to do with the epoch time, and the way it is used by django-sphinx, but I haven't had a chance to find or fix this problem yet.

This is a major bug though, because of the way it affects results that may be sorted by date, which is a common way to use sphinx.
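
One possible explanation, offered as an assumption rather than a confirmed diagnosis: if the date is stored in an unsigned 32-bit timestamp attribute, any date before 1970 has a negative epoch value that wraps around to a very large number, which would break the ordering. The sketch below only illustrates that arithmetic.

```python
import calendar
import datetime


def to_epoch(day):
    """Seconds since 1970-01-01 UTC for a date (negative before 1970)."""
    return calendar.timegm(day.timetuple())


def as_uint32(value):
    """Reinterpret an integer as an unsigned 32-bit value."""
    return value & 0xFFFFFFFF


before_epoch = to_epoch(datetime.date(1969, 12, 31))
after_epoch = to_epoch(datetime.date(1970, 1, 2))

# Once wrapped, the 1969 date compares as later than the 1970 date.
wrapped_order_is_wrong = as_uint32(before_epoch) > as_uint32(after_epoch)
```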

Comparison operations with SphinxProxy objects violate symmetry

The current implementation of __eq__ and __cmp__ in SphinxProxy violates symmetry. For instance, suppose you have

a = Object()
b = SphinxProxy( a )

Then b==a returns True, but a==b returns False, and even b==b returns False.

This makes it awkward to use a SphinxProxy object directly in further queries. For example,

get_object_or_404( id = b )

will fail because it uses the __eq__ operator with b as the second argument.

Any way to improve this behavior?
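
A sketch of a proxy whose equality is symmetric is shown below, written for Python 3. SymmetricProxy and Thing are hypothetical classes, not the actual SphinxProxy. The a == b case relies on the wrapped class returning NotImplemented for types it does not know, which lets Python fall back to the proxy's __eq__; a class whose __eq__ returns False outright would not benefit.

```python
class SymmetricProxy(object):
    """Minimal wrapper whose equality unwraps both sides before comparing."""

    def __init__(self, wrapped):
        self._wrapped = wrapped

    def __eq__(self, other):
        if isinstance(other, SymmetricProxy):
            other = other._wrapped
        return self._wrapped == other

    def __ne__(self, other):
        return not self.__eq__(other)

    def __hash__(self):
        # Keep hash consistent with equality by delegating to the wrapped object.
        return hash(self._wrapped)


class Thing(object):
    """Plain object that uses the default identity-based equality."""


a = Thing()
b = SymmetricProxy(a)
comparisons = (b == a, a == b, b == b)
```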

`djangosphinx.manager` is deprecated

./manage.py generate_sphinx_config cities >> sphinx.conf
/home/limpbrains/dev/PythonPath/djangosphinx/manager.py:4: DeprecationWarning: djangosphinx.manager is deprecated. Please use djangosphinx.models instead.
warnings.warn(djangosphinx.manager is deprecated. Please use djangosphinx.models instead., DeprecationWarning)

In djangosphinx/management/commands/generate_sphinx_config.py, change djangosphinx.manager to djangosphinx.models.

count of records in group

I am using Sphinx's group_by functionality and want access to the @count virtual attribute in Django templates, but I can't, because Django doesn't allow an "@" symbol at the beginning of a variable name. That is, I can't write this in a template:

{% for item in results %}
Count of items in group: {{ item.sphinx.@count }}
{% endfor %}

I suggest adding a small patch to the SphinxProxy class in models.py:

def count(self):
    return self._sphinx['attrs'].get('@count')

I'm not sure this is an appropriate patch. What do you think?

SphinxProxy.__getattr__ breaks the getattr contract [patch]

hasattr(aSphinxProxy, 'anything') always returns true. In some situations that upsets Django's ORM.

The Python built-in hasattr relies on getattr raising AttributeError to know whether an attribute exists [1].

In models.py, SphinxProxy implements __getattr__, and always passes a default value (usually of None) to the built-in getattr, which means it will never raise an exception, so hasattr will always be true.

The fix I'm using is to check value, and only pass it to the built-in getattr if it is not None:

diff --git a/djangosphinx/models.py b/djangosphinx/models.py
index ba52036..e19efeb 100644
--- a/djangosphinx/models.py
+++ b/djangosphinx/models.py
@@ -108,7 +108,11 @@ class SphinxProxy(object):
             name = '_sphinx'
         if name == '_sphinx':
             return getattr(self, '_sphinx', value)
-        return getattr(self._current_object, name, value)
+        if value:
+            return getattr(self._current_object, name, value)
+        else:
+            return getattr(self._current_object, name)
+

     def __setattr__(self, name, value):
         if name == '_sphinx':

This patch is needed from at least v2.1.2. Tested in 2.1.2 and 2.1.4.
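
The behaviour described above can be reproduced without django-sphinx. In this sketch, LeakyProxy and StrictProxy are hypothetical stand-ins for the unpatched and patched SphinxProxy; they model only the attribute delegation, not the rest of the class.

```python
from types import SimpleNamespace


class LeakyProxy(object):
    """Always supplies a default, so a missing attribute never raises."""

    def __init__(self, target):
        self._target = target

    def __getattr__(self, name):
        return getattr(self._target, name, None)


class StrictProxy(object):
    """Lets AttributeError propagate, which is what hasattr() relies on."""

    def __init__(self, target):
        self._target = target

    def __getattr__(self, name):
        return getattr(self._target, name)


target = SimpleNamespace(title='hello')
leaky_says = hasattr(LeakyProxy(target), 'anything')
strict_says = hasattr(StrictProxy(target), 'anything')
```

Both proxies still forward attributes that do exist on the target; only the handling of missing ones differs.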

Google Groups post here: http://groups.google.com/group/django-sphinx/browse_thread/thread/9174fd1ddbab75f1

Thanks,
Graham

[1] http://docs.python.org/library/functions.html#hasattr

it generates wrong sql

example:
sql_query =
SELECT id, name, description, category_id, lng, lat
FROM citymaps_point

A trailing "\" is required at the end of the SELECT line.

SphinxModelAdmin doesn't work

I did as the docs say:

from djangosphinx.admin import SphinxModelAdmin

class MyAdmin(SphinxModelAdmin):
        index = 'my_index_name' # defaults to Model._meta.db_table
        weights = {'field': 100}

As a result I got the error global name 'Paginator' is not defined. I fixed it, then hit another one: 'list' object has no attribute 'ordered'.

I don't know what to do with this.

P.S. Django 1.3, if it matters.

Find record by non primary-key

I created a model that has a CharField as its primary key.

At the final stage of the project I tried to add SphinxSearch to this model, but Sphinx doesn't support non-integer document ids.

So I added an IntegerField to the model and tried to use it as the document id.

After changing the Sphinx conf file I get:

Painter.search.query('bern') - returns 1
len(list(Painter.search.query('bern'))) - returns 0

This happens because django-sphinx always uses the primary key as the document id.

If I'm right, I can try to make a patch that adds a parameter to SphinxSearch, for example document_id_field, which defaults to 'pk' but lets the user choose any field.

What do you think about it?

rankmode default value not set

In the source code, I found

SPH_RANK_PROXIMITY_BM25 = 0 # default mode, (...)

But when not setting rankmode in my model, ranking does not occur, all weight values are set to 1.

Autocomplete

Hello,

I want to implement autocompletion with django-sphinx, but I failed. How can I activate the match mode? I read the docs but didn't understand how to implement it.

Thank you
