solr not working on integers? (solandra, closed)

tjake commented on August 23, 2024
solr not working on integers?

from solandra.

Comments (16)

tjake commented on August 23, 2024

Ok thanks, I'll take a look

sdonelow commented on August 23, 2024

I think I found the solution to this. Take this with a grain of salt, as I'm new to Lucandra/Solr. We had the same problem with long and used the slong data type instead, which fixed it. So try changing the popularity field's data type to sint in your schema.xml file.

tjake commented on August 23, 2024

Is this still happening?

leoz-xx commented on August 23, 2024

I'm having similar issues (with long and double instead of int, though). Range queries are not working. Using slong in Solr might be the workaround, as sdonelow commented, but I'd prefer not to use Solr in my project.

tnine commented on August 23, 2024

Hey guys. I'm assuming you're still having this issue? I'm trying to sort it out, and it appears to be functionally impossible with the current implementation. Basically, the bits of the numeric value are right-shifted 4 bits at a time, and the first byte of each resulting term holds the number of bits shifted off. You can view the logic for creating the trie structures here.

http://lucene.apache.org/java/3_0_2/api/core/org/apache/lucene/search/NumericRangeQuery.html
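
For anyone following along, here is a minimal Python sketch of the prefix coding described above. It is modeled loosely on Lucene's NumericUtils, but the marker offset `SHIFT_START_INT`, the 7-bits-per-byte packing, and the names `int_to_prefix_coded` and `trie_terms` are assumptions made for illustration, not the Lucene or Lucandra source.

```python
from typing import List

SHIFT_START_INT = 0x60  # assumed marker offset stored in the first byte
PRECISION_STEP = 4      # bits shifted off per trie level, as described above


def int_to_prefix_coded(value: int, shift: int) -> bytes:
    """Encode a signed 32-bit int as one trie term at the given shift."""
    if not -2**31 <= value < 2**31:
        raise ValueError("value must fit in a signed 32-bit int")
    if not 0 <= shift <= 31:
        raise ValueError("shift must be in 0..31")
    # Flip the sign bit so unsigned comparison matches signed numeric order.
    sortable = (value & 0xFFFFFFFF) ^ 0x80000000
    sortable >>= shift
    # Pack the remaining bits 7 per byte, most significant group first.
    n_bytes = (31 - shift) // 7 + 1
    body = bytearray(n_bytes)
    for i in range(n_bytes - 1, -1, -1):
        body[i] = sortable & 0x7F
        sortable >>= 7
    # The first byte records how many bits were shifted off.
    return bytes([SHIFT_START_INT + shift]) + bytes(body)


def trie_terms(value: int, precision_step: int = PRECISION_STEP) -> List[bytes]:
    """Return every term indexed for a value, one per precision level."""
    return [int_to_prefix_coded(value, s) for s in range(0, 32, precision_step)]
```

Under these assumptions each value yields one term per shift, and the full-precision terms (shift 0) compare bytewise in the same order as the integers they encode. That ordering is what a range scan over the keys relies on.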

This allows for faster range scanning and makes seeks faster. However, according to the IndexReader spec here,

http://lucene.apache.org/java/3_0_2/api/core/org/apache/lucene/index/IndexReader.html#terms(org.apache.lucene.index.Term)

"If the given term does not exist, the enumeration is positioned at the first term greater than the supplied term."

The current implementation does not do this. It scans over only 2 keys, finds no match within them, and returns an empty result set. I'm looking into rectifying this problem. Correcting it may involve reading far more than 2 keys initially, so it will not be a very efficient operation.
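
To make the expected behavior concrete, here is a small Python sketch of the seek semantics quoted from the IndexReader spec. It is only an in-memory stand-in: `seek_terms` is a hypothetical helper over a sorted list, not the Lucandra or Lucene API.

```python
import bisect
from typing import Iterator, Sequence


def seek_terms(sorted_terms: Sequence[bytes], target: bytes) -> Iterator[bytes]:
    """Yield terms starting at target, or at the next greater term if absent."""
    # bisect_left finds the first position whose term is >= target.
    start = bisect.bisect_left(sorted_terms, target)
    for i in range(start, len(sorted_terms)):
        yield sorted_terms[i]


terms = [b"a", b"c", b"e"]
# b"b" is not indexed, so the enumeration starts at b"c" instead of being empty.
print(list(seek_terms(terms, b"b")))
```

The point is that a missing term should position the enumeration at its successor, and an empty result is only correct when nothing at or after the target exists.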

tnine commented on August 23, 2024

I've created a simple unit test that mimics how keys are written with my own IndexReader and TermEnum here.

http://github.com/tnine/Lucandra/blob/master/test/lucandra/BytesOrderingEnumTest.java

It doesn't complete because I haven't implemented all the document scoring. However, it does correctly identify records to enumerate over when no prefix is present. If you uncomment my commented lines, you will see the byte comparator used no longer seeks to the correct index when index\docfield prefixes are used. Therefore, something isn't quite right with the prefix and the byte ordering. I just can't put my finger on it, since all the prefix bytes should be the same in common fields, and hence irrelevant in the byte comparison up to the first byte in the trie structure.

tnine commented on August 23, 2024

After much digging, this appears to be an encoding issue with Thrift and batch mutate itself. The issue and corresponding unit test are here.

https://issues.apache.org/jira/browse/CASSANDRA-1235

tmahesh commented on August 23, 2024

Is sorting on integer/float fields supported in solr-cassandra?

I tried the query below on the index of the example docs, but did not get results in the correct order:
http://localhost:8983/solr/select/?q=cat:electronics&sort=price%20asc

I have tried changing the field type to "sint" and "tint", but with no success. Sorting on a string field type works, though. Any suggestions on how to fix sorting for integer and float fields?

tnine commented on August 23, 2024

See the underlying bug. We can't properly encode any numeric fields; as a result, you can't sort on them. Until Cassandra fixes this issue, no numeric field searching/sorting will work.

tmahesh commented on August 23, 2024

  1. We can store integer/float data and fetch it out correctly (i.e., the price field fetched from the index is as it was stored).
  2. From what I understand, sorting of the result set happens inside the Solr index searcher.

Shouldn't sorting work in such a case?

I'm confused about how the Cassandra bug impacts sorting when we can fetch the stored data correctly from the index.

tnine commented on August 23, 2024

I could be wrong about how Solr stores and retrieves indexes. However, I know I'm accurate in stating that we currently can't store numeric values in Cassandra correctly/consistently. Run my test cases and you'll see exactly what I mean. You will occasionally get correct behavior, as the encoding problem does not present itself with all values. It seems to depend on the byte value that is stored. The fix was bumped from 0.6.4 to 0.6.5, so it doesn't seem to be getting fixed anytime soon. Check out the Solr code and see if it's using numeric values in the underlying fields. If it is, you can't use it until the Cassandra bug is fixed.

tjake commented on August 23, 2024

Actually, the Cassandra guys decided to ditch String keys for byte[], which will fix the issue. I assume it's going into 0.6.5, but you can see it now in trunk.

sdonelow commented on August 23, 2024

tjake, what does "ditch String keys for byte[]" mean?

tnine commented on August 23, 2024

Currently all keys in Cassandra are UTF8 strings. This has been removed in favor of using native bytes in the new version. That should eliminate the issues we see with shifting 7 bits of numeric types into the lower 7 bits of a UTF8 byte, and hence remove the limitation on numeric fields in Lucandra. Note that this will require a decent amount of rework of Lucandra, but I plan on doing that as soon as 0.7 is released, since we really need numeric functionality.
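
As a general illustration of why raw byte[] keys are safer than string keys for binary data, here is a short Python sketch. It is not a reproduction of CASSANDRA-1235; the helper `round_trip_through_utf8` and the sample keys are made up, and it only shows that bytes forced through a UTF-8 string do not always come back unchanged.

```python
def round_trip_through_utf8(raw: bytes) -> bytes:
    """Decode bytes as a UTF-8 string and encode them back, as a string key would."""
    # Invalid sequences are replaced with U+FFFD rather than raising.
    return raw.decode("utf-8", errors="replace").encode("utf-8")


ascii_key = bytes([0x20, 0x41, 0x7F])   # every byte below 0x80
binary_key = bytes([0x20, 0x80, 0xFF])  # contains bytes that are not valid UTF-8

print(round_trip_through_utf8(ascii_key) == ascii_key)    # True
print(round_trip_through_utf8(binary_key) == binary_key)  # False
```

With byte[] keys there is no decode step at all, so this class of encoding mismatch cannot occur.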

tnine commented on August 23, 2024

Just an FYI, guys. This has been fixed in release 0.6.5 of Cassandra, so numeric fields should now work.

tjake commented on August 23, 2024

Fixed in Cassandra 0.6.5.
