sprokopenko / concurrent-trees

Automatically exported from code.google.com/p/concurrent-trees

Java 100.00%

concurrent-trees's Issues

Add tree.size() methods

Currently..

    Iterables.count(myTree.getValuesForClosestKeys(""))

..can be used to count the number of keys/values in the radix tree.

This ticket is to add a size() method to the trees, both to simplify this call 
and because a dedicated method may be more efficient than counting as above.

Note that calculating the size of a radix tree is an expensive operation with 
O(n) time complexity. However, the method may be useful for debugging purposes.
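
To illustrate why counting is O(n), here is a minimal sketch using a toy per-character trie. `ToyTrie` and its members are hypothetical and are not the library's node classes; the point is only that size() has to visit every node once.

```java
import java.util.TreeMap;

// Toy trie for illustration only; not the concurrent-trees implementation.
final class ToyTrie {
    private static final class Node {
        final TreeMap<Character, Node> children = new TreeMap<>();
        boolean hasValue; // true if a key ends at this node
    }

    private final Node root = new Node();

    void put(String key) {
        Node node = root;
        for (char c : key.toCharArray()) {
            node = node.children.computeIfAbsent(c, k -> new Node());
        }
        node.hasValue = true;
    }

    // Visits every node once, so the cost grows linearly with the tree.
    int size() {
        return count(root);
    }

    private static int count(Node node) {
        int total = node.hasValue ? 1 : 0;
        for (Node child : node.children.values()) {
            total += count(child);
        }
        return total;
    }

    public static void main(String[] args) {
        ToyTrie trie = new ToyTrie();
        trie.put("/a/b/");
        trie.put("/a/blob/");
        trie.put("/a/blog/");
        System.out.println(trie.size()); // prints 3
    }
}
```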

Original issue reported on code.google.com by [email protected] on 3 Dec 2013 at 10:45

InvertedRadixTree.getKeyValuePairsForKeysPrefixing() returning keys that aren't even in the tree

Hello, unless I am confused about what to expect, I believe 
getKeyValuePairsForKeysPrefixing() is not returning the correct information.  
In fact, I believe it is returning "keys" that aren't even in the tree.  It is 
returning the values at a node, but not the full key that was stored.

What steps will reproduce the problem?
1. Run the attached TreeTest class.

Here is the output I am getting (using the most recent jar downloaded 
yesterday):

**** Constructing new tree
Added key/value pair: /a/b/ -> 1
Added key/value pair: /a/blob/ -> 2
Added key/value pair: /a/blog/ -> 3
○
└── ○ /a/b
    ├── ○ / (1)
    └── ○ lo
        ├── ○ b/ (2)
        └── ○ g/ (3)
Keys prefixing /: {/, 1} 
Keys prefixing /a/: {/, 1} 
Keys prefixing /a/b/: {/a/b/, 1} 
Keys prefixing /a/bl/: 
Keys prefixing /a/blo/: 
Keys prefixing /a/blob/: {/a/blob/, 2} 
Keys prefixing /a/blog/: {/a/blog/, 3} 
**** Constructing new tree
Added key/value pair: /a/b -> 1
Added key/value pair: /a/blob -> 2
Added key/value pair: /a/blog -> 3
○
└── ○ /a/b (1)
    └── ○ lo
        ├── ○ b (2)
        └── ○ g (3)
Keys prefixing /: 
Keys prefixing /a: 
Keys prefixing /a/b: {/a/b, 1} 
Keys prefixing /a/bl: {/a/b, 1} 
Keys prefixing /a/blo: {/a/b, 1} 
Keys prefixing /a/blob: {/a/b, 1} {/a/blob, 2} 
Keys prefixing /a/blog: {/a/b, 1} {/a/blog, 3} 


It looks to me like the tree structure is correct, but 
getKeyValuePairsForKeysPrefixing() is returning the incorrect key/value pairs 
for several values.  For example, with the first tree in the example above:
Keys prefixing /: {/, 1}   <-  No key "/" stored; this is the node for /a/b/
Keys prefixing /a/: {/, 1}  <- Ditto; no key / was stored


I am using concurrent-trees-2.1.0.jar on Fedora 17.

Original issue reported on code.google.com by [email protected] on 5 Oct 2013 at 7:29

Attachments:

TreeTest.java

Deploy to Maven central

Deploy to Maven Central per: 
https://docs.sonatype.org/display/Repository/Sonatype+OSS+Maven+Repository+Usage+Guide

Original issue reported on code.google.com by [email protected] on 4 Jul 2012 at 2:46

Support 8-bit encoded strings

Java's default UTF-16, 2-bytes-per-character string encoding, is inefficient 
for strings which otherwise could be encoded with a single byte per character.

It should be possible to represent characters in the trees using only a single 
byte per character, when working with compatible strings. This may reduce 
the memory used for character data by up to 50%.
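
As a rough illustration of the saving, the sketch below compares the bytes needed for a key's characters as UTF-16 code units and as ISO-8859-1 bytes. The class and method names are hypothetical, and treating ISO-8859-1 as the 8-bit encoding is an assumption; the ticket does not name one.

```java
import java.nio.charset.StandardCharsets;

// Illustration only; names are hypothetical and not part of the library.
final class EncodingSizeSketch {
    // Bytes needed for the characters alone when stored as UTF-16 code units.
    static int utf16Bytes(String s) {
        return s.length() * 2;
    }

    // Bytes needed when every character fits in ISO-8859-1 (Latin-1).
    static int latin1Bytes(String s) {
        return s.getBytes(StandardCharsets.ISO_8859_1).length;
    }

    // A string is treated as "compatible" here if no char exceeds 0xFF.
    static boolean isLatin1(String s) {
        for (int i = 0; i < s.length(); i++) {
            if (s.charAt(i) > 0xFF) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        String key = "/a/blob/";
        System.out.println(isLatin1(key));    // prints true
        System.out.println(utf16Bytes(key));  // prints 16
        System.out.println(latin1Bytes(key)); // prints 8
    }
}
```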

Original issue reported on code.google.com by [email protected] on 20 Oct 2013 at 10:20

Expose API to get longest prefix match

Expose an API to scan the input for keys stored in the tree which are prefixes 
of the input.
See discussion in forum: 
https://groups.google.com/forum/#!topic/concurrent-trees-discuss/_IpLEzNDFWs

Example: tree contains keys 123, 1234568, 1234569

Input: 12345690

API would return keys 123, 1234569.

This could be used for processing phone numbers.

This could be calculated in a single scan through the input, thus finding keys 
which are prefixes of the input very quickly. This functionality is a subset of 
InvertedRadixTree.getKeysContainedIn, and can use the same traversal algorithm.

Unit test demonstrating desired functionality:

    @Test
    public void testGetKeysPrefixing() throws Exception {
        ConcurrentInvertedRadixTree<Integer> tree = new ConcurrentInvertedRadixTree<Integer>(nodeFactory);

        tree.put("1234567", 1);
        tree.put("1234568", 2);
        tree.put("123", 3);

        //    ○
        //    └── ○ 123 (3)
        //        └── ○ 456
        //            ├── ○ 7 (1)
        //            └── ○ 8 (2)

        assertEquals("[123, 1234567]", Iterables.toString(tree.getKeysPrefixing("1234567")));
        assertEquals("[123, 1234567]", Iterables.toString(tree.getKeysPrefixing("12345670")));
        assertEquals("[123, 1234568]", Iterables.toString(tree.getKeysPrefixing("1234568")));
        assertEquals("[123, 1234568]", Iterables.toString(tree.getKeysPrefixing("12345680")));
        assertEquals("[123]", Iterables.toString(tree.getKeysPrefixing("1234569")));
        assertEquals("[123]", Iterables.toString(tree.getKeysPrefixing("123456")));
        assertEquals("[123]", Iterables.toString(tree.getKeysPrefixing("123")));
        assertEquals("[]", Iterables.toString(tree.getKeysPrefixing("12")));
        assertEquals("[]", Iterables.toString(tree.getKeysPrefixing("")));
    }
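
The single scan described above can be sketched on a toy per-character trie. This is a minimal illustration under the assumption of one node per character; `PrefixScanSketch` is a hypothetical stand-in, not the radix tree's actual traversal code.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy per-character trie; hypothetical stand-in for the radix tree.
final class PrefixScanSketch {
    private static final class Node {
        final Map<Character, Node> children = new HashMap<>();
        boolean isKey; // true if a stored key ends at this node
    }

    private final Node root = new Node();

    void put(String key) {
        Node node = root;
        for (int i = 0; i < key.length(); i++) {
            node = node.children.computeIfAbsent(key.charAt(i), c -> new Node());
        }
        node.isKey = true;
    }

    // Walks the input once, collecting each stored key passed on the way.
    List<String> getKeysPrefixing(String input) {
        List<String> result = new ArrayList<>();
        Node node = root;
        for (int i = 0; i < input.length(); i++) {
            node = node.children.get(input.charAt(i));
            if (node == null) {
                break; // no stored key continues along this input
            }
            if (node.isKey) {
                result.add(input.substring(0, i + 1));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        PrefixScanSketch tree = new PrefixScanSketch();
        tree.put("123");
        tree.put("1234568");
        tree.put("1234569");
        System.out.println(tree.getKeysPrefixing("12345690")); // prints [123, 1234569]
    }
}
```

The scan stops at the first character with no matching child, so the work is bounded by the length of the input rather than the size of the tree.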

Original issue reported on code.google.com by [email protected] on 7 Aug 2013 at 9:03

New types of tree: ConcurrentPermutermTree, ConcurrentWildcardTree for wildcard queries

It would be useful to support wildcard queries.

Two approaches to be investigated (both of which will be tracked in this issue):

(1) A permuterm index on top of the ConcurrentRadixTree. This would support 
queries such as "<prefix>*<suffix>" on a single tree. It may be more memory 
efficient than a hash-dictionary approach. See: 
http://nlp.stanford.edu/IR-book/html/htmledition/permuterm-indexes-1.html

(2) A composite of a ConcurrentRadixTree and a ConcurrentReversedRadixTree. One 
tree would support prefix lookup, the other suffix lookup. Query 
"prefix*suffix" may return the intersection of the results from both trees, 
after some post-filtering. This second approach however, is near the territory 
of a query engine on top of multiple indexes, so if implemented would not 
belong in this project, but in http://code.google.com/p/cqengine/

Example usage for (1) would be:

    public static void main(String[] args) {
        PermutermTree<Integer> tree = new ConcurrentPermutermTree<Integer>(new DefaultCharArrayNodeFactory());

        tree.put("TEST", 1);
        tree.put("TOAST", 2);
        tree.put("TEAM", 3);

        System.out.println("Keys matching 'T*T': " + Iterables.toString(tree.getKeysMatching("T", "T"))); // prefix, suffix
    }


Output would be:
    Keys matching 'T*T': [TOAST, TEST]
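
The permuterm idea in (1) can be sketched with a sorted map standing in for the radix tree. Everything here is an assumption for illustration: the class name, the `$` end marker (assumed not to occur in keys), and the use of `TreeMap`. Each key is stored under every rotation of `key$`, and a query "prefix*suffix" becomes a prefix search for `suffix$prefix`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Permuterm index over a sorted map; a radix tree would replace the TreeMap.
final class PermutermSketch {
    private static final char END = '$'; // assumed not to occur in keys
    private final TreeMap<String, String> rotations = new TreeMap<>();

    // Stores the key under every rotation of key + END.
    void put(String key) {
        String term = key + END;
        for (int i = 0; i < term.length(); i++) {
            rotations.put(term.substring(i) + term.substring(0, i), key);
        }
    }

    // "prefix*suffix" becomes a prefix search for "suffix$prefix".
    List<String> getKeysMatching(String prefix, String suffix) {
        String query = suffix + END + prefix;
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, String> e : rotations.tailMap(query, true).entrySet()) {
            if (!e.getKey().startsWith(query)) {
                break; // left the contiguous range of matching rotations
            }
            result.add(e.getValue());
        }
        return result;
    }

    public static void main(String[] args) {
        PermutermSketch tree = new PermutermSketch();
        tree.put("TEST");
        tree.put("TOAST");
        tree.put("TEAM");
        System.out.println(tree.getKeysMatching("T", "T")); // prints [TEST, TOAST]
    }
}
```

Results in this sketch come back in the sorted order of the matching rotations, which need not match the order a real implementation would return. The index holds one entry per character of each key plus one, which is the memory cost to weigh against the alternative approaches.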

Original issue reported on code.google.com by [email protected] on 24 Mar 2013 at 10:19

Can we make the trees serializable?

The current implementation is not serializable. If we load a huge amount of 
data each time when starting, this may limit the usage. However, if it is 
serializable, we can load it once and serialize the entire tree onto the disk. 
During the start-up time, we only have to de-serialize it to load the whole 
tree quickly.
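
As a sketch of what the request amounts to, the toy trie below marks its nodes `Serializable` and round-trips through Java's built-in object streams. `SerializableTrieSketch` is hypothetical and says nothing about how the library's own node classes could be adapted.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.util.TreeMap;

// Toy serializable trie; illustration only.
final class SerializableTrieSketch implements Serializable {
    private static final long serialVersionUID = 1L;

    private static final class Node implements Serializable {
        private static final long serialVersionUID = 1L;
        final TreeMap<Character, Node> children = new TreeMap<>();
        Integer value; // null when no key ends here
    }

    private final Node root = new Node();

    void put(String key, int value) {
        Node node = root;
        for (char c : key.toCharArray()) {
            node = node.children.computeIfAbsent(c, k -> new Node());
        }
        node.value = value;
    }

    Integer get(String key) {
        Node node = root;
        for (char c : key.toCharArray()) {
            node = node.children.get(c);
            if (node == null) {
                return null;
            }
        }
        return node.value;
    }

    // Writes the whole node graph in one call.
    byte[] toBytes() {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
            out.writeObject(this);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    // Restores the tree without re-inserting each key.
    static SerializableTrieSketch fromBytes(byte[] bytes) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return (SerializableTrieSketch) in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        SerializableTrieSketch tree = new SerializableTrieSketch();
        tree.put("TEST", 1);
        SerializableTrieSketch copy = fromBytes(tree.toBytes());
        System.out.println(copy.get("TEST")); // prints 1
    }
}
```

Default Java serialization walks the object graph recursively, so a very deep tree could exhaust the stack; a real implementation would probably need a custom, iterative format.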

Original issue reported on code.google.com by [email protected] on 25 Mar 2013 at 2:29

Expose API to get longest prefix match / "closest" keys


Please add support for querying the longest prefix match (this is not currently 
exposed in the public API). A variation of this which only matches if the key 
is in fact truly a prefix would be useful, otherwise the caller would have to 
run an additional prefix.startsWith(key). For example, if the tree only 
contains "foo", it's the longest prefix match for anything. But in the use-case 
I have, I would only want to match "foo*".
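
The "truly a prefix" variation can be sketched without any tree at all, by probing ever-shorter prefixes of the input against a set of keys. The helper below is hypothetical and deliberately naive; a tree-based version would find the same answer in one walk down the tree.

```java
import java.util.Arrays;
import java.util.Set;
import java.util.TreeSet;

// Illustration only; not part of the library's API.
final class LongestPrefixSketch {
    // Returns the longest stored key that is a prefix of input, or null if none is.
    static String longestKeyPrefixing(Set<String> keys, String input) {
        for (int end = input.length(); end >= 0; end--) {
            String candidate = input.substring(0, end);
            if (keys.contains(candidate)) {
                return candidate;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        Set<String> keys = new TreeSet<>(Arrays.asList("foo", "foobar"));
        System.out.println(longestKeyPrefixing(keys, "foobaz"));  // prints foo
        System.out.println(longestKeyPrefixing(keys, "foobar!")); // prints foobar
        System.out.println(longestKeyPrefixing(keys, "fo"));      // prints null
    }
}
```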

Original issue reported on code.google.com by phraktle on 19 Nov 2012 at 10:33

add tree.contains()-method

In my opinion a very useful method would be a boolean .contains() method, which 
checks whether a query is contained in the tree. This should be similar to 
.getKeysStartingWith(query), but stop as soon as it finds the first path 
matching the query and return true.

For example:

    tree.put("TEST", 1);
    tree.put("TOAST", 2);
    tree.put("TEAM", 3);

    tree.contains("TO"); // returns true
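
The early-exit behaviour requested here can be sketched with a sorted set: all keys starting with the query form one contiguous range, so looking at the first key at or after the query is enough. `ContainsPrefixSketch` is a hypothetical helper, with `TreeSet` standing in for the tree.

```java
import java.util.Arrays;
import java.util.TreeSet;

// Illustration only; TreeSet stands in for the radix tree.
final class ContainsPrefixSketch {
    // True if at least one stored key starts with the query.
    static boolean containsPrefix(TreeSet<String> keys, String query) {
        // The smallest key >= query is the only candidate that needs checking.
        String candidate = keys.ceiling(query);
        return candidate != null && candidate.startsWith(query);
    }

    public static void main(String[] args) {
        TreeSet<String> keys = new TreeSet<>(Arrays.asList("TEST", "TOAST", "TEAM"));
        System.out.println(containsPrefix(keys, "TO")); // prints true
        System.out.println(containsPrefix(keys, "TA")); // prints false
    }
}
```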

Original issue reported on code.google.com by [email protected] on 25 Sep 2014 at 1:34

ConcurrentSuffixTree.getKeysEndingWith(x) returns null when x is empty

What steps will reproduce the problem?
1. Create a ConcurrentSuffixTree
2. Insert some keys
3. Attempt to retrieve all keys by using .getKeysEndingWith("")

What is the expected output? What do you see instead?
I expect an iterable with all keys; I get null. 

What version of the product are you using? On what operating system?
2.4.0 on Java 1.8.0_20

Please provide any additional information below.
.getKeysStartingWith("") and .getKeysEndingWith("") return all keys for 
ConcurrentRadixTree and ConcurrentReversedRadixTree respectively.

Original issue reported on code.google.com by [email protected] on 25 Oct 2014 at 9:30
