karimbahgat / pyqtree

A pure Python quad tree spatial index for GIS or rendering usage

License: MIT License

Python 100.00%

pyqtree's Introduction

Pyqtree

Pyqtree is a pure Python spatial index for GIS or rendering usage. It stores and quickly retrieves items from a 2x2 rectangular grid area, and grows in depth and detail as more items are added. The actual quad tree implementation is adapted from Matt Rasmussen's compbio library and extended for geospatial use.

Platforms

Python 2 and 3.

Dependencies

Pyqtree is written in pure Python and has no dependencies.

Installing It

Pyqtree can be installed by opening your terminal or command line and typing:

pip install pyqtree

Alternatively, you can simply download the "pyqtree.py" file and place it anywhere Python can import it, such as the Python site-packages folder.

Example Usage

Start your script by importing the quad tree.

from pyqtree import Index

Set up the spatial index, giving it a bounding box area to keep track of. The bounding box is a four-tuple: (xmin, ymin, xmax, ymax).

spindex = Index(bbox=(0, 0, 100, 100))

Populate the index with items that you want to be retrieved at a later point, along with each item's geographic bbox.

# this example assumes you have a list of items with bbox attribute
for item in items:
    spindex.insert(item, item.bbox)

Then, when you have a region of interest and wish to retrieve items from that region, just use the index's intersect method. This quickly gives you a list of the stored items whose bboxes intersect your region of interest.

overlapbbox = (51, 51, 86, 86)
matches = spindex.intersect(overlapbbox)

There are other things that can be done as well, but that's it for the main usage!
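
To sanity-check results from the index, it can help to compare them against a brute-force scan. The sketch below is not part of pyqtree: `boxes_overlap` and `brute_force_intersect` are hypothetical helpers, and the sketch assumes that boxes touching at an edge count as intersecting.

```python
def boxes_overlap(a, b):
    """Return True if two (xmin, ymin, xmax, ymax) boxes overlap or touch."""
    return a[0] <= b[2] and a[2] >= b[0] and a[1] <= b[3] and a[3] >= b[1]


def brute_force_intersect(entries, region):
    """Linear scan over (item, bbox) pairs; slow but easy to reason about."""
    return [item for item, bbox in entries if boxes_overlap(bbox, region)]


# Made-up sample data for illustration only.
entries = [
    ("a", (10, 10, 20, 20)),
    ("b", (60, 60, 70, 70)),
    ("c", (80, 40, 95, 55)),
]
matches = brute_force_intersect(entries, (51, 51, 86, 86))
print(matches)
```

A linear scan like this costs one comparison per stored item, which is the cost a spatial index is meant to avoid on large collections.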

More Information:

License:

This code is free to share, use, reuse, and modify according to the MIT license, see LICENSE.txt.

Credits:

Karim Bahgat
Joschua Gandert

Changes

1.0.0 (2018-09-14)

Bump to first major version
Fix so returns list instead of set
Support inserting hashable items

0.25.0 (2016-06-22)

Misc user contributions and bug fixes

0.24.0 (2015-06-18)

Previous stable PyPI version.

pyqtree's Issues

pyqtree.Index.intersect returns duplicate ids after serialization

I've encountered what seems to be a serialization bug. If I create a dummy Index with some random boxes and perform a query, everything works fine. However, if I use pickle to serialize and de-serialize the Index, it starts to return duplicate node ids. This isn't a huge problem because I can just remove duplicates, but it may be indicative of a serialization issue that should be fixed.

I've constructed a minimal working example and tested a few variations. First here is the MWE that demonstrates the problem:

import numpy as np
import pyqtree

# Populate a qtree with a set of random boxes
aid_to_tlbr = {779: np.array([412, 404, 488, 455]),
               781: np.array([127, 429, 194, 517]),
               782: np.array([459, 282, 517, 364]),
               784: np.array([404, 160, 496, 219]),
               785: np.array([336, 178, 367, 209]),
               786: np.array([366, 459, 451, 527]),
               788: np.array([491, 434, 532, 504]),
               789: np.array([251, 185, 322, 248]),
               790: np.array([266, 104, 387, 162]),
               791: np.array([ 65, 296, 138, 330]),
               792: np.array([331, 241, 368, 347])}
orig_qtree = pyqtree.Index((0, 0, 600, 600))
for aid, tlbr in aid_to_tlbr.items():
    orig_qtree.insert(aid, tlbr)

# Issue a query and inspect results
query = np.array([0, 0, 300, 300])
original_result = orig_qtree.intersect(query)

# We see that everything looks fine
print('original_result = {!r}'.format(sorted(original_result)))

# Serialize and unserialize the Index, and inspect results
import pickle
serial = pickle.dumps(orig_qtree)
new_qtree = pickle.loads(serial)

# Issue the same query on the reloaded Index, the result now
# contains duplicate items!!
new_result = new_qtree.intersect(query)
print('new_result = {!r}'.format(sorted(new_result)))

This results in the following output:

original_result = [789, 790, 791]
new_result = [789, 789, 790, 790, 791, 791]

As you can see the new result has duplicate node ids.

This bug has some other interesting properties. First, serializing a second time doesn't make anything worse, so that's good.

        third_qtree = pickle.loads(pickle.dumps(new_qtree))
        third_result = third_qtree.intersect(query)
        print('third_result = {!r}'.format(sorted(third_result)))

The output is the same as new_result.

third_result = [789, 789, 790, 790, 791, 791]

Strangely, the specific node ids seem to affect whether this bug occurs. If I reindex the nodes to use 0-10 instead of the 700-range numbers, the problem goes away!

        aid_to_tlbr = {0: np.array([412, 404, 488, 455]),
                       1: np.array([127, 429, 194, 517]),
                       2: np.array([459, 282, 517, 364]),
                       3: np.array([404, 160, 496, 219]),
                       4: np.array([336, 178, 367, 209]),
                       5: np.array([366, 459, 451, 527]),
                       6: np.array([491, 434, 532, 504]),
                       7: np.array([251, 185, 322, 248]),
                       8: np.array([266, 104, 387, 162]),
                       9: np.array([ 65, 296, 138, 330]),
                       10: np.array([331, 241, 368, 347])}
        qtree3 = pyqtree.Index((0, 0, 600, 600))
        for aid, tlbr in aid_to_tlbr.items():
            qtree3.insert(aid, tlbr)
        query = np.array([0, 0, 300, 300])
        result3 = qtree3.intersect(query)
        print('result3 = {!r}'.format(sorted(result3)))
        qtree4 = pickle.loads(pickle.dumps(qtree3))
        result4 = qtree4.intersect(query)
        print('result4 = {!r}'.format(sorted(result4)))

Results in:

result3 = [7, 8, 9]
result4 = [7, 8, 9]

Results were obtained using pyqtree.version = '1.0.0' and python 3.6 in Ubuntu 18.04.
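
One possible explanation, offered as a hypothesis rather than a confirmed diagnosis: if intersect de-duplicates results by object identity (`id(item)`), then an item stored in several quads is a single object before pickling but may come back as several equal-but-distinct objects afterwards. CPython caches small integers (-5 to 256), which would be consistent with ids 0-10 being unaffected. The sketch below mimics identity-based de-duplication with a hypothetical `dedupe_by_id` helper and shows a value-based workaround, `dedupe_by_value`; neither is part of pyqtree.

```python
import pickle


def dedupe_by_id(items):
    """Drop repeats by object identity (assumed to mirror what the index does)."""
    seen, out = set(), []
    for item in items:
        if id(item) not in seen:
            seen.add(id(item))
            out.append(item)
    return out


def dedupe_by_value(items):
    """Drop repeats by equality, keeping first-seen order (hashable items only)."""
    return list(dict.fromkeys(items))


big = 789
stored = [big, big]  # the same object referenced from two places
restored = pickle.loads(pickle.dumps(stored))

print(dedupe_by_id(stored))       # one object, so one result
print(dedupe_by_id(restored))     # may contain repeats if identities diverged
print(dedupe_by_value(restored))  # equality-based, so a single 789
```

Applying `dedupe_by_value` to the output of intersect is the "just remove duplicates" workaround mentioned above, and it does not depend on how the interpreter allocates integers.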

Why is there no operation to delete a node?

iterating does not work as expected

The behavior I would expect is:

    def __iter__(self):
        yield self
        for child in _loopallchildren(self):
            yield child

Iterating over a quad doesn't yield the quad itself. This makes, for example, iterating over all items in the tree more tedious:

for node in myQuadTree:
    yield node.item
for quad in myQuadTree:
    for node in quad.nodes:
        yield node.item
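
The expected behavior can be illustrated on a toy tree. `ToyQuad` below is a stand-in written for this example, not pyqtree's internal class, and it stores plain items in `nodes` for brevity. It only shows an `__iter__` that yields the quad itself before its descendants, so that a single loop reaches every stored item.

```python
class ToyQuad(object):
    """Minimal stand-in for a quad: holds nodes and child quads."""

    def __init__(self, nodes=(), children=()):
        self.nodes = list(nodes)
        self.children = list(children)

    def __iter__(self):
        # Yield this quad first, then every descendant, depth-first.
        yield self
        for child in self.children:
            for quad in child:
                yield quad


leaf_a = ToyQuad(nodes=["a1", "a2"])
leaf_b = ToyQuad(nodes=["b1"])
root = ToyQuad(nodes=["r1"], children=[leaf_a, leaf_b])

# A single loop now covers the root's own nodes as well.
all_items = [item for quad in root for item in quad.nodes]
print(all_items)
```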

Wrong insertion

I have worked my way through the code and found that the return statements during insertion give rise to errors:

def _insert_into_children(self, item, rect):
    # if rect spans center then insert here
    if ((rect[0] <= self.center[0] and rect[2] > self.center[0]) and
        (rect[1] <= self.center[1] and rect[3] > self.center[1])):
        node = _QuadNode(item, rect)
        self.nodes.append(node)
        return node
    else:
        # try to insert into children
        if rect[0] <= self.center[0]:
            if rect[1] <= self.center[1]:
                return self.children[0].insert(item, rect)
            if rect[3] > self.center[1]:
                return self.children[1].insert(item, rect)
        if rect[2] > self.center[0]:
            if rect[1] <= self.center[1]:
                return self.children[2].insert(item, rect)
            if rect[3] > self.center[1]:
                return self.children[3].insert(item, rect)

If a quad is split at some point, a feature will be put into one and only one child with the current code. Removing the returns would allow a feature to be put correctly into more than one child.
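
To make the difference concrete, the sketch below computes which child quadrants a rectangle overlaps, using the same comparisons and child numbering as the snippet above. `all_quadrants` and `first_quadrant` are hypothetical helpers for illustration; whether inserting into every overlapping child is the right fix for pyqtree is for the maintainers to judge.

```python
def all_quadrants(rect, center):
    """Indices of every child quadrant that rect overlaps."""
    cx, cy = center
    hits = []
    if rect[0] <= cx:
        if rect[1] <= cy:
            hits.append(0)
        if rect[3] > cy:
            hits.append(1)
    if rect[2] > cx:
        if rect[1] <= cy:
            hits.append(2)
        if rect[3] > cy:
            hits.append(3)
    return hits


def first_quadrant(rect, center):
    """What an early return amounts to: only the first match is used."""
    hits = all_quadrants(rect, center)
    return hits[0] if hits else None


# A rectangle that straddles x == 50 but lies entirely at y values under 50.
rect, center = (40, 10, 60, 20), (50, 50)
print(first_quadrant(rect, center))  # only one child receives the item
print(all_quadrants(rect, center))   # both children the rectangle overlaps
```

With the early return, a query that only visits the second overlapping child would never see the item.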

Infinite loop and extreme memory usage when adding semi-out-of-bounds boxes

I've run into an extremely weird issue, I don't know the cause but I can reliably reproduce it.

This happens when I create a qtree with certain bounds and add multiple boxes that are partially out of bounds. The first 10 seem to add just fine, but the 11th hangs and memory usage continues to grow.

A minimal working example is:

        import pyqtree
        qtree = pyqtree.Index((0, 0, 600, 600))
        oob_tlbr_box = [500, 800, 1000, 1000]
        for idx in range(1, 11):
            print('Insert idx = {!r}'.format(idx))
            qtree.insert(idx, oob_tlbr_box)
        idx = 11
        print('Insert idx = {!r}'.format(idx))
        qtree.insert(idx, oob_tlbr_box)

I've verified that this does not happen with all out-of-bounds boxes. It seems to depend on the particular combination: whether the entire box is out of bounds, or whether both the top-left x and y are partially in bounds. I ended up writing a script to test a bunch of cases:

    import ubelt as ub
    # Test multiple cases
    def basis_product(basis):
        """
        Args:
            basis (Dict[str, List[T]]): list of values for each axes

        Yields:
            Dict[str, T] - points in the grid
        """
        import itertools as it
        keys = list(basis.keys())
        for vals in it.product(*basis.values()):
            kw = ub.dzip(keys, vals)
            yield kw

    height, width = 600, 600
    # offsets = [-100, -50, 0, 50, 100]
    offsets = [-100, -10, 0, 10, 100]
    # offsets = [-100, 0, 100]
    x_edges = [0, width]
    y_edges = [0, height]
    # x_edges = [width]
    # y_edges = [height]
    basis = {
        'tl_x': [e + p for p in offsets for e in x_edges],
        'tl_y': [e + p for p in offsets for e in y_edges],
        'br_x': [e + p for p in offsets for e in x_edges],
        'br_y': [e + p for p in offsets for e in y_edges],
    }

    # Collect and label valid cases
    # M = in bounds (middle)
    # T = out of bounds on the top
    # L = out of bounds on the left
    # B = out of bounds on the bottom
    # R = out of bounds on the right
    cases = []
    for item in basis_product(basis):
        bbox = (item['tl_x'], item['tl_y'], item['br_x'], item['br_y'])
        x1, y1, x2, y2 = bbox
        if x1 < x2 and y1 < y2:
            parts = []

            if x1 < 0:
                parts.append('x1=L')
            elif x1 < width:
                parts.append('x1=M')
            else:
                parts.append('x1=R')

            if x2 <= 0:
                parts.append('x2=L')
            elif x2 <= width:
                parts.append('x2=M')
            else:
                parts.append('x2=R')

            if y1 < 0:
                parts.append('y1=T')
            elif y1 < height:  # compare y against height, not width
                parts.append('y1=M')
            else:
                parts.append('y1=B')

            if y2 <= 0:
                parts.append('y2=T')
            elif y2 <= height:  # compare y against height, not width
                parts.append('y2=M')
            else:
                parts.append('y2=B')

            assert len(parts) == 4
            label = ','.join(parts)
            cases.append((label, bbox))

    cases = sorted(cases)
    print('total cases: {}'.format(len(cases)))

    failed_cases = []
    passed_cases = []

    # We will execute the MWE in a separate python process via the "-c"
    # argument so we can programmatically kill cases that hang
    test_case_lines = [
        'import pyqtree',
        'bbox, width, height = {!r}, {!r}, {!r}',
        'qtree = pyqtree.Index((0, 0, width, height))',
        '[qtree.insert(idx, bbox) for idx in range(1, 11)]',
        'qtree.insert(11, bbox)',
    ]

    import subprocess
    for label, bbox in cases:
        pycmd = ';'.join(test_case_lines).format(bbox, width, height)
        command = 'python -c "{}"'.format(pycmd)
        info = ub.cmd(command, detatch=True)
        proc = info['proc']
        try:
            if proc.wait(timeout=0.1) != 0:
                raise AssertionError
        except (subprocess.TimeoutExpired, AssertionError):
            # Kill cases that hang
            proc.terminate()
            text = 'Failed case: {}, bbox = {!r}'.format(label, bbox)
            color = 'red'
            failed_cases.append((label, bbox, text))
        else:
            out, err = proc.communicate()
            text = 'Passed case: {}, bbox = {!r}'.format(label, bbox)
            color = 'green'
            passed_cases.append((label, bbox, text))
        print(ub.color_text(text, color))
    print('len(failed_cases) = {}'.format(len(failed_cases)))
    print('len(passed_cases) = {}'.format(len(passed_cases)))

    passed_labels = set([t[0] for t in passed_cases])
    failed_labels = set([t[0] for t in failed_cases])
    print('passed_labels = {}'.format(ub.repr2(sorted(passed_labels))))
    print('failed_labels = {}'.format(ub.repr2(sorted(failed_labels))))
    print('overlap = {}'.format(set(passed_labels) & set(failed_labels)))

The idea is I create boxes that are either in or out of bounds in various ways. I label these as:

    # M = in bounds (middle)
    # T = out of bounds on the top
    # L = out of bounds on the left
    # B = out of bounds on the bottom
    # R = out of bounds on the right

In the script I run a separate python process that executes the MWE and checks to see if it times out. If it does then I assume the process hung.

Running this script takes a while when there are a lot of cases. In total 1557 cases pass and 468 cases fail. The passing and failing cases follow a pattern, and the sets of passing and failing labels are disjoint:

passed_labels = [
    'x1=L,x2=L,y1=B,y2=B',
    'x1=L,x2=L,y1=T,y2=T',
    'x1=L,x2=M,y1=M,y2=B',
    'x1=L,x2=M,y1=M,y2=M',
    'x1=L,x2=M,y1=T,y2=B',
    'x1=L,x2=M,y1=T,y2=M',
    'x1=L,x2=R,y1=M,y2=B',
    'x1=L,x2=R,y1=M,y2=M',
    'x1=L,x2=R,y1=T,y2=B',
    'x1=L,x2=R,y1=T,y2=M',
    'x1=M,x2=M,y1=M,y2=B',
    'x1=M,x2=M,y1=M,y2=M',
    'x1=M,x2=M,y1=T,y2=B',
    'x1=M,x2=M,y1=T,y2=M',
    'x1=M,x2=R,y1=M,y2=B',
    'x1=M,x2=R,y1=M,y2=M',
    'x1=M,x2=R,y1=T,y2=B',
    'x1=M,x2=R,y1=T,y2=M',
    'x1=R,x2=R,y1=B,y2=B',
    'x1=R,x2=R,y1=T,y2=T',
]
failed_labels = [
    'x1=L,x2=L,y1=M,y2=B',
    'x1=L,x2=L,y1=M,y2=M',
    'x1=L,x2=L,y1=T,y2=B',
    'x1=L,x2=L,y1=T,y2=M',
    'x1=L,x2=M,y1=B,y2=B',
    'x1=L,x2=M,y1=T,y2=T',
    'x1=L,x2=R,y1=B,y2=B',
    'x1=L,x2=R,y1=T,y2=T',
    'x1=M,x2=M,y1=B,y2=B',
    'x1=M,x2=M,y1=T,y2=T',
    'x1=M,x2=R,y1=B,y2=B',
    'x1=M,x2=R,y1=T,y2=T',
    'x1=R,x2=R,y1=M,y2=B',
    'x1=R,x2=R,y1=M,y2=M',
    'x1=R,x2=R,y1=T,y2=B',
    'x1=R,x2=R,y1=T,y2=M',
]

Each label consists of four codes, one each for x1, x2, y1, y2, following the coding above. You can see that all failing cases involve exactly one of x or y being entirely out of bounds on the head or tail side of the axis, while the other axis is always partially touching the middle.

I haven't dug into the code at all to see where the error could be coming from. Hopefully these details help.
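
Until the root cause is found, one defensive workaround suggested by the pattern above is to skip boxes that lie entirely outside the index bounds on either axis before calling insert. `overlaps_bounds` is a hypothetical helper, not part of pyqtree, and its boundary handling follows the labeling used in the test script (a box starting exactly at the far edge counts as out of bounds).

```python
def overlaps_bounds(bbox, bounds):
    """True if bbox has some overlap with bounds on both axes."""
    x1, y1, x2, y2 = bbox
    bx1, by1, bx2, by2 = bounds
    outside_x = x2 <= bx1 or x1 >= bx2
    outside_y = y2 <= by1 or y1 >= by2
    return not (outside_x or outside_y)


bounds = (0, 0, 600, 600)
candidates = {
    "hanging box from the MWE": (500, 800, 1000, 1000),
    "partially out of bounds": (500, 500, 1000, 1000),
    "fully inside": (10, 10, 20, 20),
}
for name, bbox in candidates.items():
    # With a real index, "insert" would mean calling qtree.insert(idx, bbox).
    action = "insert" if overlaps_bounds(bbox, bounds) else "skip"
    print("{}: {}".format(name, action))
```

This also skips boxes that are out of bounds on both axes, which passed in the test above but can never match a query inside the index area anyway.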

Result of intersect() is a set, not a list

Per the API docs, intersect() returns a list; however, in reality, it returns a set. This makes it impossible to store non-hashable types like dicts with insert().

I forked the repo and changed the set to a list, which otherwise works fine, but sometimes the list contains duplicates. I presume the set is intended as a way to work around this, but why is this happening and would it be possible to fix at the source?

Minor documentation issue: countmembers doesn't exist

The API documentation refers to countmembers when actually it's just implementing len (which is better).

insert data not in bbox object?

I'd like to use data that is already on a regular grid, so I already have a dataframe with (xmin, ymin, xmax, ymax) for each of my items. Is it possible to populate the index with the data in this form rather than a bbox object?

E.g., I have a dataframe with columns ['data', 'xmin', 'xmax', 'ymin', 'ymax']
Is there a way to add each row of the dataframe to the index without the extra step of creating a bbox object?
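
Since the README describes the bbox as a plain four-tuple (xmin, ymin, xmax, ymax), a row can be converted without any dedicated bbox object. A minimal sketch follows, using a list of dicts as a stand-in for the dataframe rows. Note that the column order in the question (xmin, xmax, ymin, ymax) differs from the tuple order, so the values are picked by name. `row_to_bbox` is a hypothetical helper, and the sample values are made up.

```python
def row_to_bbox(row):
    """Build an (xmin, ymin, xmax, ymax) tuple from a mapping with those keys."""
    return (row["xmin"], row["ymin"], row["xmax"], row["ymax"])


# Stand-in for dataframe rows with columns data, xmin, xmax, ymin, ymax.
rows = [
    {"data": "cell-0", "xmin": 0, "xmax": 10, "ymin": 0, "ymax": 5},
    {"data": "cell-1", "xmin": 10, "xmax": 20, "ymin": 0, "ymax": 5},
]

pairs = [(row["data"], row_to_bbox(row)) for row in rows]
for item, bbox in pairs:
    # With a real index this would be: spindex.insert(item, bbox)
    print(item, bbox)
```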

"left/bottom" edge intersection not detected due to `>` vs `>=`

What I've noticed after trying out this library is that it doesn't detect intersections on the left/bottom boundary of the query box.

This is mostly due to pyqtree.py's use of > instead of >= when checking intersection.
Was this a conscious choice that matters either in general or for a particular use case?

Here's my test case with 3 boxes along the x = y diagonal.

import pyqtree
import wqtree
from rtree import index

RTREE_PROPS = index.Property()
RTREE_PROPS.dimension = 2
boxes = [
    (0, 0, 1, 1),
    (1, 1, 2, 2),
    (2, 2, 3, 3),
]
tests = [
    (0, 0, 1, 1),
    (0, 1, 1, 2),
    (1, 0, 2, 1),
    (1, 2, 2, 3),
    (2, 1, 3, 2),
    (.5, .5, 2.5, 2.5),
]

# print() calls are used so the script also runs on Python 3
rt = index.Index(((i, b, None) for i, b in enumerate(boxes)), properties=RTREE_PROPS)
for i, t in enumerate(tests):
    print('Rt{%s, %s} =' % (i, t), list(rt.intersection(t)))

print('-')
# Unpatched pyqtree
qt = pyqtree.Index((-100, -100, 100, 100))
for i, b in enumerate(boxes):
    qt.insert(i, b)
for i, t in enumerate(tests):
    print('Qt{%s, %s} =' % (i, t), list(qt.intersect(t)))
print('-')
# wqtree is the locally patched copy of pyqtree (see the diff below)
qt = wqtree.Index((-100, -100, 100, 100))
for i, b in enumerate(boxes):
    qt.insert(i, b)
for i, t in enumerate(tests):
    print('qt{%s, %s} =' % (i, t), list(qt.intersect(t)))

output:

Rt{0, (0, 0, 1, 1)} = [0, 1]
Rt{1, (0, 1, 1, 2)} = [0, 1]
Rt{2, (1, 0, 2, 1)} = [0, 1]
Rt{3, (1, 2, 2, 3)} = [1, 2]
Rt{4, (2, 1, 3, 2)} = [1, 2]
Rt{5, (0.5, 0.5, 2.5, 2.5)} = [0, 1, 2]
-
Qt{0, (0, 0, 1, 1)} = [0, 1]
Qt{1, (0, 1, 1, 2)} = [1]
Qt{2, (1, 0, 2, 1)} = [1]
Qt{3, (1, 2, 2, 3)} = [2]
Qt{4, (2, 1, 3, 2)} = [2]
Qt{5, (0.5, 0.5, 2.5, 2.5)} = [0, 1, 2]
-
qt{0, (0, 0, 1, 1)} = [0, 1]
qt{1, (0, 1, 1, 2)} = [0, 1]
qt{2, (1, 0, 2, 1)} = [0, 1]
qt{3, (1, 2, 2, 3)} = [1, 2]
qt{4, (2, 1, 3, 2)} = [1, 2]
qt{5, (0.5, 0.5, 2.5, 2.5)} = [0, 1, 2]
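
The effect of the comparison operator can be seen in isolation with two overlap predicates. This is a sketch of the general idea rather than pyqtree's code; `overlaps_strict` and `overlaps_inclusive` are hypothetical names.

```python
def overlaps_strict(a, b):
    """Boxes must share interior area; touching edges do not count."""
    return a[0] < b[2] and a[2] > b[0] and a[1] < b[3] and a[3] > b[1]


def overlaps_inclusive(a, b):
    """Boxes that merely touch at an edge or corner also count."""
    return a[0] <= b[2] and a[2] >= b[0] and a[1] <= b[3] and a[3] >= b[1]


box = (0, 0, 1, 1)
query = (0, 1, 1, 2)  # shares only the edge y == 1 with box
print(overlaps_strict(box, query))
print(overlaps_inclusive(box, query))
```

Which predicate is appropriate depends on whether boxes are treated as closed regions or as regions that exclude their boundary. Mixing the two on different sides, as in the original comparisons, gives results that depend on which edge is touched.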

diff for 'fix':

diff pyqtree.py wqtree.py
159c159
<                 if rect[3] > self.center[1]:
---
>                 if rect[3] >= self.center[1]:
161c161
<             if rect[2] > self.center[0]:
---
>             if rect[2] >= self.center[0]:
164c164
<                 if rect[3] > self.center[1]:
---
>                 if rect[3] >= self.center[1]:
168,169c168,169
<             if (node.rect[2] > rect[0] and node.rect[0] <= rect[2] and
<                 node.rect[3] > rect[1] and node.rect[1] <= rect[3]):
---
>             if (node.rect[2] >= rect[0] and node.rect[0] <= rect[2] and
>                 node.rect[3] >= rect[1] and node.rect[1] <= rect[3]):
177,178c177,178
<         if ((rect[0] <= self.center[0] and rect[2] > self.center[0]) and
<             (rect[1] <= self.center[1] and rect[3] > self.center[1])):
---
>         if ((rect[0] <= self.center[0] and rect[2] >= self.center[0]) and
>             (rect[1] <= self.center[1] and rect[3] >= self.center[1])):
186c186
<                 if rect[3] > self.center[1]:
---
>                 if rect[3] >= self.center[1]:
188c188
<             if rect[2] > self.center[0]:
---
>             if rect[2] >= self.center[0]:
191c191
<                 if rect[3] > self.center[1]:
---
>                 if rect[3] >= self.center[1]:

feature request: iterating over items, clearing

    def _items(self):
        for node in self.nodes:
            yield node.item
        for quad in self:
            for node in quad.nodes:
                yield node.item

    def _clear(self):
        self.nodes.clear()
        self.children.clear()

These would be helpful for quality of life.
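
A sketch of how such helpers might behave, shown on a toy class rather than pyqtree's own. `ToyIndex` is hypothetical and stores plain items in `nodes`. It uses `del lst[:]` instead of `list.clear()`, since `list.clear()` is not available on Python 2, which the README lists as supported.

```python
class ToyIndex(object):
    """Toy stand-in for a quad: a list of items plus child quads."""

    def __init__(self, items=(), children=()):
        self.nodes = list(items)
        self.children = list(children)

    def items(self):
        # Items held at this level first, then those of each child.
        for item in self.nodes:
            yield item
        for child in self.children:
            for item in child.items():
                yield item

    def clear(self):
        # del lst[:] empties a list in place on both Python 2 and 3.
        del self.nodes[:]
        del self.children[:]


tree = ToyIndex(["root"], [ToyIndex(["left"]), ToyIndex(["right"])])
before = list(tree.items())
tree.clear()
after = list(tree.items())
print(before, after)
```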

Updating coordinates

Pardon my ignorance, I'm completely new to spatial indexing. I'm looking for a framework that allows updating an index (e.g. "Update object A with previous coordinates (x0,y0) for new position (x1,y1)").

Pyqtree allows removing an element and re-adding it, but there is no update operation. Is this the state-of-the-art way to do it, or does that mean that Pyqtree was not intended for this particular use case?

Your answer will broaden my horizons; thank you very much in advance.
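
The remove-and-re-add approach mentioned above can be wrapped in a small helper. This is a sketch, assuming the index exposes `insert(item, bbox)` and `remove(item, bbox)`; `move_item` and the `ListIndex` stand-in are hypothetical and exist only to make the example self-contained.

```python
class ListIndex(object):
    """Tiny stand-in with the assumed insert/remove/intersect interface."""

    def __init__(self):
        self.entries = []

    def insert(self, item, bbox):
        self.entries.append((item, tuple(bbox)))

    def remove(self, item, bbox):
        self.entries.remove((item, tuple(bbox)))

    def intersect(self, bbox):
        x1, y1, x2, y2 = bbox
        return [item for item, (a, b, c, d) in self.entries
                if a <= x2 and c >= x1 and b <= y2 and d >= y1]


def move_item(index, item, old_bbox, new_bbox):
    """Update an item's position by removing it and inserting it again."""
    index.remove(item, old_bbox)
    index.insert(item, new_bbox)


index = ListIndex()
index.insert("A", (0, 0, 1, 1))
move_item(index, "A", (0, 0, 1, 1), (50, 50, 51, 51))
old_area = index.intersect((0, 0, 10, 10))
new_area = index.intersect((45, 45, 60, 60))
print(old_area, new_area)
```

The caller has to keep track of each item's previous bbox, since the helper needs it to locate the entry being removed.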