Range Trees

Both Quadtrees and kd-Trees are slow for range searches. Quadtrees are also potentially wasteful in space.

Range trees are considered multi-dimensional range trees:

  • organize dictionary points to support efficient search with variant of BST search
  • both internal and leaf nodes store points, similar to one dimensional BST

New idea: Range trees

  • Somewhat wasteful in space, but much faster in range search.
  • [Tree of trees] (a multi-level data structure)

2-dimensional Range Trees

Primary structure: Balanced binary search tree that stores and uses [x-coordinate] as keys.

Every node of stores an [associate structure] :

  • Let be all points in subtree of in (including point at )
  • stores in a balanced BST using the [y-coordinate] as key.
  • Note: is not necessarily the root of

Range tree example: Basically, each node has an associate tree (bst). But it is not shown in this picture. Only and are shown.

Range Tree Space Analysis

  • Primary tree uses space. How???

  • Associate tree uses space, where are points at descendants of in

  • Key insight: means that is an ancestor of in

    • Every node has ancestors in (Recall that we assume to be balanced)
    • Every node belongs to sets
    • So A range-tree with points uses space. I think that’s because we have the initial primary tree with points and each one of them has an associated tree???

Range Tree Operations

  • search: search by x-coordinate in
  • insert: First, insert point by x-coordinate into . Then, walk back up to the root and insert the point by y-coordinate in all associate trees of nodes on path to the root.
  • delete: analogous to insertion

Problem: We want the binary search trees to be balanced.

  • This makes insert/delete very slow if we use AVL-trees (to balance, a rotation at changes and hence requires to re-build

  • Solution: Completely rebuild highly unbalanced subtrees (no details)

  • : search by x-range in . Among found points, search by y-range in some associated trees.

BST Range Search recursive Keys are reported in in-order (in sorted order)

BST Range Search example: Since , we call recursion on the left, Now 36 is between and , so we recurse on right subtree and left subtree. On the left subtree, so we recurse on the right. so recurse on the right. We arrive at leaf 35. On the right subtree, is between and , so we recurse on both subtrees.

  • Search for left boundary : this gives path (By performing BST::search())

  • Search for left boundary : this gives path (By performing BST::search()) - This partition’s into three groups: outside, on, or between the paths.

  • boundary nodes: nodes in or

    • For each boundary node, test whether it is in the range. This is important, for each boundary node, we need to check explicitly if they fit in the range.
  • outside nodes: nodes that are left of or right of

    • These are not in the range, we don’t visit them
  • inside nodes: nodes that are right of and left of

    • We keep a list of the topmost inside nodes.
    • All descendants of such a node are in the range. For a 1d range search, report them.

BST Range Search Analysis

Range search for is a two stage process:

  • Perform a range search (on the x-coordinate) for the interval in primary tree (
  • Get boundary and topmost inside nodes as before.
  • For every [boundary node], test to see if the corresponding point is within the region .
  • For every[ topmost inside node] :
    • Let be the points in the subtree of in .
    • We know that all x-coordinates of points in are within range.
    • Recall: is stored in .
    • To find points in where the y-coordinates are within range as well, perform a range search in

are points in the subtree of in and it is stored in !

Example:

So my understanding, is that we start with x-coordinates, we found that the topmost nodes are 6 and 12. The boundary nodes are 4 and 14?? Is 15 also a boundary point? So we must test to see if the corresponding points are within the region, after checking 4 is not in region, but 14 is! Now I understand why we need to check boundary nodes. Then we go through the topmost nodes’s and perform a range search in to see where the y-coordinates are within range as well.

Overview:
  • First perform only partial BST-RangeSearch
    • find boundary and topmost inside nodes which takes , but DO NOT go through the inside subtrees
    • modified version takes time
      • does not visit all the nodes in valid range of BST-RangeSearch(T, 5,14)
  • Next:
    • for boundary nodes, check if both x and y coord are in the range which takes time as there are boundary nodes
    • for every topmost inside node , search in associated tree

Range Trees: Range Search Run-time

  • time to find boundary and topmost inside nodes in primary tree . Since it’s similar to BST?
  • There are such nodes.
  • For each topmost inside node , perform range search for y-range in associate tree
    • topmost inside nodes
    • let be the number of times returned for the subtree of topmost node
    • running time for one search is time for each topmost inside node , where is the number of points in that are reported.
  • Two topmost inside nodes have no common point in their trees every point is reported in at most one associate structure

Time for range search in range-tree is proportional to

It is since it’s 2 dimensional.

Time for range search in range tree:

Range Tree Space Analysis

Basically to know the space taken for a range tree, we can think of it as the number of times each node appears in an associated tree which then equal to {ancestors of v } = . Space is

Higher Dimensions

Range trees can be generalized to d-dimensional space.

Range tree: Takes more space but better in range search, kd-Trees have less space but worse range search.

Summary

For Quad-tree, it can be space inefficient when two points are close to each other. Since

Questions

In BST range search, there are always distinct inside nodes?

  • False??

The best-case runtime of a 1-dimensional range search on a balanced binary search tree with nodes is:

  • Worst case?
  • , so it is assumed that the BST is balanced, the run-time of the range search is hence . but if (i.e. all points are in range) then you can get a worse case runtime. Now my question is how is it ??
  • I’ve asked on piazza

Suppose in 2D range tree storing n points the primary tree is not required to be balanced (ie. not required to have height O(log n)) but all associated trees are still required to be balanced. Does the worst-case asymptotic change? Give the largest possible number of nodes in a range storing n points. Briefly explain

  • It will remain the same?