kd-Tree (Partition Trees)

  • More like binary tree. Since we split region by 2 alternating between vertical and horizontal splits regardless of where points are. Until every point is in a separate region. This way of splitting gives range search to be more efficient.
  • So each kd-tree keeps track of splitting line in one dimension.

kd-tree example:

  • No bounding box
  • Root corresponds to the whole
  • First find the best vertical split
  • on one side and and points on the other
  • Recurse on the resulting regions (if they have more than one point)
  • Alternate split direction

For example at the root, we have , then if the answer is Yes, we go left, if the answer is No, we go left. Next, we split vertically. And so on. We always split at a point.

Is the question at a node always ?

Constructing kd-trees

We build a kd-tree by initially splitting by x on points S:

  • If create a leaf and return (only one point)
  • Else (select by x-coordinate)
  • Partition by x-coordinate into and
    • a points on one side and points on the other.
  • Create left subtree recursively (splitting by y) for points .
  • Create right subtree recursively (splitting by y) for points . Building with initial y-split symmetric.

Run-time:

  • Find and partition in expected time using [randomized-quick-select], basically Quickselect.
  • Both subtrees have points.

This resolves to expected time

  • This can be reduced to [worst-case] time by pre-sorting (no details).

Height: , .

  • This resolves to (specifically ).
  • Apparently this is not the expected height? You always take the median, so the height is deterministic. What does that mean (piazza 1233)

kd-tree Dictionary Operations

  • (for single point): as in binary search tree using indicated coordinate
  • : search, insert a new leaf
  • : search, remove leaf

Problem:

  • after insert and delete, the split might no longer be at exact median and the height is no longer guaranteed to be .
  • We can maintain height by occasionally re-building entire subtrees. (No details.)
  • kd-tree does not handle delete and insert well.
  • Range search is exactly as for quadtrees, except that there are only two children (binary!!).

  • We assume again that each node stores its associated region, (refer to above kd-tree example)

  • To save space, we could instead pass the region as a parameter and compute the region for each child using the splitting line.

Range Search Complexity:

  • The complexity is where is this the same as ? Yes apparently from piazza.
    • is the output-size
    • is the number of “boundary” nodes (blue):
      • was called.
      • Neither nor
  • [Can show:] satisfies the following recurrence relation (no details):
  • This solves to the complexity of range search in kd-trees is (that is if it’s 2-d)

For running time, enough to count blue nodes and add .

kd-tree: Higher Dimensions

Questions

What is the runtime of running a range search query on a 4-Dimensional kd-tree?

  • The range search time is , so if , then it becomes

The storage requirements for kd-trees in d-dimensional space is linear in n.

  • True, said so in slides. To remember

It is impossible to design a comparison-based algorithm to build a kd-tree of size in worst case time .

  • True, violates the lower bound that says that any comparison-based sorting algorithm must make at least comparisons to sort elements.

The height of a 2-dimensional kd-tree is never larger than the height of a 2-dimensional quad-tree where both trees store the same 2-dimensional points in general position (i.e. distinct and coordinates) and .

  • False, because a kd-tree splits into 2 subtrees, while a quad-tree splits into 4 subtrees, so it’s possible that the quad-tree will still have a smaller height.

The height of a kd-tree for points, where is a power of 2, in general position satisfies the recurrence

At each level, the number of points is halved, and the height of the tree increases by 1. This is why the recurrence holds true for the height of a kd-tree with points, when is a power of 2.

Suppose satisfies and , where is a power of . Then:

  • We can start by noticing the pattern in the recurrence.
  • Given the recursive formula, we have:
  • As we can see, the values of Q(n) seem to be doubling with each step when is a power of 4. Specifically, , where is the number of times we can divide by 4 until we reach 1. This is the same as the logarithm of to the base 4.
  • Thus, the closed-form expression for is:

Suppose during a range search on a 2-dimensional kd-tree with points that only 7 points were found within the given query rectangle. In the [worst case], the runtime to perform this range search is:

  • , used the range search time on .