Pre-requisite: This note on Amortized Analysis.
We'll use weight balance to implement a dictionary structure and examine how the guts of one such structure, the weight-balanced tree work.
A dictionary data structure is one that supports the following dictionary data structure operations:
- Insert
- Delete
- Find
- Predecessor
- Successor
Now, you might have heard of the following data structures that (efficiently) support the operations mentioned above:
It would surprise (or maybe not) you to know that both these structures work on the principle (guess) of weight-balance!!
So what exactly do we mean when we talk about the weight of a sub-tree in a BST? Well, as it turns out, the weight of the sub-tree in a BST is just the count of the number of nodes in the sub-tree rooted at that node (including the node itself).
For example, the following tree (image courtesy wikipedia) has a weight of 9
A weight-balanced tree rooted at node u is one in which (either):
- The weights of the left and right children of a sub-tree are within constant factors of each other:
weight(Left-Child(u)) + 1 = Θ(weight(Right-Child(u) + 1)
Note that the +1 is important for pedantic reasons as far as the order-notation is concerned.
OR - The weights of the left and right children of a sub-tree are within constant factors of the weight of the complete sub-tree
weight(Left-Child(u)) + 1 = Θ(weight(u) + 1) AND
weight(Right-Child(u)) + 1 = Θ(weight(u) + 1)
More realistically, if we stick to the second definition, we have:
weight(Child(u)) + 1 ≥ 0.25 * (weight(u) + 1) AND
weight(Child(u)) + 1 ≤ 0.75 * (weight(u) + 1)
where, Child(u) denotes both the left & right child of u.
For example, if we consider the following example tree, which is clearly out of weight-balance (don't ask me how we got there because this example is made-up), we re-balance it to be perfectly in balance (if we have an odd number of nodes or almost perfectly balanced otherwise).
We should be careful about how we re-balance these N nodes, because if the cost is any worse than Θ(N), then we won't get the update costs that we desire. The easiest way to perform the re-balance with a cost of Θ(N) is to perform an in-order traversal of the subtree rooted at node u, and write out the sorted nodes to an array. We can then re-create a perfectly balanced BST from that array either using recursion or the Day–Stout–Warren algorithm.
This is where the fun starts!!
Q 1. How many nodes need to be inserted under a sub-tree rooted at node v to bring the sub-tree out of balance (assuming it is perfectly balanced to start off with)? Let's assume that the sub-tree originally contains N nodes.
A. You need to insert some constant fraction of the weight of that sub-tree! which is Ω(N).
Q 2. What is the cost to rebalance the sub-tree rooted at node v if we know that that sub-tree has a weight of N?
A. Well, we already answered this above. The answer is Θ(N).
Q 3. How many sub-trees potentially go out of balance when you insert a node?
A. We know that a node is inserted at the leaf level, so potentially all the sub-trees that are rooted at the nodes on the leaf-to-root path with the newly inserted node as the leaf node can potentially go out of balance. This happens to be Θ(log N) nodes.
∴ the amortized cost to insert a new node into the balanced tree is:
Ω(N)/Θ(N) * Θ(log N) = Θ(log N).
Now, that's a fairly straight-forward algorithm to get the same (amortized) costs as the worst-case costs for updates with a more complicated beast such as an RB-Tree or an AVL-Tree. Though, I feel that Treaps are much simpler to implement.