Pre-requisite: This note on Amortized Analysis.

We'll use weight balance to implement a dictionary structure and examine how the guts of one such structure, the weight-balanced tree, work.

A dictionary data structure is one that supports the following operations:

- Insert
- Delete
- Find
- Predecessor
- Successor

Now, you might have heard of the following data structures that (efficiently) support the operations mentioned above:

It may surprise you (or maybe not) to know that both these structures work on the principle of weight-balance!

So what exactly do we mean when we talk about the *weight* of a sub-tree in a BST? Well, as it turns out, the weight of a sub-tree in a BST is just the number of nodes in the sub-tree rooted at that node (including the node itself).
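As a quick sketch of that definition, here is how the weight of a sub-tree could be computed in Python. The `Node` class and names are mine, purely for illustration; a real implementation would cache the weight in each node instead of recomputing it:

```python
class Node:
    """A minimal BST node (illustrative, not from any particular library)."""
    def __init__(self, key, left=None, right=None):
        self.key = key
        self.left = left
        self.right = right

def weight(node):
    """Number of nodes in the sub-tree rooted at `node`; 0 for an empty tree."""
    if node is None:
        return 0
    return 1 + weight(node.left) + weight(node.right)
```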

For example, the following tree (image courtesy Wikipedia) has a weight of 9.

A weight-balanced tree rooted at node *u* is one in which (either):

- The weights of the left and right children of a sub-tree are within constant factors of each other:

**weight(Left-Child(u)) + 1 = Θ(weight(Right-Child(u)) + 1)**

Note that the **+1** is important for pedantic reasons as far as the order-notation is concerned.

OR

- The weights of the left and right children of a sub-tree are within constant factors of the weight of the complete sub-tree:

**weight(Left-Child(u)) + 1 = Θ(weight(u) + 1) AND**

**weight(Right-Child(u)) + 1 = Θ(weight(u) + 1)**

More realistically, if we stick to the second definition, we have:

**weight(Child(u)) + 1 ≥ 0.25 * (weight(u) + 1) AND**

**weight(Child(u)) + 1 ≤ 0.75 * (weight(u) + 1)**

where Child(u) denotes both the left and the right child of *u*.
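That pair of inequalities can be checked directly. Below is an illustrative Python predicate for the second definition, with the 0.25/0.75 fractions exposed as an `alpha` parameter (the exact constant is a tunable choice, not fixed by the definition); the `Node` class is again my own sketch:

```python
class Node:
    """A minimal BST node (illustrative only)."""
    def __init__(self, key, left=None, right=None):
        self.key = key
        self.left = left
        self.right = right

def weight(node):
    """Number of nodes in the sub-tree rooted at `node`."""
    if node is None:
        return 0
    return 1 + weight(node.left) + weight(node.right)

def is_weight_balanced(node, alpha=0.25):
    """Check that every node u satisfies
    alpha * (weight(u) + 1) <= weight(Child(u)) + 1 <= (1 - alpha) * (weight(u) + 1)
    for both children. alpha=0.25 matches the 0.25/0.75 fractions above."""
    if node is None:
        return True
    total = weight(node) + 1
    for child in (node.left, node.right):
        w = weight(child) + 1
        if not (alpha * total <= w <= (1 - alpha) * total):
            return False
    return (is_weight_balanced(node.left, alpha)
            and is_weight_balanced(node.right, alpha))
```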

For example, if we consider the following example tree, which is clearly out of weight-balance (don't ask me how we got there; this example is made up), we re-balance it to be perfectly in balance if we have an odd number of nodes, or almost perfectly balanced otherwise.

We should be careful about *how* we re-balance these *N* nodes, because if the cost is any worse than Θ(N), then we won't get the update costs that we desire. The easiest way to perform the re-balance with a cost of Θ(N) is to perform an in-order traversal of the subtree rooted at node *u*, and write out the sorted nodes to an array. We can then re-create a perfectly balanced BST from that array, either using recursion or the Day–Stout–Warren algorithm.
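The recursive variant of that rebuild can be sketched as follows: flatten the sub-tree with an in-order traversal, then repeatedly pick the middle element as the root. Both steps touch each node a constant number of times, so the whole thing is Θ(N). This is my own illustrative Python, not code from any particular library:

```python
class Node:
    """A minimal BST node (illustrative only)."""
    def __init__(self, key, left=None, right=None):
        self.key = key
        self.left = left
        self.right = right

def rebuild(node):
    """Rebuild the sub-tree rooted at `node` into a perfectly balanced BST.

    Step 1: in-order traversal writes the nodes out in sorted order — Θ(N).
    Step 2: recursive halving picks the middle node as each root — Θ(N).
    Returns the new root of the rebuilt sub-tree."""
    nodes = []

    def inorder(n):
        if n is not None:
            inorder(n.left)
            nodes.append(n)
            inorder(n.right)

    inorder(node)

    def build(lo, hi):
        if lo > hi:
            return None
        mid = (lo + hi) // 2
        root = nodes[mid]
        root.left = build(lo, mid - 1)
        root.right = build(mid + 1, hi)
        return root

    return build(0, len(nodes) - 1)
```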

*This is where the fun starts!!*

Q 1. How many nodes need to be inserted under a sub-tree rooted at node *v* to bring the sub-tree out of balance (assuming it is perfectly balanced to start off with)? Let's assume that the sub-tree originally contains *N* nodes.

A. You need to insert some constant fraction of the weight of that sub-tree, which is **Ω(N)**.

Q 2. What is the cost to re-balance the sub-tree rooted at node *v* if we know that the sub-tree has a weight of *N*?

A. Well, we already answered this above. The answer is **Θ(N)**.

Q 3. How many sub-trees potentially go out of balance when you insert a node?

A. We know that a node is inserted at the leaf level, so **all** the sub-trees rooted at the nodes on the leaf-to-root path (with the newly inserted node as the leaf) can potentially go out of balance. This happens to be **Θ(log N)** nodes.

∴ the **amortized** cost to insert a new node into the balanced tree is:

**Θ(N)/Ω(N) * Θ(log N) = O(log N)**

i.e., each Θ(N) rebuild is paid for by the Ω(N) insertions that were needed to bring that sub-tree out of balance, so each insertion is charged O(1) per sub-tree on its Θ(log N)-long leaf-to-root path. And since the "constant fraction" from Q 1 really is a fixed constant (0.25/0.75 above), the bound is in fact Θ(log N).
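Putting the three answers together, insertion in this scheme (essentially what a scapegoat tree does) can be sketched roughly as: do a plain BST insert, then rebuild the highest sub-tree on the insertion path that fell out of weight-balance. The Python below is my own illustration, not code from the post; it recomputes weights from scratch for clarity, whereas a real implementation would cache the weight in each node to keep the walk cheap:

```python
class Node:
    """A minimal BST node (illustrative only)."""
    def __init__(self, key, left=None, right=None):
        self.key = key
        self.left = left
        self.right = right

def weight(n):
    return 0 if n is None else 1 + weight(n.left) + weight(n.right)

def rebuild(n):
    """Θ(N) rebuild: in-order flatten, then recursive halving."""
    nodes = []
    def inorder(x):
        if x is not None:
            inorder(x.left)
            nodes.append(x)
            inorder(x.right)
    inorder(n)
    def build(lo, hi):
        if lo > hi:
            return None
        mid = (lo + hi) // 2
        root = nodes[mid]
        root.left = build(lo, mid - 1)
        root.right = build(mid + 1, hi)
        return root
    return build(0, len(nodes) - 1)

def out_of_balance(n, alpha=0.25):
    """True if node n violates the 0.25/0.75 condition at its root."""
    total = weight(n) + 1
    return any(not (alpha * total <= weight(c) + 1 <= (1 - alpha) * total)
               for c in (n.left, n.right))

def insert(root, key, alpha=0.25):
    """BST insert, then rebuild the highest out-of-balance ancestor (if any)."""
    if root is None:
        return Node(key)
    path = [root]                      # root-to-leaf insertion path
    while True:
        node = path[-1]
        nxt = node.left if key < node.key else node.right
        if nxt is None:
            break
        path.append(nxt)
    leaf = Node(key)
    if key < path[-1].key:
        path[-1].left = leaf
    else:
        path[-1].right = leaf
    # Rebuild the highest unbalanced sub-tree; it covers all deeper ones.
    for i, node in enumerate(path):
        if out_of_balance(node, alpha):
            rebuilt = rebuild(node)
            if i == 0:
                return rebuilt
            parent = path[i - 1]
            if parent.left is node:
                parent.left = rebuilt
            else:
                parent.right = rebuilt
            break
    return root
```

Even a worst-case insertion order (sorted keys) stays fine here: whenever a root-to-leaf path gets too lopsided, the offending sub-tree is flattened and rebuilt, which is exactly the amortization argument above.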

Now, that's a fairly straightforward way to get the same (amortized) costs as the worst-case costs for updates in a more complicated beast such as a Red-Black Tree or an AVL Tree. Though, I feel that Treaps are much simpler to implement.