Thursday, March 14, 2013

PMA: The Packed Memory Array

Thanks to Deepak & Gaurav for proof-reading this post and providing useful feedback!

The Packed Memory Array (pdf) is an interesting structure to say the least, since it keeps its elements almost contiguously (and in sorted order) in memory with no additional indirection via pointers like a usual array and still allows one to perform fast - O(log2N) - updates to the array.

We know that inserting elements in sorted order into an array without keeping gaps between consecutive elements costs O(n) per insert, whereas searching for an element can be done using binary search with a cost of O(log n). These are tight upper bounds, but the story is a little different for randomly arranged data. If one is inserting random data into a sorted array with gaps being maintained between consecutive elements, the expected time to insert a single element magically falls to O(log n)! Now, what just happened here? To find out, read more in Bender, Colton, and Mosteiro's paper titled Insertion Sort is O(n log n) (pdf). On the other hand, if we don't permit gaps between elements, even for random data being inserted, the amortized cost for inserting n elements into an array in sorted order is O(n2) - why? (hint: Refer to the expected case analysis of quick-sort).

The simple idea is to not pack all elements together, but to maintain some gap between consecutive elements. We shall see that if we follow this simple idea, then the cost for insertion falls to O(log2n) amortized worst-case. This is the packed-memory-array (PMA). We however need to formalize the idea a bit and set some rules of the game before we get ahead of ourselves.

We'll start off by assuming that we already have a PMA that holds n valid elements. One of the invariants we have for the PMA is that it should be more than 0.25x full (this is called the fullness threshold). i.e. If the PMA has space for 4n elements, then there should be at least n actual elements in the PMA. Any less and we should re-size the PMA to have space for 2n (not n) elements (this is also part of the fullness threshold). The reason we maintain extra space in the PMA is so that we can re-balance and that re-balances involving a lot of elements won't happen too frequently.

Let's just focus on insertions for now. The PMA is organized as a contiguous array of slots which might be used or free. Conceptually, we break this array of size N into N/log N blocks, with each block holding log N elements. We'll see why this is helpful. If we look at a PMA as being made up of such blocks of size log N each, then we can view the PMA as binary tree (conceptually) with each level having different fullness thresholds. The algorithm for inserting elements relies heavily on the upper density threshold whereas the algorithm for deleting elements relies heavily on the lower density thresholds. For the sake of brevity, I shall only discuss insertion (not deletion).

Algorithm: When we insert elements into the PMA, we follow these steps:
1. Locate the position to insert the element into. We either know this before-hand or we perform a binary search which costs O(log2N).
2. If the cell we want to insert into is free, we just add the element, and mark the cell as used. We are done!
3. If however, the cell is used, we compute the upper density threshold for the smallest block (of size log N) that the desired cell falls within, and check if the upper density threshold would be violated. If we notice that there is no violation, we just re-balance all elements including the new one into that block. We are done. If we violate the upper density threshold, we consider a block twice as large (which includes the cell we will be inserting into) and check if the density threshold is violated. We repeatedly move up till we find a chunk for which the upper density threshold is not violated.
4. If we fail to find such a chunk, we just allocate an array twice as large and neatly copy all the existing elements into the new array with constant sized gaps between elements!
Analysis: We analyze the cost to insert an element into the PMA.

Pre-requisite: Weight balance for fun and profit
1. The upper (and lower) density thresholds are arranged so that they grow arithmetically from the top (root) level to the bottom (leaf) level.
2. The difference in density thresholds is 0.5, and we have log N levels, so if we want to maintain a constant arithmetic difference between levels, there must be a difference of 0.5/log N between each level. This is a difference of Δ = O(1/log N) between each level.
3. Q. What is number of elements that should be inserted at a certain level to bring it out of balance?
A. Clearly, if a level has room for N elements, and it is out of balance then that could have happened only if it went from being in balance to now out of balance, which means that O(ΔN) elements were inserted into this level.
4. Q. What is the number of element moves we need to bring a level back into balance?
A. If a level is out of balance, we typically go up till a level within density thresholds is located and re-balance it. Ideally, going up one level should do the trick, so to re-balance a level containing N elements, Θ(N) operations are sufficient.
5. Therefore, the amortized cost to re-balance a level is O(N / ΔN) = O(log N).
6. However, we must not forget that an element insertion affects the thresholds of O(log N) levels, which means that the actual cost for insertion is O(log2N).
You can also read about the analysis in section-4 of another paper.

Q1. What if we use space proportional to Θ(Nc) (assume c=2) to store N elements? What happens to the running time for insert?

A1. Well, it just goes over the roof since you're now going to be moving elements across a lot of unused cells while you re-balance the array. Additionally, you'll also need to to adjust your level density thresholds to not be arithmetically increasing, but geometrically increasing. Instead, if you use tagging and maintain elements as tags and pointers to actual values, you can get better running times if the tag space is polynomial (Nc) in the number of elements in the structure.

Q2. Is the PMA a cache-oblivious data structure?

A2. The PMA is Cache Oblivious, and is used as a building block in other more complex external memory data structures such as the Cache-Oblivious B-Tree.

Implementation: You can find a sample implementation of the PMA here.