Sunday, October 12, 2014

Static To Dynamic Transforms or: How I learnt to stop worrying and love static data structures

Gaurav in his blog post describes in great detail what a static to dynamic transform is, when it is applicable, how to dynamize a static data structure, and the costs of inserting and looking up values in a dynamized data structure (relative to the costs in the corresponding static structure). I'll skip all of that since it's been presented so well in the link above, but will give a short description of the essentials.
  • What is a static data structure? A static data structure is one where inserting a new element isn't computationally efficient, and typically requires rebuilding the whole structure just to add one element. For example, a sorted array: if you want to keep an array of elements sorted, then inserting a single element could involve shifting all the elements one place to the right.
  • What is a dynamic data structure? A dynamic data structure is one where adding a single element is computationally efficient, and doesn't involve touching every element in the data structure. For example, a height-balanced binary search tree: inserting a single element takes O(log n) time, since the work is confined to a single root-to-leaf path plus any rebalancing rotations along it. See this page to read more about the differences between static and dynamic data structures.
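  • What is the amortized insertion time for a dynamized sorted array? Inserting a single element into a dynamized sorted array costs O(log n) amortized. An individual insertion can still cost O(n) when it triggers a merge of all the levels, but such insertions are rare enough that the average over a sequence of insertions stays at O(log n).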
  • How much extra space do you need to dynamize a sorted array? You need O(n) extra space to dynamize a sorted array, since you need intermediate storage space when merging parts (levels) of the dynamized data structure.
  • What is the query time in a dynamized sorted array? Searching for an element in a sorted array costs O(log n), whereas searching in a dynamized sorted array costs O(log² n), since there are up to O(log n) levels and each one is binary searched in O(log n).
We can see that the overhead of dynamizing a sorted array is something we can live with. In fact, it's almost unbelievable that we can dynamize a sorted array by paying only as much as an O(log n) overhead per insertion, and an O(log n) overhead per query.
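To make the points above concrete, here is a minimal Python sketch of a dynamized sorted array. It assumes the usual logarithmic-method layout (level i is either empty or a sorted list of exactly 2^i elements); the class name `DynamizedSortedArray` and its methods are made up for illustration and are not taken from Gaurav's post.

```python
from bisect import bisect_left
from heapq import merge


class DynamizedSortedArray:
    """Illustrative sketch: levels[i] is either [] or a sorted list of 2**i items."""

    def __init__(self):
        self.levels = []

    def insert(self, value):
        # Works like incrementing a binary counter: keep merging the carry
        # with full levels until an empty level is found.
        carry = [value]
        i = 0
        while i < len(self.levels) and self.levels[i]:
            carry = list(merge(self.levels[i], carry))  # linear-time merge
            self.levels[i] = []
            i += 1
        if i == len(self.levels):
            self.levels.append(carry)
        else:
            self.levels[i] = carry

    def __contains__(self, value):
        # Binary search each non-empty level: O(log n) levels, O(log n) each.
        for level in self.levels:
            if level:
                pos = bisect_left(level, value)
                if pos < len(level) and level[pos] == value:
                    return True
        return False

    def __len__(self):
        return sum(len(level) for level in self.levels)
```

A short usage example: after `d = DynamizedSortedArray()` and inserting 5, 3, 8, 1, 9, 2, 7, the expression `8 in d` is true and `4 in d` is false. The temporary `carry` list is the O(n) intermediate storage mentioned above.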

Static to Dynamic transforms in practice: Consider that you're working with an inherently static data structure such as the SSTable (Sorted String Table), and that your system must maintain all its data as part of some SSTable. In such a case, inserting even a single row means that you need to build a new SSTable which contains all the rows from the previous SSTable plus the newly inserted row. This means that the cost of inserting a single row is linear in N, N being the number of elements in the newly created SSTable. This is extremely undesirable since it means that inserting N elements into the system will cost O(N²).

This is exactly where the Static To Dynamic Transform comes in super handy. You just apply the transform, and almost magically, you've gone from an overall running time of O(n²) down to O(n log n) for inserting n elements into the data structure.
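A rough way to see the gap is to count how many rows get written under each scheme. The sketch below is a simplified cost model, not a real SSTable implementation: it assumes that building a table of k rows costs exactly k row writes, and that the dynamized version writes each newly merged table once. Both function names are hypothetical.

```python
def naive_rebuild_writes(n):
    """Row writes when every insert rebuilds one table holding all rows so far."""
    return sum(range(1, n + 1))  # 1 + 2 + ... + n, i.e. Theta(n^2)


def dynamized_writes(n):
    """Row writes when tables are kept in levels of size 2**i and merged on overflow."""
    sizes = []  # sizes[i] is 0 (empty level) or 2**i (full level)
    writes = 0
    for _ in range(n):
        carry = 1  # the newly inserted row
        i = 0
        # Absorb every full level until an empty one is found.
        while i < len(sizes) and sizes[i]:
            carry += sizes[i]
            sizes[i] = 0
            i += 1
        if i == len(sizes):
            sizes.append(carry)
        else:
            sizes[i] = carry
        writes += carry  # assumed cost: one write per row in the new table
    return writes
```

Under this model, calling both functions with the same n (say 1024) shows the naive count growing quadratically while the dynamized count stays within n times (log2 n + 1).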
