Wednesday, February 18, 2015

Searching faster than Binary Search

We all know that Binary Search is asymptotically the fastest way to search for data within a sorted (ordered) array in the comparison (or simple decision tree) model, assuming nothing in particular about the data or the data types.

However, there's an entire class of algorithms and data structures that focus on how efficiently they utilize the host system's cache while processing data. These are called cache-efficient algorithms and data structures [pdf].

There are two classes of cache-efficient algorithms and data structures:
  1. One class tunes the algorithm for a particular cache size or cache hierarchy. The algorithm is aware of the cache sizes, the number of caching levels, and the relative speeds of each of these caching levels.
  2. Another class of algorithms is oblivious of the underlying cache sizes and layouts, and is provably optimal for any cache size and caching layout (sounds magical, doesn't it!). These are called Cache Oblivious Algorithms. See this link on Cache Oblivious Algorithms for more details, and this link for more details on the model and the assumptions made in the Cache Oblivious model.

  • An example of a cache-efficient algorithm that is also cache-oblivious is Linear Search.
  • An example of a cache-inefficient algorithm that isn't cache-oblivious is Binary Search.
  • An example of a cache-efficient data structure that isn't cache-oblivious is the B-Tree (since B is a tuning parameter for the particular machine on which we are running).
Without getting into the details, the complexity of running Binary Search on an array in the Disk Access Model (DAM) (where we are only concerned with the number of disk blocks read, not the number of comparisons made) is O(log(N/B)), since we must load a block from disk for every probe until we reach a small enough sub-array (of size B) within which no more jumps will trigger another disk I/O operation to fetch another disk block. The optimal complexity for searching ordered data on disk is achieved by laying the data out recursively in a static search tree, which brings the cost down to O(log_B N).

However, implementing that structure is somewhat complicated, and we should ask ourselves if there is a way to get the best of both worlds, i.e.:
  • The runtime efficiency of the cache-oblivious recursive layout, and
  • The implementation simplicity of the standard Binary Search algorithm on an ordered array.

Turns out, we can reach a compromise if we use the square-root trick. This is how we'll proceed:

  1. Promote every sqrt(N)'th element to a new summary array. We use this summary array as a first-level lookup structure.
  2. To lookup an element, we perform binary search within this summary array to find the possible extents within the original array where our element of interest could lie.
  3. We then use binary search on that interesting sub-array of our original array to find our element of interest.




For example:

  • Suppose we are searching for the element '7': we first perform binary search on the top (summary) array, i.e. [10,22,33,43], and learn that 7, if present, must lie in the original array before the element 10. We then restrict our next binary search to the sub-array [5,7,8,10].
  • Suppose we are searching for the element '22': we first identify the sub-array [11,21,22,25,26,33] as potentially containing it, and then perform binary search on that sub-array.
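
To make this concrete, here is a minimal Python sketch of the two-level scheme (the helper names, and the choice of promoting each block's maximum element into the summary, are mine, for illustration):

import bisect
import math

def build_summary(values):
    # Promote every floor(sqrt(N))-th element (each block's maximum)
    # into a first-level summary array.
    step = max(1, math.isqrt(len(values)))
    summary = [values[i] for i in range(step - 1, len(values), step)]
    return summary, step

def lookup(values, summary, step, x):
    # First-level binary search: find the first block whose maximum
    # is >= x; only that block can contain x.
    b = bisect.bisect_left(summary, x)
    lo, hi = b * step, min((b + 1) * step, len(values))
    # Second-level binary search, restricted to that block.
    k = bisect.bisect_left(values, x, lo, hi)
    return k < hi and values[k] == x

values = [5, 7, 8, 10, 11, 21, 22, 25, 26, 33, 34, 43]
summary, step = build_summary(values)
assert lookup(values, summary, step, 7)       # present
assert not lookup(values, summary, step, 20)  # absent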

Even though we are asymptotically performing the same number of overall element comparisons, our cache locality has gone up: the number of block transfers we'll perform for a single lookup is now 2·log(√N/B) = log N − 2·log B, which is an additive factor of log B fewer than the log(N/B) = log N − log B transfers of our normal binary search on the sorted array.


You can find another similar idea named Cache Conscious Binary Search described here.

All the code needed for reproducing the results shown here can be found below:


Sunday, February 15, 2015

The many small steps to Pincha Mayurasana (पिंच मयूरासन ) (Feathered Peacock Pose)

Pincha Mayurasana (or Feathered Peacock Pose) is a yoga pose that roughly translates to a forearm stand. This can be the next step after mastering (or getting decent at) the Headstand (Shirshasana).

You'll need some tools to be able to start practicing the feathered peacock pose. These are:

  1. Forearm strength
  2. Shoulder strength
  3. Lower Back strength
  4. Lower Back flexibility (to arch into a slight concavity)
  5. General balance (that you'll have developed as part of the headstand practice)
  6. Some Core strength (that you'll have developed as part of the headstand practice)

Here are some posts explaining (in great detail) how to attain the final pose, along with some of the intermediate steps to practice on the way there:
  1. Baby Steps to Forearm Stands
  2. Pincha Mayurasana (Two Fit Moms)
  3. Feathered Peacock Pose (Yoga Journal) and the video
  4. Benefits of Pincha Mayurasana
Here are some excellent videos that you should watch to get a visual introduction to the pose:






Let me now discuss some of the extra bits I would like to add or stuff I would like to emphasize from the posts and videos above:


  1. Practicing the Cobra Pose (Bhujangasana) was helpful in opening up my lower back
  2. Practicing the Locust Pose (Salabhasana) (or even doing one leg at a time) was extremely helpful in developing lower back strength
  3. Forearm and shoulder strength can be developed by practicing the Peacock Pose (Mayurasana) and by using wrist strengtheners such as a Powerball
  4. When I started, I used a wall to support me when I kicked up from the dolphin pose
  5. Forearm strength, shoulder strength, and general flexibility can be developed by practising multiple rounds of Surya Namaskaar
  6. Andrei Ram Om's video on the Feathered Peacock pose (above) touches on some subtle but important aspects of the back position (arched) and the differences between various hand positions. I think it's extremely relevant to know these differences when you start, so that you can experience the full flavour of the pose

Om Shanti, and keep practising! Namastey!


Tuesday, February 10, 2015

Smallest multiple with constraints

Problem statement:

Given a number 'n', design an algorithm that will output the smallest integer number 'X' which contains only the digits {0, 1} such that X mod n = 0 and X > 0 (1 ≤ n ≤ 100000)

Problem credits: Ayush on the Stony Brook Puzzle Society Facebook Group

Insight: Suppose n = 6, and consider some candidate answer X, say 1101. Can this be correct? We can verify by dividing and checking the remainder: 1101 mod 6 = 3, which is non-zero, so 1101 is not a multiple of 6 and can't be our answer.

Another way to get the same remainder '3' is to say:
  • 1000 mod 6 = 4
  • 100 mod 6 = 4
  • 0 mod 6 = 0 (the tens digit of 1101 is 0, so it contributes nothing)
  • 1 mod 6 = 1
Add up all the remainders and take that mod 6. Hence, (4 + 4 + 0 + 1) mod 6 = 9 mod 6 = 3.
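
A quick way to convince yourself of this modular-arithmetic identity in Python (the numbers are the ones from the example above):

n = 6
parts = [1000, 100, 0, 1]  # 1101 = 1000 + 100 + 0 + 1
assert sum(parts) % n == sum(p % n for p in parts) % n == 3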

This basically means that we want to find a subset of the powers of 10 whose remainders, added together (mod n), leave a remainder of 0. We can solve this using dynamic programming, with a technique similar to the one used to solve the subset sum problem. The solution to our problem (like that of the subset sum problem) is pseudo-polynomial. Specifically, our problem is solved in time O(n log X).

Here is the ideone link, and below is the code (in python) to solve it.
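
As an illustrative sketch (not necessarily identical to the code linked above), here is one standard way to solve this in Python: a BFS over remainders mod n, a close relative of the subset-sum-style DP described above. BFS explores shorter numbers first, and tries the digit 0 before 1, so the first time we reach remainder 0 we have the smallest X:

from collections import deque

def smallest_01_multiple(n):
    # Smallest X > 0 using only digits {0, 1} with X % n == 0.
    start = 1 % n            # every candidate starts with the digit '1'
    if start == 0:           # n == 1: X = 1 works immediately
        return "1"
    # parent[r] = (previous remainder, digit appended), for reconstruction
    parent = {start: (None, "1")}
    queue = deque([start])
    while queue:
        r = queue.popleft()
        for digit in "01":   # try '0' before '1' to keep X smallest
            nr = (10 * r + int(digit)) % n
            if nr in parent:
                continue
            parent[nr] = (r, digit)
            if nr == 0:
                # Walk the parent chain back to rebuild X's digits.
                digits, state = [], nr
                while state is not None:
                    state, d = parent[state]
                    digits.append(d)
                return "".join(reversed(digits))
            queue.append(nr)

assert smallest_01_multiple(6) == "1110"

Each of the n possible remainders is enqueued at most once, so the search is pseudo-polynomial in n.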

Follow up problem:

Find a number 'X' given an 'n' (similar to the problem above) except that you are allowed to only use digits {0, 3, 5, 8}. Can you always find a solution for any 'n'?

Monday, January 19, 2015

How to make an intense cup of hot chocolate

tl;dr Find a way to melt chocolate; then add milk and sugar. Drink up (credits to Yazhini for this succinct description).

Experiment: One of the most intense hot chocolates you've ever had

Ingredients -- You will need (for 1 cup hot chocolate [240ml]):

  1. 250 ml whole milk
  2. 4-6 squares of dark chocolate (I use Ghirardelli)
  3. A ceramic cup
  4. 2 tablespoons of water
  5. (optional) 1/2 teaspoon sugar
Procedure:
  1. Pour the whole milk into a saucepan and heat till it begins to simmer
  2. Now take the chocolate squares and place them in the ceramic cup. Heat in a microwave for ~20 seconds at medium power
  3. Pour 2 tablespoons of water over the chocolate and heat in the microwave for 30 seconds at medium power
  4. Stir the chocolate and water well to make it a thick homogeneous mixture
  5. Pour the simmering milk into the ceramic cup, add sugar if needed, and stir well
  6. Optionally strain the hot chocolate to rid it of any thin layer of cream that may have formed while simmering the milk
  7. Drink and enjoy!
Observation:
One happy person (i.e. you)

Inference:
Hot chocolate makes you happy :)

Saturday, November 01, 2014

Macho-ism in Computer Science

It's common for me to see blog posts by companies talking about the high traffic volumes (in terms of QPS/RPS) they handle and the amount of data they process, and that's super cool. But then there's another class of facts I see floating around a lot: posts about the size of a company's Hadoop or serving cluster, where things like "dozens or hundreds" of machines in the fleet seem to be something to be proud of. I don't understand this way of thinking, or at least don't see the point of it. As I see it, it's nicer if you can get more done with fewer machines, not more machines in your fleet.

Sunday, October 12, 2014

Static To Dynamic Transforms or: How I learnt to stop worrying and love static data structures

Gaurav in his blog post describes in great detail what a static to dynamic transform is, when it is applicable, how to dynamize a static data structure, and the costs of inserting and looking up values in a dynamized data structure (relative to the costs in the corresponding static structure). I'll skip all of that since it's been presented so well in the link above, but will give a short description of the essentials.
  • What is a static data structure? A static data structure is one into which a new element can't be inserted efficiently; adding even a single element typically involves re-building the whole structure. e.g. A sorted array. If you want to keep an array of elements sorted, then inserting a single element could involve shifting all the elements one place to the right.
  • What is a dynamic data structure? A dynamic data structure is one where adding a single element is computationally efficient, and doesn't involve touching every element in the data structure. e.g. A height-balanced binary search tree. Inserting a single element into a height-balanced binary search tree takes O(log n) time (including any rebalancing rotations). See this page to read more about the differences between static and dynamic data structures.
  • What is the amortized insertion time to insert an element in a dynamized sorted array? Inserting a single element into a dynamized sorted array costs O(log n) per insertion.
  • How much extra space do you need to dynamize a sorted array? You need O(n) extra space to dynamize a sorted array, since you need intermediate storage space when merging parts (levels) of the dynamized data structure.
  • What is the query time in a dynamized sorted array? Searching for an element in a sorted array costs O(log n), whereas searching in a dynamized sorted array costs O(log² n).
We can see that the overhead of dynamizing a sorted array is something we can live with. In fact, it's almost unbelievable that we can dynamize a sorted array by paying only an O(log n) amortized cost per insertion and a multiplicative O(log n) overhead per query.
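
To make this concrete, here is a minimal Python sketch of the transform applied to a sorted array (the class and method names are mine; this is the standard "binary counter" / logarithmic method, not anyone's production code):

import bisect

class DynamizedSortedArray:
    # Level i is either empty or a sorted list of exactly 2**i elements,
    # so there are O(log n) non-empty levels for n elements.
    def __init__(self):
        self.levels = []

    def insert(self, x):
        # Like incrementing a binary counter: a new 1-element "carry"
        # merges its way up through full levels until it finds a slot.
        carry = [x]
        i = 0
        while i < len(self.levels) and self.levels[i]:
            carry = self._merge(self.levels[i], carry)
            self.levels[i] = []
            i += 1
        if i == len(self.levels):
            self.levels.append(carry)
        else:
            self.levels[i] = carry

    @staticmethod
    def _merge(a, b):
        # Standard linear-time merge of two sorted lists.
        out, i, j = [], 0, 0
        while i < len(a) and j < len(b):
            if a[i] <= b[j]:
                out.append(a[i]); i += 1
            else:
                out.append(b[j]); j += 1
        return out + a[i:] + b[j:]

    def contains(self, x):
        # One O(log n) binary search in each of the O(log n) levels:
        # O(log² n) per query, as promised.
        for level in self.levels:
            k = bisect.bisect_left(level, x)
            if k < len(level) and level[k] == x:
                return True
        return False

d = DynamizedSortedArray()
for v in [5, 3, 9, 1, 7]:
    d.insert(v)
assert d.contains(7) and not d.contains(4)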

Static to Dynamic transforms in practice: Consider that you're working with an inherently static data structure such as the SSTable (Sorted String Table), and that your system must maintain all its data as part of some SSTable. In such a case, inserting even a single row means that you need to rebuild a new SSTable containing all the rows from the previous SSTable plus the newly inserted row. This obviously means that the cost of inserting a single row can be linear in N, N being the number of elements in the newly created SSTable. This is extremely undesirable, since it means that inserting N elements into the system will cost O(N²).

This is exactly where the Static To Dynamic Transform comes in super handy. You just apply the transform, and almost magically, you've gone from an overall running time of O(n²) down to O(n log n) for inserting n elements into the data structure.

Saturday, October 11, 2014

Deamortizing disk writes during log rotation

According to this page on deamortization, "deamortization refers to the process of converting an algorithm with an amortized bound into one with a worst-case bound."

Log Rotation refers to the automated process used in system administration in which dated log files are archived. logrotate is one such UNIX command that lets you perform log rotation.

Many services (daemons in the UNIX world) write out log files that need to be periodically rotated and compressed. This is achieved using the logrotate command (mentioned above). Certain tools such as multilog also assist by augmenting the log with extra information such as timestamps, etc.

When a log file is rotated and compressed, the whole file is read back from disk (if it isn't already in the kernel's caches), and a lot of CPU is used for the duration of the compression. Depending upon the size of the log file, this could take anywhere from a fraction of a second to tens of seconds. If the log file is large (which is usually the case), it ends up starving other processes that need the disk or CPU. And if the supervising process (the one performing log rotation and compression) is synchronously reading from the daemon service's stdout/stderr and takes too long to rotate and compress the logs, it will block writes to the daemon's stdout/stderr file descriptors once the pipes between the supervisor and the daemon fill up.

This can be mitigated by deamortizing the costs of log rotation/compression by writing out the compressed file incrementally instead of compressing the uncompressed log file when it is rotated and a new log file is started. This not only deamortizes the I/O and CPU costs, but also ensures that we log a lot lesser in terms of bytes (on disk). This is because we were earlier writing out the uncompressed file and subsequently compressing it and writing the compressed log file, whereas we will now only write out the compressed file (eliminating writes of the uncompressed file, which is a large fraction of total log file writes). If we write out just the compressed log file incrementally, we're saving greatly on the IO cost associated with writing out the uncompressed log file. This simple change can go a long way in helping you scale your services to be able to handle a larger number of overall requests.