Sunday, April 21, 2013

Inside the guts of Kadane's algorithm OR Understanding the Maximum Sum Subarray Problem

Kadane's algorithm is used to solve the 1 dimensional Maximum Sum Sub-array Problem in computer science.

Let's try to understand how it really works. If we are given the problem of finding the maximum sum sub-array of a given array, the first native approach we can try is the O(n2) algorithm of starting at every array index and computing the sum from that index to every index after it. This works, and gives us the correct answer, but we should ask ourselves if we can exploit certain properties that the problem might have to try and speed up the solution.

Let's try and dig deeper into an example.

Consider the following array & its corresponding cumulative sum subarray:

Cumulative Sum105310116359612-9-4-6-5

Some observations:
  1. A maximum sum sub-array can never have a negative number as one of the elements at the end-points, except of course if every element in the array is a negative number. If a maximum sum sub-array had a negative number on one of its end-points, we could remove that element and increase the value of the sum, thus getting a sub-array with a larger sum. Conversely, a maximum sum sub-array always starts and ends with a non-negative number, unless of course all the numbers in the array are negative.

  2. We can always clump all negative and non-negative numbers together since once we encounter a negative number, the sum always drops and once we encounter a non-negative number, the sum never drops.

  3. If the running sum of a sub-array array ever falls below zero, no solution will ever include the negative number that caused this sum to fall below zero since it over-powers the positive sum accumulated before it. Note: We only speak of 1 negative number because of the clumping point above.

  4. An extension of the point above implies that the new running sum that begins once a cumulative sum falls below zero always starts from the immediately following non-negative number.
Applying the rules above, we get the following recurrence:

MaxSum[i] is the Maximum Sum ending at index i.

MaxSum[i] = Array[i]                if i = 0
MaxSum[i] = MaxSum[i-1] + Array[i]  if MaxSum[i-1] + Array[i] >= 0
MaxSum[i] = 0                       if MaxSum[i-1] + Array[i] < 0

The re-written array (which now consists strictly of alternating negative and non-negative numbers) and the new cumulative sum sub-array are (the row "Cumulative Sum" below represents the MaxSum variable above):

Cumulative Sum10311396120534