Tuesday, November 24, 2009

An O(1) approach to the LFU page replacement algorithm

LFU: Least Frequently Used

The LFU algorithm evicts pages that have been least frequently used. If the cache is full, and one more page needs to be added, the page that has been used the least number of times is evicted from the cache.

How can we efficiently implement this algorithm? Each operation of the LRU algorithm can be implemented in worst case O(1) complexity. We can do the same for LFU as well.

For this, we maintain a doubly linked list where each node of the linked list looks like this:

struct CountNode
{
CountNode *prev, *next;
PageNode *root;
int count;
};

Each node of this linked list points to the root of another doubly linked list which holds the actual pages. All the pages in this 2nd linked list have the same usage count. The node for this linked list looks like this:

struct PageNode
{
PageNode *prev, *next;
CountNode *parent;
char pageData[4096];
};



The PageNode's parent member points to that CountNode of which it is a child. So there is a direct pointer from every PageNode to the corresponding CountNode it falls under. These extra links are not shown in the diagram to keep it simple.

The CountNode linked list's count member holds the usage count of each element under that node. So each PageNode under a CountNode having a count value of 6 will have been accessed 6 times.

How to implement the various operations?
  1. Insetion: To insert, we always add the new PageNode to the CountNode list with the count value of 1. This will always be the first node in the linked list pointed to by head. If the first CountNode pointed to by head is greater than 1, we just add a new CountNode with a count value of 1 and then perform the insertion. All this can be done in time O(1).

  2. Deletion: We will delete nodes, only if the cache is full, and this deletion will always happen from the CountNode list which has the least count. So, we just follow head, and delete a PageNode in such a CountNode. Additionally, if that CountNode happens to become empty because of this deletion, we also delete that CountNode. This too takes time O(1).

  3. Access: Access involves incrementing the count of a PageNode, i.e. moving the accessed PageNode from it's CountNode list to a CountNode list with a count value that is 1 greater than it's current count value. We maintain pointers to the CountNode in the PageNode. This makes deleting the PageNode(and possibly the CountNode to which it belonged) a constant time operation. After that is done, we can insert a new CountNode that has a count value of 1 more than the CountNode from which the accessed PageNode was removed. If such a CountNode does not exist, we can always create it first. All this can also be done in constant time, making the whole process O(1).

    A lot of people have asked me how they can get to a PageNode in the first place. The answer to that question is that we maintain a hash table which hashes an object(page address, integer, etc...) such that accessing it (getting to the PageNode that that object corresponds to) can be accomplished in O(1) time.


This implementation takes advantage of the fact that:
  1. When we add a new object to the cache, it is always going to start off with a use count of 1, and

  2. Whenever we access an object, it's use count will go up by exactly 1.


I haven't found any resource that says that LFU can be implemented in time O(1). If you (reader) find any, please do let me know.

Saturday, November 21, 2009

The Missionaries and Cannibals Problem

Some reading before you start: http://en.wikipedia.org/wiki/Missionaries_and_cannibals_problem

Problem Statement:
[From the link above] "In the missionaries and cannibals problem, three missionaries and three cannibals must cross a river using a boat which can carry at most two people, under the constraint that, for both banks, if there are missionaries present on the bank, they cannot be outnumbered by cannibals (if they were, the cannibals would eat the missionaries.) The boat cannot cross the river by itself with no people on board."

Solution:
We will solve the generalized version of this problem, wherein there are C cannibals and M missionaries, and at most P people are allowed on the boat simultaneously.

This problem can be solved by noticing that at every stage, we can encode the state of the problem by using 3 variables, (nc, nm, shore). nc is the number of cannibals, nm is the number of missionaries and shore is either 0 or 1, which signifies whether nc and nm represent the count on the source or the destination shore respectively.

Thus, there are at most 2 x n2 states that the problem can be at any point in time.

We can use BFS (breadth first search) to perform an exhaustive search on the search space to find a solution if one exists. The run-time complexity of doing so is O(CM P2). Below is the Python2.5 code that solves this problem using the BFS search method.



import sys
import string
from collections import deque


states = [[], []]
dq = deque()
nc = 0
nm = 0
c = 0
combinations = [ ]


def gen_combinations():
global c, combinations

for i in xrange(0,c+1):
for j in xrange(0,c+1):
if i+j>0 and i+j<=c:
combinations.append((i,j))


def init_states():
global nc, nm, states
for i in xrange(0, nc+1):
states[0].append([])
states[1].append([])
for j in xrange(0, nm+1):
states[0][i].append(-1)
states[1][i].append(-1)


def is_valid_state(nc, nm):
return (nm>=nc) or (nm==0) or (nc==0)


def bfs():
global dq, states, nc, mn, c, combinations

while (len(dq) > 0) and states[1][nc][nm] == -1:
(x,y,depth,boat) = dq.popleft()

present = boat
next = 1-boat

if states[present][x][y] != -1:
continue

states[present][x][y] = depth

for comb in combinations:
(dx,dy) = comb

if x>=dx and y>=dy:
new_st = (nc-(x-dx),nm-(y-dy),depth+1,next)
(lx,ly) = (nc-(x-dx),nm-(y-dy))
(rx,ry) = (nc-lx,nm-ly)

if is_valid_state(lx, ly) and is_valid_state(rx, ry) \
and states[next][lx][ly] == -1:
dq.append(new_st)



def main():
global dq, nc, nm, c, combinations, states
input = string.split(raw_input())
nc = int(input[0])
nm = int(input[1])
c = int(input[2])

init_states()
gen_combinations()

dq.append((nc, nm, 0, 0))
bfs()

print states[1][nc][nm]

return 0


if __name__ == "__main__":
sys.exit(main())

Friday, November 20, 2009

The Capacity C Torch Problem

I stumbled upon the Capacity C Torch problem, and found it really interesting.
I'll start off with a few resources:
  1. http://aps.cs.nott.ac.uk/2008/05/06/the-capacity-c-torch-problem-2/

  2. http://www.tutor.ms.unimelb.edu.au/bridge/

  3. http://en.wikipedia.org/wiki/Bridge_and_torch_problem

  4. http://www.springerlink.com/content/935641175550x527/


Problem Definition:
There is a group (G) of people that wishes to cross a bridge in the dark. Each person takes a certain amount of time to cross the bridge. There is only 1 torch available. The bridge can hold at most C people at any one time. Every time a group of people cross the bridge, they need to carry the torch with them. One torch is sufficient to light the path for at most C people at a time. When multiple people cross at the same time, the entire group moves at the speed of the slowest person in the group. Given the values for G, C and the time that each person takes to cross, find the least amount of time that would be required to get the whole group across to the other side.

Expected Complexity of the solution: This solution can be solved using a computer program having a run-time complexity of O(n2), where n is the number of people in the group G [ |G| = n ].

Solution:
Step-1: Let us first concentrate only on how we will divide up the people into sets and in what order we will take them across(from size A to side B). We shall re-visit the problem of which person to get back to bring the torch back from side B to side A.

Let us take some values for the variables G and C.

Example-1: Let |G|=15 and C=5

In this case, we can easily divide people into 3 sets of size 5 each.

We'll sort all the people according to the time they take and group them starting from the slowest person. It can be seen that it is beneficial to club all the slow people together. So, we group the 15th, 14th, 13th, 12th and 11th person in one group, and so on..

We'll take the 1st group(the one with the 5 fastest people) and move them across first. Since someone needs to come back with the torch, it had rather be the quickest person, so we make sure he is across so that he can return.

We then take the 2nd group followed by the 3rd.

Example-2: Let |G|=16 and C=5

In this case, we are left with 1 person after dividing into sets of 5 each. No matter what we do, we will have 4 sets to deal with. We also note that it makes no sense to move just 1 person across. So, the way we make sets is {2,4,5,5}.
Why not divide as {3,3,4,4}? Well, if we do it like the latter case, then the 1st group will have the time of the 3rd person, and not the 2nd like in the former case. In both case, everything else remains the same, except if the 3rd person is slower than the 2nd one, we will have increased the total time required to cross. This is why we optimally divide in the former way.

Example-3: Let |G|=17 and C=5

Again, in this case, we optimally divide as {2,5,5,5}. We note that no other division is possible.

Example-4: Let |G|=18 and C=5

Again, in this case, we optimally divide as {3,5,5,5}. We note that no other division is possible.

Example-5: Let |G|=19 and C=5

Again, in this case, we optimally divide as {4,5,5,5}. We note that no other division is possible.


Step-2: Now, we shall concentrate on the problem of which people should bring the torch back from side B to side A, so that the next group on side A can cross the bridge to side B. Let us call these people torch bearers. The same problem also asks and how many at a time to bring back (i.e. How many torch bearers should move from side B to side A without returning) when sets of people are being transferred from side A to side B.

[This is where you should try figuring out for yourself whether you will send 1, 2, 3, 4, 5 (or more) groups at a time (without the torch bearers returning to side B)]

For example, suppose we have a group of people, [1,1,1,1,1,1,4,5,6,7] and C=3.
It is easy to see that once we have 2 people across, we can always get more across without having the torch bearers coming back to side B.

If we decide to work such that every time 2 torch bearers cross over to side A, they return to side B, we get the following scenario: The mean cost of transferring a set from side A to side B, and back would be 1(1st torch bearer from B to A) + 1(2nd torch bearer from B to A) + 1(1 and 2 back from side A to side B) = 3/1 (since we were able to get only 1 set across).

If we decide to work such that every time 3 torch bearers cross over to side A, they return to side B, we get the following scenario: The mean cost of transferring a set from side A to side B, and back would be 1 + 1 + 1 + 1(1, 2 and 3 back from side A to side B) = 4/2 (since we were able to get 2 sets across).

Similarly, if we decide to work such that every time 5 torch bearers cross over to side A, they return to side B, we get the following scenario. The mean cost of transferring a set from side A to side B would be 5+1(1, 2 and 3 back from side A to side B)+1(1 from side B to side A)+1(1, 4 and 5 back from side A to side B) = 8/4 (since we were able to get 4 sets across).

We observe that the mean cost went down if we took more torch bearers at a time. However, it is also easy to see that this is not always true. It depends entirely on the time that each torch bearer takes to cross the bridge.

We could also have a case where we have more than 3 torch bearers crossing from B to A, without coming back (like we sent 5 torch bearers at once in the example above).

We can NOT make a greedy choice as to how many sets to take across. i.e. How many torch bearers to send from B to A without them returning. This is where dynamic programming comes to the rescue.

Suppose |G|=47 and C=5
We will optimally divide into sets {2,5,5,5,5,5,5,5,5,5}.

The best way to take all 10 sets across is the best of:
  1. The best way to take all across without any torch bearer coming back unless all initial groups are across

  2. The best way to take the first 9 groups plus the best way to take the last group

  3. The best way to take the first 8 groups plus the best way to take the last 2 groups

  4. ...

  5. The best way to take the first group plus the best way to take the last 9 groups

The cost for computing these values using dynamic programming or memoization is O(n2). We can use these techniques because there are many overlapping sub-problems, and the other requirements for dynamic programming match.

We can pre-compute the cost required to take 1, 2, 3, 4, etc... groups at a time and store it in a lookup table. The time required to do this is O(n2).

Thus, we can solve the whole problem in time O(n2).

Sunday, October 25, 2009

Wordlist for GRE

Hello people, I have made a simplistic wordlist tool for GRE preparation. I've been asked by a well-wisher to post it here, so here it is: http://wordlist.dhruvbird.com/.

Saturday, October 17, 2009

Recipe cheat-sheet


Just a ready reckoner for me to look up to figure out how vegetables and lentils are made at my home.




























































































































RecipeFatRai (mustard seeds)JeeraMethi seedsHing (asafoetida)Green chilli (whole)Kadi pattaHaldiSaltGinger pasteDhania and jeera powder to season
Dudhi (bottle gourd)GheeNoYesNoYesYesYesNoYesYesNo
TendliGhee/OilYesNoNoYesYesNoYesYesNoYes
Karela (bitter gourd)[1]GheeYesNoNoYesYesNoYesYesYesYes
Dudhi + Dal (chana or tur daal)[2]GheeYesYesYesYesYesYesYesYesYesNo
Cauliflower (fool gobi)OilYesNoNoYesYesNoYesYesNoYes
Cabbage (patta gobi)[3]OilYesNoNoYesYesNoYesYesNoYes
Tur Daal[4]GheeYesYesYesYesYesYesYesYesYesNo




[1]: Also put jaggery(gud), kaju(cashew) and kishmish(sultanas).
[2]: Also put jaggery(gud), kokam and corriander leaves to season.
[3]: You may optionally add tomatoes halfway through cooking the cabbage, and also add jaggery(gud) if you add tomatoes.
[4]: You can also add jaggery(gud) and kokam. It gives a very good taste to the daal. If you don't want to add gud and kokam, you can add lasun(garlic) to give it a nice taste and aroma. A 3rd option is to add tomatoes. If adding tomatoes, leave out the kokam, but you should add jaggery and may optionally add garlic.

Thursday, October 08, 2009

Online hindi dictionary

I've been trying to find an online hindi dictionary where I can enter the text in english, and the system transcribes it and searches for the meaning. However, the best results I have got are by using google's transliteration tool(http://www.google.com/transliterate/indic) followed by a google search for the resulting word.
That apart, this site: http://www.websters-dictionary-online.org/ seems to have a lot of words. Make sure you choose the "non-english" option when you search.

Sunday, July 26, 2009

Lip-smacking Rasgulla recipe

I have tried for long to make rasgullas that taste like the ones I had a Ganguram's in Kolkata. I tried buying rasgullas from many shops, and tried making on my own in the hope of being able to reproduce that taste and texture that had lulled me into have a lot many Rasgullas that rainy day in Kolkata, but to no avail. None came even a close second.... Then, I read somewhere the milk that is curdled should be curdled slowly, taking care not to shock it, so I decided to try it out.... The rest is history....

The secret of making good, no GREAT tasting rasgullas is in what milk you use and how you go about curdling it and then processing it. Every step is very important, but I wasn't able to get the 2nd one right for quite a while. I finally tasted success (and moth-watering rasgullas) after a long time yesterday. I'll mention how you can go about reproducing them at home....

  1. Buy cow's milk for making rasgullas. 3.5% to 6.0% fat content is alright. Do not get low fat milk

  2. Boil it till about 10% of it boils off, taking care to not let it stick to the vessel. To ensure that, keep stirring as you boil it

  3. For 1 litre of milk, you will need about 2-3 tsp of vinegar or about 3/4th of a lime's juice. IMPORTANT: Dilute this juice in about 1 cup(240ml) of water

  4. Now, slowly keep adding this juice to the boiling milk, making sure to stir all the time. You can add about 1/4th every time, and stir for a minute. Do this about 4 times, and keep stirring and heating the milk on a low-medium flame. The milk should continuously keep curdling

  5. The milk is completely curdled when it separates into while solid and a greenish liquid

  6. Make sure that you do this curdling VERY VERY slowly, otherwise the paneer will result in rubbery rasgullas

  7. Now, separate the solid part(paneer) using a muslin cloth and squeeze out as much water as your possibly can. You can wash the paneer to remove the acidic part and squeeze out again, as much water content as you can

  8. Let this paneer dry up for about 30-45mins

  9. Now, you can follow any standard rasgulla recipe that tells you to knead the dough, form balls and cook them in boiling sugar syrup. However, I would like to clarify a few things

  10. The ratio of sugar:water in the boiling syrup should be 1:6 or lesser. I have heard some people boil the paneer balls in just water

  11. Make sure you roll the paneer balls tight and their surface is smooth. You can smear a bit of ghee(clarified butter) on your hands while rolling to ensure that they don't stick to your palm, and result in smooth balls



Use these rasgullas as rasgullas by dipping them in 1:2(sugar:water) syrup or by squeezing out the water(on cooling) and dipping them in rabri(to make yummy rasmalai!!!!)

Some lessons learnt

I have been working closely with this person called "Ramki", and have learnt a lot of interesting things from him that I wish to share. I don't know if I'll keep appending them to this post or create a new one each time. I guess it depends on my mood and the context and other such stuff....

  1. Never use filenames or variable names in an application that have the name of the product in it. If you ever decide to change the name of the application, you'll have to change all those file and variable names too. This can be a nightmare.

  2. When creating database tables which hold the email address of the user, also use a unique user ID. This allows users to change their email IDs when they please. If you use the email ID as the primary key(like how I was going to), it would not allow a change of email address in a cheap fashion. You would probably have to do an ON UPDATE CASCADE, etc.... and it would be quite an expensive operation.