Thursday, June 24, 2010

Guess what?

            _________
            |*      |
            |       |
    (------------   |-------)
    (               |       )
    (       |--------       )
    (       |               )
     -------|   ------------
            |       |
            |      *|
            ---------

Or this (if you are having a problem with the one above ;))
            _________
            I *     |
            I       |
    (------------   |-------)
    (               |       )
    (       |--------       )
    (       |               )
     -------|   ------------
            |       I
            |       I
            |_____*_I


+5 to the first one who guesses correctly

Murphy's Law (for Python programmers)

Download it here
Please rename the file to murphy.py and run this from your interpreter:
import murphy

Wednesday, June 23, 2010

YASNOSC

Yet another SQL and NoSQL comparison
  1. Scalability != NoSQL

  2. Performance != NoSQL [not always]

  3. Scalability == Good schema design

  4. Performance == Good schema design

But...
  1. Performance != Scalability

  2. Scalability != Performance

(But we should also note that NoSQL doesn't mean Not SQL, but Not Only SQL.)

Tuesday, June 22, 2010

A new water pump for our washing machine

(This blog post has the potential to be discursive and go in no particular direction, so I'll try my best to remain as unwavering as I can)

It all started when our washing machine conked out and stopped working. My grandma sent it off to the repairers, who fixed it and then had the brilliant idea of suggesting a water pump so that water from the tank above would flow in much faster. Now, anyone who has played with a syringe would know that no matter how hard you push (or pull), you just can NOT for the love of anything (or anyone) get more than a certain amount of liquid through the syringe (or in this case the pipe).

Now, our pipes are really old and are rusting from the inside, which is why the flow of water has gone down compared to when they were first installed. Even though I suggested getting the pipes changed for this reason (since that is the real problem), my grandma would hear nothing of it... Oh well... so be it...

The brilliant engineer's suggestion came as a massive shock to me and almost felt funny. Anyway, the pump has been installed and guess what... The flow of water in the pipe is still the same!! I wonder how that happened. Isn't the pump working? (scratch head, etc...)

The rate of flow of water in a pipe depends upon many things, among them the inner diameter of the pipe and the friction offered by the sides of the pipe. I don't know the exact formula, but I am guessing it is easy to come by. The pressure exerted by the water in the tank is more than sufficient to max out the flow rate in the pipe, and I would have expected those engineers to know that. I guess it's back to the drawing board for them ;)

Friday, June 11, 2010

How to write bug-free code

Listen hard and listen fast. Listen once; I won't repeat myself...

While writing code, if you ever say to yourself:
  • Oh! but this will hardly ever happen OR

  • The user will never do this OR

  • What are the odds of this happening??

Please please please go ahead and write some code and comments to handle that situation. You can't ever be too careful.
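To make that concrete, here is a small sketch in Python 3. The average() function is a made-up example, not code from any real project. Instead of assuming that nobody will ever pass an empty sequence, it handles that case explicitly.

```python
def average(values):
    """Return the arithmetic mean of `values`.

    "Nobody will ever call this with an empty list" -- maybe, but if someone
    does, fail loudly with a clear message instead of letting a bare
    ZeroDivisionError escape from the arithmetic below.
    """
    values = list(values)  # also accepts generators and other one-shot iterables
    if not values:
        # The "this will hardly ever happen" case, handled explicitly.
        raise ValueError("average() needs at least one value")
    return sum(values) / len(values)


print(average([2, 4, 6]))  # 4.0
```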

Signing off,
Captain S.K.Eptic

Tuesday, June 01, 2010

Python generators and the dict.items() function

During the course of yesterday's session, there was an example of iterating over a dictionary (dict) in Python. The code looks something like this:

d = {
    "name": "chuck norris",
    "age": "positive infinity",
}
for k,v in d.items():
    print "Key: " + str(k) + ", Value: " + str(v)

The d.items() function returns a list, which will take up a non-trivial amount of memory for large dicts that we wish to iterate over. For simplicity's sake, assume that each pointer takes up 8 bytes, that a Python tuple object takes up 32 bytes (say), and that there are 1 million (10^6) entries in the dict (d). Each entry then costs 56 bytes: two 8-byte pointers (one each to the key and value objects), 8 bytes for the pointer to the tuple, and 32 bytes for the tuple object itself. So we end up with 10^6 x 56 bytes, which is about 56 MB of memory used just for iterating over the dictionary.
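Two related facts are worth knowing here. Python 2 (from 2.2 onwards) already offers d.iteritems(), which returns an iterator instead of a list. In Python 3, d.items() itself returns a lightweight view. The rough sketch below is written for Python 3 rather than the Python 2 used in this post. It uses sys.getsizeof to compare a materialised list of pairs with such a view, using 100,000 entries (an arbitrary choice) to keep it quick. getsizeof counts only the container itself (the list's array of pointers), not the tuples, so the real cost of the list is higher still.

```python
import sys

# A dict with 100,000 entries (an arbitrary size, smaller than the 10^6 above).
d = {i: str(i) for i in range(10**5)}

items_list = list(d.items())  # materialises every (key, value) pair, like Python 2's d.items()
items_view = d.items()        # Python 3: a view onto the dict, nothing is copied

# sys.getsizeof reports only the container: for the list, that is its array of
# pointers; the tuples those pointers refer to are not included.
print(sys.getsizeof(items_list))
print(sys.getsizeof(items_view))
```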

If, however, we use generators, then we can save all that memory.
One catch is that the list returned by d.items() can be printed directly, whereas printing a generator object doesn't show anything useful. We therefore need a proxy object that prints the generated pairs when string conversion is requested. The code for doing so is shown below. Notice that we use the .keys() function within our custom generator (which in Python 2 also builds a list, though only of the keys). We wouldn't need it if we were implementing the dict object itself and had access to its internals.

d = {
    "name": "chuck norris",
    "age": "positive infinity",
}

def dict_items(d):
    """
    Returns an object which can be iterated over in a for loop
    as well as printed. This is achieved by returning an object
    which is both iterable as well as convertible to a string.
    However, iteration involves using a generator for fetching
    the next value in the sequence
    """
    def dict_generator():
        dkeys = d.keys()
        for k in dkeys:
            yield (k, d[k])
    dg = dict_generator()

    def dgen_to_str():
        return str(d.items())

    class Dummy(object):
        def __getattr__(self, attr):
            return getattr(dg, attr)
        def __str__(self):
            return dgen_to_str()
        def __iter__(self):
            # Always the same generator, so each proxy can be iterated only once
            return dg

    proxy = Dummy()
    return proxy

diter = dict_items(d)
print diter

for k,v in diter:
    print "Key: " + str(k) + ", Value: " + str(v)
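A variation on the same idea, sketched in Python 3 (the DictItems class name is made up for illustration). Making __iter__ a generator method means every for loop gets a brand-new generator, so the proxy can be iterated more than once. The closure-based Dummy above hands out the same generator every time. The sketch keeps the same key-snapshot behaviour as the version above.

```python
class DictItems(object):
    """Iterable, printable proxy over a dict's (key, value) pairs."""

    def __init__(self, d):
        self._d = d

    def __iter__(self):
        # A generator method: each call (i.e. each for loop) returns a fresh
        # generator. Like the version above, it snapshots the keys first and
        # then looks each value up in the dict lazily.
        for k in list(self._d):
            yield (k, self._d[k])

    def __str__(self):
        # Printing still materialises the pairs, but only when asked to.
        return str(list(self._d.items()))


d = {"name": "chuck norris", "age": "positive infinity"}
items = DictItems(d)
print(items)
for k, v in items:
    print("Key: " + str(k) + ", Value: " + str(v))
for k, v in items:  # a second loop works too
    print("Key: " + str(k) + ", Value: " + str(v))
```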


However, the major difference comes when you try to do something like this:

for k,v in d.items():
    d.clear()
    print "Key: " + str(k) + ", Value: " + str(v)

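The code above works as expected and prints all the elements of the dict (d). In Python 2, d.items() returns a separate list, so clearing the dict inside the loop does not affect the iteration.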


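# Start again with a fresh dict and a fresh proxy: the loops above
# emptied d and exhausted the old proxy's generator.
d = {"name": "chuck norris", "age": "positive infinity"}
diter = dict_items(d)
for k,v in diter:
    d.clear()
    print "Key: " + str(k) + ", Value: " + str(v)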

However, the code above will throw an exception (a KeyError on the second pair). Our generator takes a snapshot of all the keys up front and then looks up each key's value in the dict (d), which by then has been cleared.

So, applications that mutate the dict while iterating over it won't quite work as expected.
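For comparison, this is how the built-in types behave in Python 3 (not the Python 2.5 used in this post), where d.items() is a live view rather than a list. The two helper functions are made up for illustration. Iterating over the snapshot list(d.items()) is unaffected by d.clear(). Iterating over the live view raises a RuntimeError as soon as the dict changes size.

```python
def iterate_snapshot(d):
    """Iterate over a copy of the pairs; clearing d does not affect the loop."""
    seen = []
    for k, v in list(d.items()):   # list() copies the pairs up front
        d.clear()
        seen.append((k, v))
    return seen


def iterate_view(d):
    """Iterate over the live view; clearing d makes the next step fail."""
    seen = []
    try:
        for k, v in d.items():
            d.clear()
            seen.append((k, v))
    except RuntimeError as exc:    # "dictionary changed size during iteration"
        return seen, str(exc)
    return seen, None


print(iterate_snapshot({"name": "chuck norris", "age": "positive infinity"}))
print(iterate_view({"name": "chuck norris", "age": "positive infinity"}))
```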

Python Interest Group(PIG) Act-1; Scene-1

We had the first session of the Python interest group (affectionately called PIG) at Directi yesterday (1st June, 2010).

Here are the links to the presentation in:

You will need Python 2.5 to run the sample programs.
The documentation for Python 2.5 is available here

Happy Hacking!!

5 months of salad... and counting...

I started having salad for lunch daily sometime early January, 2010. Since then, it's been fairly smooth sailing and I've added a few ingredients to the mix.

A lot of people have asked me whether I get bored of the monotony of salad, but I don't find it monotonous at all. The thing about salad is that you get so many flavours in every bite that it's really hard to get bored of it. Besides, it's so juicy and crunchy that you really enjoy having it -- really!!

Either way, I have added a few extra ingredients over the months, so I would like to share them with the very few and faithful readers of this blog.
  • Hazelnuts (halves)

  • Cashews (halves)

  • Pomegranates (in season)

  • Dates

  • Dried Anjeer (figs)

  • I've pretty much knocked off the mixed herbs in the dressing and just stick to Olive Oil and Honey

  • The baby corn is fresh baby corn, which needs to be stripped of its covering, like you would for normal corn. The baby corn underneath is so sweet that you can eat it as-is!!

Enjoy your salad people!!

Monday, May 31, 2010

Can't come to terms with inheritance

I've been racking my brains over inheritance for a while now, but still can't completely get my head around it.

For example, the other day I was thinking about relating an Infallible Human and a Fallible Human. Let's first define the two:
  • Infallible Human: A human that can never make a mistake. Its do_task() method will never throw an exception

  • Fallible Human: A human that will occasionally make mistakes. Its do_task() method may occasionally throw an ErrorProcessingRequest exception

The question was:
IS an infallible human A fallible human OR IS a fallible human AN infallible human?

The very nice answer I received was in the form of a question (I love these since they give me rules for answering future questions I may have).

"Can you pass an infallible human where a fallible human is expected OR can you pass a fallible human where an infallible human is expected?"

It seems apparent that you can pass an infallible human where a fallible human is expected, but not the other way around. I guess that answered my question.
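Here is a minimal Python sketch of that substitution test. The class names and ErrorProcessingRequest follow the post. The delegate() function, the error_rate parameter and the use of random are illustrative assumptions. Code written against FallibleHuman's contract already copes with ErrorProcessingRequest, so an InfallibleHuman, which simply never raises it, can be passed in safely. The reverse does not hold: code written for an InfallibleHuman has no reason to catch the exception.

```python
import random


class ErrorProcessingRequest(Exception):
    """Raised when a fallible human botches a task."""


class FallibleHuman(object):
    def __init__(self, error_rate=0.1):
        self.error_rate = error_rate

    def do_task(self, task):
        # Contract: returns a result, but MAY raise ErrorProcessingRequest.
        if random.random() < self.error_rate:
            raise ErrorProcessingRequest(task)
        return "done: " + task


class InfallibleHuman(FallibleHuman):
    def do_task(self, task):
        # A stronger guarantee than the base class: never raises. Callers written
        # for FallibleHuman still work; the exception they handle just never comes.
        return "done: " + task


def delegate(worker, task):
    """Written against FallibleHuman's contract, so it handles the failure case."""
    try:
        return worker.do_task(task)
    except ErrorProcessingRequest:
        return "retry later: " + task


print(delegate(InfallibleHuman(), "write report"))
print(delegate(FallibleHuman(error_rate=1.0), "write report"))
```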

However, it still feels funny saying "An infallible human is a fallible human". Does anyone else feel queasy when they say it? It almost feels as if speaking out inheritance trees is like reading out statements from propositional calculus in plain English (the if/then implication connectives don't mean the same thing as they do in spoken English). Does anyone else feel the same?

update: This thread on Stack Overflow discusses the same issue.

Saturday, May 29, 2010

A few rules to live by

  • you can only lie to others, never to yourself
  • don't hurt others -- it will only come back to you
  • you can justify anything -- we all can -- but don't try too hard. you might end up lying
  • python rocks
  • java bites shite
  • java has resulted in more deaths and cases of insanity than road accidents world-wide
  • the average lifespan of a java programmer is 10 years less than that of a python programmer and 7 years less than the average for a human
  • what can kill chuck norris -- programming in java
  • listen to music you like -- don't pretend
  • eat food you like -- don't pretend
  • wear what you like (if you want to wear anything that is) -- don't pretend
  • criticize -- don't be diplomatic when you don't need to
  • praise -- it spreads good vibes -- but don't praise falsely -- it doesn't help at all
  • music will set you free ;)
  • do what you want to, not what you have to or need to -- sometimes you may "need/have" to do the things though; but don't stretch it too much
  • listen to "sunscreen" if you haven't yet
  • don't recommend it to others because it's "cool", but because you think it's worth doing
  • don't do anything "just" because it's "cool"
  • don't not do anything just because it's not cool
  • the arts are purgative and cathartic; write, act, sing, cook, code... anything is art and art is everything...
  • you may need to twist your finger to get the fat out (translated from a Hindi proverb); but do it only if it's absolutely necessary
  • talk is cheap; actions are not
  • neither is code
  • show me the code you piece of shite

Saturday, May 22, 2010

We aren't responsible

I was reading the paper a few days ago, and The Mumbai Mirror's headline went something like this: "Criminal Negligence Leaves WR Commuter Critical". The article was about a commuter who had been injured by a plank that fell on him because of negligence on the part of the workers at Churchgate station. This injury has unfortunately left him comatose, and he is currently in a critical condition in hospital. I wish him a speedy recovery.

Coming from a software-centric world where EULAs and licenses accompany any piece of code that is shipped, it seems only natural that no one is really responsible for anything that they do, even though any action on their part may have negative consequences for others. Others need to watch out for anything that may hurt or otherwise injure them. It is the user's, and only the user's, sole responsibility to ensure his/her fitness and propriety, and the user must account for anything that may go wrong during the course of using the product/software mentioned.

I have yet to see any license that says that they take sole responsibility for their product and that they could be held responsible for any damage arising due to negligence on their behalf. The closest I have come to it is Knuth's exponentially increasing reward for finding bugs in TeX. As far as free software is concerned, we shouldn't expect it. However in case we are paying for the software, shouldn't we demand it? I mean we are literally shelling out hard cash for that piece of code. We seem to have a right to want it to be bug-free...

Either way, it seems no one wants to take responsibility for their code, so why not make it a de facto standard? Then, if someone out there is willing to take responsibility for screw-ups (if any), they should include that in the license agreement.

It starts off with software saying that it may not work well and ends up at coffee cups and hot dogs saying that their contents may be too hot for consumption.

What next? From what I see, there may soon be notices on every pavement saying that people walk on it at their own risk. Each lift will say that it could crash any minute, and that if it does, even due to a manufacturing defect or installation glitch, the company providing it should not be held responsible. Airplanes and trains will warn travelers before they enter, saying "Please account for the fact that you may not make it if you step in". Books may say "Read at your own peril. Your eyes may gouge out while reading this book, but don't blame us!!". When will all this end???

Saturday, May 15, 2010

Events: How to use them to your advantage

A very interesting conversation between a colleague and me resulted in some equally interesting learnings as far as I was concerned.

The topic of discussion was about whether certain functionality in a software should be implemented via events or via function calls directly to the concerned component. More concretely, should the coupling be via events as in a pubsub based system or should it be via known function end-points on specific objects.

If the event-based approach is used, there is loose coupling, and the coupling is on events. With the other approach, there is strong coupling, and one component needs to know about the interfaces of the other component.

Let's take a toy example to analyze this situation.

Consider an interior stylist that is developing an integrated home theatre system. Say there are multiple components involved:
1. Speaker system
2. Television
3. CD player
4. Satellite system

For all practical purposes, users should not need to turn on each component separately (as they do today). They need not even know what the individual constituents of the system are. Having one button that turns on the Television, another that turns on the speakers and a third that controls the CD player seems like overkill. Just the press of a single "start CD" button should do the needful.

Let's discuss various ways in which we can achieve the desired effect:
  1. Strongly couple all the components together so that starting the CD player also triggers the TV and the speaker system.

    What happens if the satellite system is turned on? It must also turn on the TV and speaker system. Tomorrow if there is a new device "USB photo viewer" that needs to use the display that the Television provides, it will also need to do the needful.

    This method seems to waste a lot of developer time and seems to be prone to errors. What if someone forgets to turn on the TV? Also each component needs to handle the case of when the TV/speaker is already on, etc...

    The CD player will work only with that TV set and speaker system that speaks a language that it understands (tied to an interface). Vendors will try to tie users to their interface in the hope of selling them all the components.

  2. Instead, if we decouple all the components and couple them only via events, things become much clearer and easier to work with.

    The "button.entertainment.cdplayer", "button.entertainment.satellite" and "button.entertainment.usbdevice" events should be listened to by each of the components, and they should react accordingly.

    Another thing to remember is how to name the events. We should NOT name events as "start.entertainment.cdplayer". That sounds very imperative. Naming plays a very important role in how the rest of our system is built around these events. Wrongly named events can cause a lot of confusion and screw up the system even more than you thought possible!!! Event names should be suggestive of "something happening" and not of "something being asked for".

    Accordingly, events should not be named "do-this" and "do-that". Instead, prefer names like "this-happened" and "that-happened" so that listeners can react to these events in the way they deem most fit.

    By naming events imperatively, we immediately constrict their use cases to what we think they should be, defeating the original purpose of using events -- which is to free event raisers from thinking about what is to be done in reaction to a raised event.
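Here is a minimal sketch of option 2 in Python. EventBus, Device and their methods are hypothetical names made up for illustration; a real system would use whatever pub/sub mechanism it already has. The publisher only announces that a button was pressed, and each component decides how to react. That includes the "already on" case, which now lives in one place per device. Supporting the USB photo viewer would only mean publishing one more event name that the TV and speakers already listen for.

```python
from collections import defaultdict


class EventBus(object):
    """A tiny in-process publish/subscribe hub (illustrative, not a real library)."""

    def __init__(self):
        self._listeners = defaultdict(list)

    def subscribe(self, event_name, callback):
        self._listeners[event_name].append(callback)

    def publish(self, event_name):
        # The publisher announces what happened; it neither knows nor cares
        # which components are listening or what they will do about it.
        for callback in self._listeners[event_name]:
            callback(event_name)


class Device(object):
    def __init__(self, name):
        self.name = name
        self.on = False
        self.switch_ons = 0

    def handle(self, event_name):
        # Each component decides for itself how to react; a device that is
        # already on simply ignores the event.
        if not self.on:
            self.on = True
            self.switch_ons += 1


bus = EventBus()
tv, speakers, cd_player = Device("tv"), Device("speakers"), Device("cd player")

# Event names describe what happened (a button was pressed), not a command.
for event in ("button.entertainment.cdplayer",
              "button.entertainment.satellite",
              "button.entertainment.usbdevice"):
    bus.subscribe(event, tv.handle)
    bus.subscribe(event, speakers.handle)
bus.subscribe("button.entertainment.cdplayer", cd_player.handle)

bus.publish("button.entertainment.cdplayer")
print(tv.on, speakers.on, cd_player.on)
```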

Sunday, May 09, 2010

One line to teach you Object Oriented Programming (OOP)

"When modeling objects, model behaviour, not attributes"
Courtesy: Ramki
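Here is one possible reading of that line, sketched in Python. The Account class is a made-up example, not something from the original conversation. Rather than a bag of settable attributes such as a balance, the class exposes the operations it supports and keeps its own rules next to them.

```python
class Account(object):
    """Models what an account does (deposit, withdraw), not just what it holds."""

    def __init__(self, opening_balance=0):
        self._balance = opening_balance

    def deposit(self, amount):
        if amount <= 0:
            raise ValueError("deposit must be positive")
        self._balance += amount

    def withdraw(self, amount):
        # The "no overdrafts" rule lives with the behaviour, in one place,
        # instead of in every caller that could otherwise set balance directly.
        if amount > self._balance:
            raise ValueError("insufficient funds")
        self._balance -= amount

    @property
    def balance(self):
        # Readable, but only changed through deposit() and withdraw().
        return self._balance


acct = Account(100)
acct.withdraw(30)
print(acct.balance)  # 70
```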

Probably the hottest summer I've seen in Mumbai

This is by far the hottest and stickiest summer I have seen in Mumbai in my stint on this planet so far.

Sunday, May 02, 2010

Protocol Buffers v/s HTTP

I was discussing serialization costs with my immediate superior, Ramki, when one thing led to another and I landed up wanting to compare the serialization and deserialization costs of protocol buffers (a binary format) to those of HTTP (a text-based protocol) in Python.

After performing some tests (in python), I observed that protobuf was taking almost 4 times the amount of time to deserialize data as compared to the simplistic HTTP based header parsing. Of course, these 2 are meant for different purposes and Ramki mentioned that a fixed format protocol would save data on the wire since the attribute names (header names in HTTP) need not be sent on the wire; just the values (header values in HTTP) are sufficient.
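Ramki's point about fixed-format protocols can be sketched in Python 3. The field list and helper names are made up for illustration, and this is neither protobuf's wire format nor the exact ramki_serialize code below. When both sides agree on the field order beforehand, each value needs only a length prefix, and the field names never go on the wire.

```python
import struct

# Both ends agree on the field order in advance, so only the values travel.
FIELDS = ("request", "resource", "version", "destination", "custom1", "custom2")
LEN = struct.Struct("<H")  # 2-byte little-endian length prefix (values up to 65535 bytes)


def encode(record):
    parts = []
    for name in FIELDS:
        value = record[name].encode("utf-8")
        parts.append(LEN.pack(len(value)))
        parts.append(value)
    return b"".join(parts)


def decode(data):
    record, pos = {}, 0
    for name in FIELDS:
        (length,) = LEN.unpack_from(data, pos)
        pos += LEN.size
        record[name] = data[pos:pos + length].decode("utf-8")
        pos += length
    return record


def as_text_headers(record):
    # The same data with the field names spelled out, HTTP-header style.
    return "".join("%s: %s\r\n" % (k, record[k]) for k in FIELDS).encode("utf-8")


msg = {"request": "GET", "resource": "/user/ramki/getVcard/", "version": "1.1",
       "destination": "localhost:8080", "custom1": "434552", "custom2": "no"}
print(len(encode(msg)), len(as_text_headers(msg)))
```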

Also, a binary protocol should be much faster to serialize and deserialize, but we found that either Python's struct pack and unpack are a bit slow OR Python is blazingly fast at string operations.

Here is a representative output from one such run of the program below:

ramki serialization time: 0.125000 seconds
ramki deserialization time: 0.156000 seconds
protobuf serialization time: 0.453000 seconds
protobuf deserialization time: 0.453000 seconds
http serialization time: 0.047000 seconds
http deserialization time: 0.125000 seconds

The code:

import sys
# Module generated by protoc from a .proto file (not shown) defining a Header message
import message_pb2
import time
import struct

def multi_serialize(o, n):
    fragments = []
    for i in xrange(0, n):
        data = o.SerializeToString()
        fragments.append("%d\r\n" % len(data))
        fragments.append(data)
    return "".join(fragments)

def multi_parse(input, n):
    il = len(input)
    start = 0
    objects = []
    for i in xrange(0, n):
        rnPos = input.find("\r\n", start)
        if rnPos == -1:
            print "Premature end of input. Terminating..."
            return None
        lenStr = input[start:rnPos]
        start = rnPos + 2
        lenInt = int(lenStr)
        # Read off lenInt bytes off the stream
        data = input[start:start + lenInt]
        start += lenInt
        obj = message_pb2.Header()
        obj.ParseFromString(data)
        objects.append(obj)
    return objects


def http_header_create(request, headers):
    line1 = "GET %s HTTP/1.1" % request
    hLines = [line1]
    for k,v in headers.items():
        hLines.append(k + ": " + v)
    return "\r\n".join(hLines) + "\r\n\r\n"

def http_header_parse(input):
    parts = input.split("\r\n")
    line1 = tuple(parts[0].split())
    headers = { }
    for i in xrange(1, len(parts)):
        h = parts[i].split(": ")
        if len(h) == 2:
            k,v = h
            headers[k] = v
    return (line1, headers)

def http_multi_serialize(request, headers, n):
    fragments = []
    for i in xrange(0, n):
        fragments.append(http_header_create(request, headers))
    return "".join(fragments)

def http_multi_parse(input, n):
    il = len(input)
    start = 0
    objects = []
    for i in xrange(0, n):
        delimPos = input.find("\r\n\r\n", start)
        if delimPos == -1:
            print "Premature end of input. Terminating..."
            return None
        headerString = input[start:delimPos]
        headerObject = http_header_parse(headerString)
        objects.append(headerObject)
        start = delimPos + 4
    return objects

def ramki_serialize(obj):
    totalLength = 0
    attrs = [ ]
    for k,v in obj.__dict__.items():
        totalLength += (2 + len(v))
        attr = struct.pack("H", len(v)) + v
        attrs.append(attr)
    attrs.insert(0, struct.pack("H", totalLength))
    return "".join(attrs)

class RamkiDummy(object):
    pass

shortStruct = struct.Struct("H")

def ramki_deserialize(input):
    # For now, we lose attribute names
    d = RamkiDummy()
    packetLength = shortStruct.unpack(input[0:2])[0]
    s = 2
    ctr = 0
    while s < packetLength+2:
        # print "CTR: " + str(ctr)
        attrLength = shortStruct.unpack(input[s:s+2])[0]
        s += 2
        # Read attrLength bytes of data
        attrValue = input[s:s+attrLength]
        s += attrLength
        setattr(d, "attr" + str(ctr), attrValue)
        ctr += 1

    return d

def ramki_multi_serialize(obj, n):
    stream = []
    for i in xrange(0, n):
        stream.append(ramki_serialize(obj))
    return "".join(stream)

def ramki_multi_deserialize(input, n):
    objects = []
    s = 0
    for i in xrange(0, n):
        objectLength = shortStruct.unpack(input[s:s+2])[0] + 2
        obj = ramki_deserialize(input[s:s+objectLength])
        s += objectLength
        objects.append(obj)
    return objects

def main():

    class Dummy(object):
        pass

    d = Dummy()
    d.request = "GET"
    d.resource = "/user/ramki/getVcard/"
    d.version = "1.1"
    d.destination = "localhost:8080"
    d.custom1 = "434552"
    d.custom2 = "no"

    s = time.time()
    input = ramki_multi_serialize(d, 10000)
    print "ramki serialization time: %f seconds" % (time.time() - s)

    s = time.time()
    ramki_multi_deserialize(input, 10000)
    print "ramki deserialization time: %f seconds" % (time.time() - s)

    h = message_pb2.Header()
    h.request = "GET"
    h.resource = "/user/ramki/getVcard/"
    h.version = "1.1"
    h.destination = "localhost:8080"
    h.custom1 = "434552"
    h.custom2 = "no"

    s = time.time()
    stream = multi_serialize(h, 10000)
    print "protobuf serialization time: %f seconds" % (time.time() - s)

    s = time.time()
    objs = multi_parse(stream, 10000)
    print "protobuf deserialization time: %f seconds" % (time.time() - s)

    hh = { "Host": "localhost",
           "X-MessageID": "33",
           "X-ACKMessage": "100",
           }

    s = time.time()
    stream = http_multi_serialize("/user/ramki/getVcard/", hh, 10000)
    print "http serialization time: %f seconds" % (time.time() - s)

    s = time.time()
    objs = http_multi_parse(stream, 10000)
    print "http deserialization time: %f seconds" % (time.time() - s)

    return 0

sys.exit(main())
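The timings above are all close to multiples of about 15.6 ms (0.047 s is about 3 x 0.0156 s and 0.453 s is about 29 x 0.0156 s). That suggests a coarse system clock behind time.time(). Below is a sketch of a steadier measurement using the standard timeit module in Python 3. parse_headers() is a simplified stand-in for http_header_parse() above, and the repeat and number values are arbitrary. It takes the best of several repeats and reports a per-message figure.

```python
import timeit

HEADER = ("GET /user/ramki/getVcard/ HTTP/1.1\r\n"
          "Host: localhost\r\n"
          "X-MessageID: 33\r\n"
          "X-ACKMessage: 100")


def parse_headers(text):
    # Simplified version of http_header_parse(): request line plus "Name: value" pairs.
    lines = text.split("\r\n")
    request_line = tuple(lines[0].split())
    headers = {}
    for line in lines[1:]:
        name, sep, value = line.partition(": ")
        if sep:
            headers[name] = value
    return request_line, headers


N = 10000
# timeit uses a high-resolution clock (time.perf_counter by default in Python 3);
# the minimum over several repeats filters out noise from other processes.
best = min(timeit.repeat(lambda: parse_headers(HEADER), number=N, repeat=5))
print("http parse: %.0f ns per message" % (best / N * 1e9))
```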


This page claims that protobuf deserialization time is 3478ns, whereas I am seeing a time of 4530ns, which is expected since we are running on different hardware.

From here, it seems as if protobuf is good at packing ints but not strings.

In fact, none of the deserialization times on that page even come close to the 1250ns that I see for HTTP parsing. This is mainly because those methods do type conversion, which my HTTP parsing does not.
If that were introduced into the mix, I guess those costs would add up too. However, the application I want this for doesn't really need it, and there will be many applications that don't.

In the link above, serialization takes more time than deserialization for many of the methods, which is slightly curious.

Thursday, April 22, 2010

Inheritance and extension

I have been stumped by this issue for a while, so I decided to write about it.

Say you have an element container called SimpleList which exposes the following methods (among others, of course):
1. add(element): Adds 'element' to the SimpleList
2. addAll(elements): Adds 'elements' to the SimpleList

These methods are extensible (they can be overridden and hence extended).

They are implemented as follows in SimpleList.

# Contract: Will add 'element' to SimpleList or throw an
# exception in case of an error.
method add(element)
    # Adds element to the underlying data store/array

# Contract: Will add 'elements' to SimpleList or throw an
# exception in case of an error. This method may throw an
# exception after adding 'some' of the elements from 'elements'.
method addAll(elements)
    for element in elements
        this.add(element)


Fair enough till now.

Up comes someone who decides they can do better and they extend SimpleList and create ABetterSimpleList!!

They are implemented as follows in ABetterSimpleList.

# Contract: Will add 'element' to ABetterSimpleList or throw
# an exception in case of an error.
method add(element)
    # Adds element to an underlying cache since SimpleList
    # may take a while to add() this element.
    cache.add(element)
    if cache.size() > 20:
        this.addAll(cache)
        cache.clear()

# Contract: Will add 'elements' to ABetterSimpleList or throw
# an exception in case of an error. This method may throw an
# exception after adding 'some' of the elements from 'elements'.
method addAll(elements)
    # Just call the base class' addAll method
    super.addAll(elements)


All contracts have been satisfied, but can you spot the subtle bug?

Yes, there is an infinite recursion here!! Calling add() on ABetterSimpleList will add elements to the cache until it holds more than 20 of them. Once that happens, it will call addAll(), which will call the base class' addAll(), which will call (what it thinks is) its own add() method, except that we have overridden add() to call addAll()!!
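A Python 3 translation of the pseudocode reproduces the problem. Method names are snake_cased (addAll becomes add_all), and the list-based cache and the threshold of 20 follow the pseudocode. Adding the 21st element never completes; instead, add() and add_all() call each other until Python raises RecursionError.

```python
class SimpleList:
    def __init__(self):
        self._items = []

    def add(self, element):
        # Adds element to the underlying data store.
        self._items.append(element)

    def add_all(self, elements):
        # Goes through the overridable add(), so a subclass override changes this too.
        for element in elements:
            self.add(element)


class ABetterSimpleList(SimpleList):
    def __init__(self):
        super().__init__()
        self.cache = []

    def add(self, element):
        # Buffer elements and flush them in bulk once the cache is big enough.
        self.cache.append(element)
        if len(self.cache) > 20:
            self.add_all(self.cache)   # -> SimpleList.add_all -> self.add -> ...
            self.cache.clear()

    def add_all(self, elements):
        super().add_all(elements)


lst = ABetterSimpleList()
try:
    for i in range(21):
        lst.add(i)
except RecursionError:
    print("RecursionError: add() and add_all() keep calling each other")
```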

Well, how do you solve this?? I don't have a concrete answer myself, but just scratching the surface has led me to believe that there are 2 types of class extension:
1. Extending for the base class (I call this framework extension)
2. Extending for users/subclasses (I call this normal extension)

In case [1], it is expected that the base class will consume your interfaces so you program for the base class. However, in case [2], your consumers are externals and not internals like your base class. Hence, you should now program for them.

An important side effect of this is that classes that are meant to be consumed by subclasses (case [2]) should NOT use their own overridable methods in their own implementation, or else they CAN NOT give any guarantees to their consumers (subclasses in this case). However, framework-extended classes (case [1]) SHOULD use their own public interfaces, since that IS the purpose of allowing method overriding on those interfaces. They are expected to be the consumers of their subclasses' functionality.
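Here is one way to apply that rule to the SimpleList example, sketched in Python 3. The private _append() helper is an illustrative assumption, not something the post prescribes. The base class routes both public methods through a helper that subclasses are not meant to override, so add_all() no longer depends on whatever add() a subclass supplies. The same buffering subclass then works as intended.

```python
class SimpleList:
    def __init__(self):
        self._items = []

    def _append(self, element):
        # Internal helper; by convention (leading underscore) not an extension point.
        self._items.append(element)

    def add(self, element):
        self._append(element)

    def add_all(self, elements):
        # Deliberately does NOT call the overridable add().
        for element in elements:
            self._append(element)


class ABetterSimpleList(SimpleList):
    def __init__(self):
        super().__init__()
        self.cache = []

    def add(self, element):
        self.cache.append(element)
        if len(self.cache) > 20:
            self.add_all(self.cache)   # now ends in SimpleList._append: no recursion
            self.cache.clear()

    def add_all(self, elements):
        super().add_all(elements)


lst = ABetterSimpleList()
for i in range(25):
    lst.add(i)
print(lst._items, lst.cache)
```

The trade-off is that a subclass overriding add() (say, to validate elements) no longer sees elements passed to add_all(). A base class written this way should therefore document which of its methods are hooks and which are not.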

Examples of each type of class:
1. Framework Extension: Java's Runnable, C++'s streambuf
2. Normal Extension: Java's LinkedList, AbstractList. Typically anything that is layered falls into this category

Patterns for Concurrent, Parallel, and Distributed Systems

I happened to chance upon this great site, which describes Patterns for Concurrent, Parallel, and Distributed Systems.

I landed up here while searching for the proactor and reactor patterns, both of which are described very well.

http://www.cs.wustl.edu/~schmidt/patterns-ace.html

Tuesday, February 02, 2010

An experiment with carbs

About a month ago, Appu, Abbas and I met, and Appu happened to mention that an increased intake of carbs generally led to hunger pangs if one did not eat for a while.
For the last month, I've been having only salad for lunch, and daal, vegetables and chapati (3 nos.) for dinner, and have also been limiting my carb intake by reducing the amount of sugar I add to my hot beverages.
However, yesterday I had 4 liqueur chocolates (which Devdas had so generously brought back from New Zealand), 5 milk pedhas (courtesy Jaineev) and 2 Kaju Katris (courtesy Mukesh), which considerably increased my carb intake for the day. Furthermore, I had 5 chapatis and extra helpings of other stuff for dinner yesterday. I don't know if it had any direct link to the carb intake, but I will be monitoring this more closely now.

Moments

I visited Sandeep in Bangalore when I had gone for the Agile 2010 conference in Jan 2010. Good times followed ;)

Monday, February 01, 2010

Vanilla Milk using natural vanilla extract

I started making vanilla extract in July 2009 using adaptations of the innumerable recipes available online. It's now been 7 months, so I decided to try using the extract to make vanilla milk (which, by the way, I absolutely love and had until now made using pulverized vanilla beans).
  • Attempt-1: Measure about 1 cup (approx. 200 ml) of milk in a glass, pour it in a vessel and heat. Add 1/2 tsp vanilla extract to the glass (which has traces of cool milk). The last bit caused the trace amounts of milk in the glass to curdle. However, pouring the heated milk and sugar in the glass did not curdle the rest of it. The whole drink smelt of alcohol though.

  • Attempt-2: This time, I decided to not leave trace amounts of milk before adding the extract to the glass. Again however, I did get some smell of alcohol.

  • Attempt-3: This time, I decided to do some research online and find out whether anyone else had faced the problem of an alcoholic smell in their extract. As it turns out, it is expected to be this way for a real extract!! (feature, not a bug ;) ). I also happened to read that adding sugar reduces the alcoholic smell, and that when this extract is used in cooking, most of the alcohol will in fact evaporate and hence not leave behind that alcoholic smell. So, this time, I added some sugar to the glass before adding the vanilla extract, and after pouring in the hot milk, I stirred for about 4-5 minutes before consuming the drink. Voila!! No smell now!!


Another nice link explaining a lot of things.