Tuesday, June 01, 2010

Python generators and the dict.items() function

During the course of yesterday's session, there was an example of iterating over a dictionary(dict) in python. The code looks something like this:

d = {
"name": "chuck norris",
"age": "positive infinity",
}
for k,v in d.items():
print "Key: " + str(k) + ", Value: " + str(v)

The d.items() function returns a list which will take up a non-trivial amount of memory for large dicts that we wish to iterate over. For simplicity's sake if we assume that each pointer takes up 8 bytes and the python tuple type takes up 32 bytes(say) and there are 1 million(106) entities in the dict(d), then we land up with 106 x 56 (Two 8 byte pointers, each to each the key and the value objects, 8 for a pointer to the tuple and 32 for the tuple object itself) which is about 56MB of memory used for just iterating over the dictionary.

If however, we use generators, then we can save all that memory.
However, you can directly print the result of d.items(). Printing a generator object doesn't do anything spectacular, so we will need to create a proxy object to print the result of generation if string coercion is requested. The code for doing so is shown below. Notice that we use the .keys() function within our custom generator. We won't need to use it if we really were the dict object and had handles to the internal variables.

d = {
"name": "chuck norris",
"age": "positive infinity",
}

def dict_items(d):
"""
Returns an object which can be iterated over in a for loop
as well as printed. This is achieved by returning an object
which is both iterable as well as convertible to a string.
However, iteration involves using a generator for fetching
the next value in the sequence
"""
def dict_generator():
dkeys = d.keys()
for k in dkeys:
yield (k, d[k])
dg = dict_generator()

def dgen_to_str():
return str(d.items())

class Dummy(object):
def __getattr__(self, attr):
return getattr(dg, attr)
def __str__(self):
return dgen_to_str()
def __iter__(self):
return dg

proxy = Dummy()
return proxy

diter = dict_items(d)
print diter

for k,v in diter:
print "Key: " + str(k) + ", Value: " + str(v)


However, the major difference comes when you try to do something like this:

for k,v in d.items():
d.clear()
print "Key: " + str(k) + ", Value: " + str(v)

The code above will work as expected and print all the elements of the dict(d)


for k,v in diter:
d.clear()
print "Key: " + str(k) + ", Value: " + str(v)

However, the code above will throw an exception since our generator caches all the keys and dereferences the dict(d) to get the value for the corresponding key.

So, applications that mutate the dict while iterating over it won't quite work as expected.

Python Interest Group(PIG) Act-1; Scene-1

We had the first session of the python interest group(affectively called PIG) at Directi yesterday (1st June, 2010).

Here are the links to the presentation in:

You will need python 2.5 to run the sample programs.
The documentation for python 2.5 is available here

Happy Hacking!!

5 months of salad... and counting...

I started having salad for lunch daily sometime early January, 2010. Since then, it's been fairly smooth sailing and I've added a few ingredients to the mix.

A lot of people have asked me if I got bored of the monotony of salad, but I highly disagree with them all. The deal with salad is that you get so many flavours in every bit that it's really hard to get bored of it. Besides, it's fairly juicy and crunchy that you enjoy having it -- really!!

Either ways, I have added a few extra ingredients over the months, so would like to share those with the very few and faithful readers of this blog.
  • Hazelnuts (halves)

  • Cashews (halves)

  • Pomegranates (in season)

  • Dates

  • Dried Anjeer (figs)

  • I've pretty much knocked off the mixed herbs in the dressing and just stick to Olive Oil and Honey

  • The baby corn is fresh baby corn, which needs to be stripped off it's covering; like you would for normal corn. The baby corn which results is really so sweet that you can eat it as-is!!

Enjoy your salad people!!

Monday, May 31, 2010

Can't come to terms with inheritance

I've been racking my brains over inheritance for a while now, but am still not completely able to get around it.

For example, the other day I was thinking about relating an Infallible Human and a Fallible Human. Let's first define the two:
  • Infallible Human: A human that can never make a mistake. It's do_task() method will never throw an exception

  • Fallible Human: A human that will occasionally make a mistakes. It's do_task() method may occasionally throw a ErrorProcessingRequest Exception

The question was:
IS an infallible human A fallible human OR IS a fallible human AN infallible human?

The very nice answer I received was in the form of a question (I love these since it gives me rules to answer future questions I may have).

"Can you pass an infallible human where a fallible human is expected OR can you pass a fallible human where an infallible human is expected?"

It seems apparent that you can pass an infallible human where a fallible human is expected, but not the other way around. I guess that answered my question.

However, it still feels funny saying "An infallible human is a fallible human". Does anyone else feel queasy when they say it? It almost feels as if speaking out inheritance trees is like reading out statements from propositional calculus in plain English (the if/then implication connectives don't mean the same as that in spoken English). Does anyone else feel the same?

update: This thread on stackoverflow discusses the same issue.