Thursday, December 16, 2010

TDDB: What I would do differently if I were to re-implement it

Working on TDDB was a lot of fun (for all of us) and we learnt a lot!! Probably much more than what we initially thought. From not knowing anything about databases to implementing our own meant that we really hit the jackpot in terms on learning stuff and being hand-on. Watching theory and practice collide head on is something we experienced then!!

If I were to work on TDDB from scratch all over again, these are the things I would do differently:
  1. Use HTTP for the transport layer. We implemented our own protocol for the transport. We got to learn a lot while doing so. However, since there's such a huge eco-system already built around HTTP, it would make sense to use it. Another advantage of using HTTP is that making clients in various languages (and even a client for the browser) is much less of an issue.
  2. Use sqlite/MySQL/postgreSQL/BerkeleyDB or some other underlying DB for the actual storage. We implemented our own disk format and transaction subsystem instead of using something already available. We learnt a HELLUVA LOT while doing so. However, if I were to do it again, I'd go with something I can just build on. I would need to implement the query parser, the optimizer, the execution engine and the lock manager though since for a distributed DB, these things would define the system.
  3. In general, use as many existing systems and build on them rather than implementing your own. While the latter approach means that one can learn much more, it also means that some of the "interesting" things won't get as much time and thought as they should.
  4. Research the common use-cases more thoroughly. I think we just scratched the surface of use-cases when it came to distributed query processing and I think that there is a much wider audience for such a system. Exploring this in more detail is something I hope to do in the future.

No comments: