If I were to work on TDDB from scratch all over again, these are the things I would do differently:
- Use HTTP for the transport layer. We implemented our own protocol for the transport. We got to learn a lot while doing so. However, since there's such a huge eco-system already built around HTTP, it would make sense to use it. Another advantage of using HTTP is that making clients in various languages (and even a client for the browser) is much less of an issue.
- Use sqlite/MySQL/postgreSQL/BerkeleyDB or some other underlying DB for the actual storage. We implemented our own disk format and transaction subsystem instead of using something already available. We learnt a HELLUVA LOT while doing so. However, if I were to do it again, I'd go with something I can just build on. I would need to implement the query parser, the optimizer, the execution engine and the lock manager though since for a distributed DB, these things would define the system.
- In general, use as many existing systems and build on them rather than implementing your own. While the latter approach means that one can learn much more, it also means that some of the "interesting" things won't get as much time and thought as they should.
- Research the common use-cases more thoroughly. I think we just scratched the surface of use-cases when it came to distributed query processing and I think that there is a much wider audience for such a system. Exploring this in more detail is something I hope to do in the future.