Sunday, November 14, 2010

SQL Query optimization - Part-2

Today we'll talk about whether to use JOINs or nested queries. TDDB (my db engine :D) doesn't support nested queries so I don't talk about them much, but that's just selfish of me (actually even MySQL didn't support them for a long time), but I've grown out of that, so here it goes. Nested queries can actually speed up your query by a HUGE amount.

Consider you have say 6 tables which you are joining and only 2 of them are actually involved in any logical reasoning like generating unique rows. The rest of the tables are vestiges of an over-enthusiastic DB architect or a graduate fresh out of college who thinks that everything should be in BCNF (been there; done that).

Now the other 4 tables are being used just to provide data for the purpose of projection and really have no reason being part of the JOIN. Hence, we refactor the query and create a temporary relation from the 2 tables that actually mean something. Most real queries are interested in the top 10 or 20 results only (ordered by some criteria) so we do whatever we want to on the join of these 2 tables.

This temporary relation is then joined with the other 4 relations and viola! This recipe serves 100 queries for the cost of 1 ;)

This is possible since all the extraneous tables have been avoided in the join that involved ordering, grouping and potentially many other expensive operations.

This can be optimized even more by using a materialized view for the inner query. However, that will increase the insertion time for those tables, and should be used with care. Besides, I don't think all DB engines support materialized views.

No comments: