Tuesday, January 18, 2011

SQL Query optimization - Part-8 (Indexes Schmindexes)

The "Why are my queries slow despite indexes on all columns" Edition.

Consider the following table:
CREATE TABLE florist_orders(
 order_id INT NOT NULL PRIMARY KEY AUTO_INCREMENT, 
 basket_color INT (1) NOT NULL, -- This can be either 0, 1, 2 or 3
 num_roses INT NOT NULL, 
 num_sunflowers INT NOT NULL, 
 num_daisies INT NOT NULL, 
 processed INT(1) NOT NULL DEFAULT 0, 
 created TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

Let's assume that the packaging for every basket of a different color is done by a diffrent person. Additionally, baskets packaging also differs based on the maximum number of flowers of each type. If there is more than 1 of any type, the packaging differs from when there is at most 1 of each type of flower. Each packager wants to check the orders that she needs to process.

The following query works fine:
SELECT * FROM florist_orders
WHERE
 basked_color = 2 AND
 processed = 0 AND
 (num_roses > 1 OR num_sunflowers > 1 OR num_daisies > 1)
ORDER BY created ASC -- Process older orders first to avoid starvation
;

However, as the size of the florist_orders tables gorws, you realize that each query is taking too long. Why is this? You have an index on every column and yet queries are slow?

You decide to create composite indexes. As a first attempt, you create an index on
(basket_color, processed, num_roses, num_sunflowers, num_daisies)
but notice that it hasn't really affected the query's execution time. Additionally, EXPLAIN says that no indexes are being used!! What to do now?

To solve this issue, you would need to modify your table and make it friendly for this specific query. These are the steps you would need to take:
  1. Add a column named num_flowers which is set to MAX(num_roses, num_sunflowers, num_daisies) on every INSERT and UPDATE via TRIGGERS.
  2. Then, create an index on:
    (basket_color, processed, num_flowers)
  3. Modify your query to read:
    SELECT * FROM florist_orders
    WHERE
     basked_color = 2 AND
     processed = 0 AND
     num_flowers > 1
    ORDER BY created ASC -- Process older orders first to avoid starvation
    ;
This should pretty much fix any speed issues with the aforementioned query and ensure that all queries remain responsive. This trick will work ONLY if all the flowers are being checked to be greater than the SAME number.