However, there's an entire class of algorithms and data structures that focus on how efficiently they utilize the host system's cache while processing data. This class of algorithms and data structures is called cache-efficient algorithms and data structures [pdf].

There are two types of cache-efficient algorithms and data structures:

- One class tunes the algorithm for a particular size of cache or cache hierarchy. The algorithm is aware of the cache sizes, the number of caching levels, and the relative speeds of each of these caching levels.
- Another class of algorithms is oblivious of the underlying cache sizes and layouts, yet is provably optimal for any cache size and caching layout (sounds magical, doesn't it!). These are called Cache-Oblivious Algorithms. See this link on Cache Oblivious Algorithms for more details, and this link for more details on the model and the assumptions made in the Cache-Oblivious model.

- An example of a cache-efficient algorithm that is also cache-oblivious is Linear Search.
- An example of a cache-inefficient algorithm that is nevertheless cache-oblivious (it uses no cache parameters) is Binary Search.
- An example of a cache-efficient data structure that isn't cache-oblivious is the B-Tree (since B is the tuning parameter for the particular machine on which we are running).

Without getting into the details, the complexity of running Binary Search on an array in the Disk Access Model (DAM) (where we are only concerned with the number of disk blocks read, and not the number of comparisons made) is `O(log(N/B))`, since we must load a block from disk on every jump until we reach a small enough sub-array (of size B) such that no more jumps within that array will trigger another disk I/O operation to fetch another disk block. The optimal complexity for searching ordered data on disk is realized by ordering the data recursively in a static search tree, such that the complexity reduces to `O(log_B N)`.

However, implementing that structure is somewhat complicated, and we should ask ourselves if there is a way to get the best of both worlds; i.e.
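To make that gap concrete, here is a small Python simulation (ours, purely illustrative; the function names and the choice of N and B are ours) that counts the distinct disk blocks a plain binary search touches on its longest search path, and compares that count against the `log_B N` figure a disk-aware layout achieves:

```python
import math

def blocks_touched_by_binary_search(n, block_size):
    """Simulate a worst-case binary search over n sorted elements and count
    the distinct disk blocks (of block_size elements each) it touches."""
    blocks = set()
    lo, hi = 0, n - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        blocks.add(mid // block_size)  # reading a[mid] pulls in its block
        hi = mid - 1                   # worst case: every probe goes left
    return len(blocks)

n, b = 2**20, 64
print(blocks_touched_by_binary_search(n, b))  # 14, i.e. log2(N/B)
print(math.ceil(math.log(n, b)))              # 4, i.e. ceil(log_B N)
```

With N = 2^20 elements and 64-element blocks, the plain binary search reads 14 distinct blocks while a B-aware layout needs only about 4.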

- Runtime efficiency of the cache-oblivious recursive layout, and
- Implementation simplicity of the standard Binary Search algorithm on an ordered array

Turns out, we can reach a compromise if we use the square-root trick. This is how we'll proceed:

- Promote every `sqrt(N)`'th element to a new **summary** array. We use this *summary* array as a first-level lookup structure.
- To look up an element, we perform binary search within this *summary* array to find the possible extents within the original array where our element of interest could lie.
- We then use binary search on that interesting sub-array of our original array to find our element of interest.
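The steps above can be sketched in Python. This is a minimal illustration of the technique, not code from the original post; the helper names `build_summary` and `lookup` are ours:

```python
import bisect
import math

def build_summary(values):
    """Promote every sqrt(N)'th element into a first-level summary array."""
    step = max(1, math.isqrt(len(values)))
    # Take the last element of each sqrt(N)-sized chunk.
    return step, [values[i] for i in range(step - 1, len(values), step)]

def lookup(values, step, summary, x):
    """Binary search the summary to find the candidate chunk, then binary
    search only within that chunk of the original array."""
    chunk = bisect.bisect_left(summary, x)         # first-level search
    lo = chunk * step
    hi = min(lo + step, len(values))
    i = lo + bisect.bisect_left(values[lo:hi], x)  # second-level search
    return i if i < len(values) and values[i] == x else -1

values = list(range(0, 200, 2))  # 100 sorted even numbers
step, summary = build_summary(values)
print(lookup(values, step, summary, 42))  # 21 (index of 42)
print(lookup(values, step, summary, 43))  # -1 (not present)
```

Each of the two binary searches runs over roughly `sqrt(N)` elements, which is what buys the improved block-transfer count discussed below, while the code stays almost as simple as plain binary search.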

**For example:**

- Suppose we are searching for the element '7': we will first perform binary search on the top array, i.e. *[10, 22, 33, 43]*, and identify that 7 must lie in the original *values* sub-array at an index that is before the index of *10*. We then restrict our next binary search to the sub-array *[5, 7, 8, 10]*.
- Suppose we are searching for the element '22': we first identify the sub-array *[11, 21, 22, 25, 26, 33]* as potentially containing a solution and perform binary search on that sub-array.

Even though we are asymptotically performing the same number of overall element comparisons, our cache locality has gone up, since the number of block transfers we'll perform for a single lookup will now be `2 log(√N/B)`, which is an additive factor of `log B` less than what we had for our normal binary search on the sorted array.
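As a quick sanity check on that claim (our own arithmetic, with an illustrative choice of N and B), we can evaluate both block-transfer expressions directly:

```python
import math

def blocks_plain(n, b):
    """Block transfers for binary search over one array of n elements."""
    return math.log2(n / b)

def blocks_two_level(n, b):
    """Two binary searches, each over roughly sqrt(n) elements."""
    return 2 * math.log2(math.sqrt(n) / b)

n, b = 2**20, 2**6
print(blocks_plain(n, b))      # 14.0  = log2(N/B)
print(blocks_two_level(n, b))  # 8.0   = 2*log2(sqrt(N)/B)
print(blocks_plain(n, b) - blocks_two_level(n, b))  # 6.0 = log2(B)
```

The saving works out because `2 log(√N/B) = log N − 2 log B`, while plain binary search costs `log(N/B) = log N − log B`; the difference is exactly `log B`.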