Chapter 6: Indices

If a query is slow, you should add an index, right? Turns out, this isn’t always the case.

In this chapter, we’ll explore what indices (aka indexes) are, why they’re useful, and how it’s a balancing act with regards to the number of indices a database can have before it starts to have other problems.

What is an index?

An index is a sorted summary of underlying data presented in a format that is easily searchable, sortable, and filterable.

The most common type of index is the B-tree, which we’ll go into detail on the next page. There are several other types of indices and each has strengths, weaknesses, and cases where they are more suited for different uses.

Is there a limit to how many indices a database can have?

Yes, but it’s a subjective limit. When data is persisted to disk via an INSERT, UPDATE, or DELETE, it has to be saved to the data file plus any index that is associated with that data.

For example, in a relational system with a table that has 5 indices, the system will do at least 6 writes when inserting a new row. On a system that is write heavy, having indexes will compound the number of writes which could cause significant extra load.

There is a balancing act between fast reads versus fast writes. On many relational systems, the ultimate amount a system can scale is determined by how many writes it can handle because reads can be spread out across an array of followers.

When to use indices

Indices, in a relational database system, may speed up SELECT queries, but slow down INSERT, UPDATE, and DELETE queries. Specifically, in a SELECT, they may speed up JOIN, WHERE and ORDER BY clauses.

The query planner looks at stats about the index. These determine if it will be faster to scan the index vs. the table when filtering and sorting data.

Different types of indices have different rules to determine when they are effective. These will be explained on the following pages.