Column Order Matters

Databases are essentially fancy file scanners with optimizations that minimize how much of a file needs to be read to return the requested data. As we learned previously, data and index files are organized in fixed-size blocks. In Postgres, these blocks are 8KB and are referred to as pages. In MySQL, with the default InnoDB storage engine, they are 16KB.
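You can confirm the page size on your own server. A quick check, assuming a Postgres instance and a MySQL/InnoDB instance respectively:

```sql
-- Postgres: page (block) size in bytes, typically 8192
SHOW block_size;

-- MySQL/InnoDB: page size in bytes, typically 16384
SHOW VARIABLES LIKE 'innodb_page_size';
```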

What happens if a tuple contains more than a page worth of data? The tuple ends up spanning two or more pages.

To put this in context, let’s assume we have a table called users that contains many character fields. In our example, the data per row often exceeds 8KB, and the table contains 100k rows. If each row spills onto a second page, the data file would contain up to 200k 8KB pages.

If we ran SELECT * FROM users;, the database would have to scan all 200k pages. If instead we ran SELECT id FROM users;, the id field would appear in the first page of each tuple. Once the database finds the field, it doesn’t have to look at the second page, so the scan could complete after reading only 100k pages, one per tuple, which would speed up the query.
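One way to observe this in Postgres is EXPLAIN with the BUFFERS option, which reports how many pages (buffers) a query actually touched. A sketch, assuming the hypothetical users table above:

```sql
-- The "Buffers: shared hit=... read=..." lines report how many
-- 8KB pages each query touched; compare the two counts.
EXPLAIN (ANALYZE, BUFFERS) SELECT * FROM users;
EXPLAIN (ANALYZE, BUFFERS) SELECT id FROM users;
```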

Column Order Strategy

To minimize the number of pages scanned, there are three things to consider:

Use the Smallest Type that’s Reasonable

Regardless of column order, you should use the smallest type that you’ll reasonably need. For example, integers in Postgres can be a smallint, int, or bigint, which are 2, 4, or 8 bytes respectively. If you’re never going to exceed the max value of a smaller type, use that. This will reduce the overall data stored, which reduces memory usage and speeds up table scans.
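You can verify these sizes in Postgres with the built-in pg_column_size function, which returns the number of bytes a value occupies:

```sql
-- Returns 2, 4, and 8 bytes respectively
SELECT pg_column_size(1::smallint) AS smallint_bytes,
       pg_column_size(1::int)      AS int_bytes,
       pg_column_size(1::bigint)   AS bigint_bytes;
```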

Put smaller types first

Fixed-size columns such as booleans, dates, integers, UUIDs, and fixed-width character fields should go before variable-width columns such as varchar, text, blob, and JSON fields. Depending on the character set, such as UTF-8, each character uses between 1 and 4 bytes, so a single character may use as much space as an entire integer.

By packing in the small fields first, more overall fields can fit in 8KB of storage. This increases the probability that a query will find all of the fields it needs in the first page.

Put frequently accessed columns before infrequently accessed ones

Put columns that are more likely to be selected before more obscure ones. The database stops reading a page once it has found all the data it needs, so putting rarely accessed columns at the end reduces the amount of data that has to be read.


In my tables, it’s quite common to have inserted_at and updated_at columns because the system I use creates these as standard. For consistency, I generally put these two right after the primary key, which is always labeled id. If there’s another date that is nearly always sorted on, such as published_at, it comes next. After that, I put any foreign key columns, because these are commonly used to join tables. Next come booleans, numbers, and dates. After those are fixed-width character columns, and finally varchars, text fields, and other variable-width fields, ordered by the probability that they will be needed in a query.

CREATE TABLE users (
  id SERIAL NOT NULL           -- serial is an auto increment in Postgres
    CONSTRAINT users_pkey
        PRIMARY KEY,
  inserted_at TIMESTAMP,
  updated_at TIMESTAMP,
  account_id INTEGER NOT NULL, -- reference to the accounts table
  is_active BOOLEAN DEFAULT false NOT NULL,
  uid UUID NOT NULL,           -- universal id
  password_digest CHAR(40) DEFAULT '' NOT NULL, -- 40 character SHA
  avatar_url TEXT,             -- URLs can be more than 255 characters
  bio TEXT
);

Reordering Columns

As columns are added and removed over time, it’s easy to get the column order out of whack. This is especially true when using ORMs with migration capabilities built in because each migration is treated as a separate task, rather than part of the whole.

With MySQL, it’s easy to keep things organized because the ALTER TABLE command supports the AFTER keyword.1 For example, ALTER TABLE users MODIFY email VARCHAR(255) AFTER password_digest; Note that MODIFY requires the full column definition, including the length.

Postgres does not have the AFTER keyword. Reordering columns can be done, but it’s a bit more complicated. The steps are:

  1. Create a new table with the columns in the right order.

  2. INSERT from a SELECT into the new table. Ex: INSERT INTO new_table_name SELECT columns FROM old_table_name;

  3. Rename the original table. I add the suffix _old.

  4. Rename the new table to the original table’s name.

  5. Create any indices.

  6. Verify everything is good.

  7. Drop the original table.

You can also use CREATE TABLE AS, but it won’t create the primary key or the auto-incrementing sequence, so you’ll need to ALTER TABLE to add them back.
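A sketch of that approach, using the accounts table from the example below; the sequence name is illustrative, and the setval call assumes you want the sequence to continue from the current maximum id:

```sql
-- Copies the data but not the primary key or the serial default
CREATE TABLE accounts_new AS
  SELECT id, inserted_at, updated_at, is_active, name, website
  FROM accounts;

-- Re-add the primary key and an auto-incrementing default
ALTER TABLE accounts_new ADD PRIMARY KEY (id);
CREATE SEQUENCE accounts_new_id_seq OWNED BY accounts_new.id;
SELECT setval('accounts_new_id_seq',
              COALESCE((SELECT max(id) FROM accounts), 1));
ALTER TABLE accounts_new
  ALTER COLUMN id SET DEFAULT nextval('accounts_new_id_seq');
```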

Original Table:

CREATE TABLE accounts (
  id SERIAL NOT NULL
    CONSTRAINT accounts_pkey
        PRIMARY KEY,
  inserted_at TIMESTAMP,
  updated_at TIMESTAMP,
  website TEXT,
  is_active BOOLEAN DEFAULT false NOT NULL,
  name VARCHAR
);

Create new table with columns in the right order:

CREATE TABLE accounts_new (
  id SERIAL NOT NULL
    CONSTRAINT accounts_new_pkey
        PRIMARY KEY,
  inserted_at TIMESTAMP,
  updated_at TIMESTAMP,
  is_active BOOLEAN DEFAULT false NOT NULL,
  name VARCHAR,
  website TEXT
);

INSERT INTO accounts_new SELECT id, inserted_at, updated_at, is_active, name, website FROM accounts; 

ALTER TABLE accounts RENAME TO accounts_old;
ALTER TABLE accounts_new RENAME TO accounts; 

Verify everything is good and do a backup.

DROP TABLE accounts_old;

  1. MySQL Alter Table