On the previous page, we learned what UUIDs are and some reasons for using them. We also learned about some of their limitations and things to look out for. On this page, we’re assuming the benefits of using UUIDs outweigh the speed and memory advantages of using integers for all keys.
However, we want to avoid excess memory bloat by using a UUID as a primary key. Since secondary indexes often reference the primary key to know the location of the tuple in the data file, using a UUID for this purpose has been shown to cause a lot of memory bloat. As mentioned on the previous page, one of the databases I managed was a data warehouse using Postgres 9.6. The database contained 16TB of data and used UUIDs for all of the primary keys. Since it was a data warehouse, most columns were indexed for reporting purposes. When I changed the database to use the following approach, the memory used was about 30% less than before.
The best approach I’ve found is to continue to use a sequential integer for the primary key, but have an additional indexed column, called uid, which is used in foreign key references.
For example, if we have a users table, and each user has zero or more addresses, we might structure the tables like this in Postgres:
CREATE TABLE users
(
id BIGSERIAL NOT NULL
CONSTRAINT users_pkey
PRIMARY KEY,
uid UUID NOT NULL,
is_active BOOLEAN NOT NULL,
first_name VARCHAR(255),
last_name VARCHAR(255)
);
CREATE UNIQUE INDEX idx_users_on_uid
ON users (uid);
CREATE TABLE addresses
(
id BIGSERIAL NOT NULL
CONSTRAINT addresses_pkey
PRIMARY KEY,
uid UUID NOT NULL,
user_uid UUID NOT NULL
CONSTRAINT fk_addresses_users
REFERENCES users(uid),
address_1 VARCHAR(255),
address_2 VARCHAR(255),
city VARCHAR(255),
region VARCHAR(255),
postal_code VARCHAR(12),
country_code CHAR(3)
);
CREATE UNIQUE INDEX idx_addresses_on_uid
ON addresses (uid);
CREATE INDEX idx_addresses_on_user_uid
ON addresses (user_uid);
Notice REFERENCES users(uid)
in the constraint in the addresses table. Foreign keys can reference any column of the same type. It doesn’t have to be the primary key. In JOIN
clauses, we’d do the same. For example:
SELECT address_1, address_2, city, region, postal_code, country_code
FROM users u
INNER JOIN addresses a ON u.uid = a.user_uid
WHERE u.first_name = 'Brian' AND
u.last_name = 'Broderick';