Chapter 5: Schema Design

At the core of every database driven app is the schema. Getting it right will form the foundation of a scalable and maintainable application. Getting it wrong will create countless headaches. There is both art and science involved when it comes to designing a schema and many different ways to do it. I’ll detail how I approach it though it’s by all means not the only way.

Fortunately, most engineers are familiar with object oriented programming, and this mindset lends itself very well to schema design. The caveat is that many engineers will take shortcuts with the schema to make their code cleaner only to find that these shortcuts will come back to bite them as the app grows in scale.

Note: On this page, I’m detailing a “normalized” approach to data modeling. Later in the chapter, I’ll address normalization vs. denormalization and compare the two.

Let’s start with the basics

Nouns, Verbs, and Adjectives

When designing an application’s model, I start with identifying the nouns, verbs, and adjectives I want to capture. For example, I might have a person and that person might have one or more addresses. The person has a name and one email address. An address has a street address, a city, a state or region, a zip or postal code, and a country.

So if we break that down to nouns, verbs, and adjectives, it looks like this:

Noun:

person

Adjectives:

first name
last name
email address

We’ve defined that the person may only have one of each of these. This is for simplicity’s sake even though many people have more than 2 given names. Also, technically a business is also a person which would only have one name, but we won’t get into that here.

Noun:

Address

Adjectives:

address 1 (street number and street name)
address 2 (for apartment numbers, etc)
city
region
postal code
country code

These are typical adjectives for an address using terminology that works throughout the world. The US has zip codes, but other countries call them postal codes, etc. Notice we didn’t spell out the whole country. Rather, we’re going to use the standard 3 character country code.

Verbs:

A person has one or more addresses.

In this sentence, “has” is the verb and it forms the relationship between these two nouns. The verb can be anything such as in these examples:

A person has a mother.
A person bought a car.
A person paid rent.

The valid verbs in these examples are “has”, “bought”, and “paid” and form the relationships between nouns. On the Define the Relationship page, we look at how changing a simple verb from “has” to “bought” changes the relationship and therefore how we’ll represent that in a schema. Don’t worry about that for now. We’ll get to that in a couple of pages.

Object Oriented Programming

Most coders have used an OO language and so these concepts aren’t anything truly new. The nouns turn into class objects, the adjectives turn into attributes, and the verbs turn into the relationships between these objects.

Using JSON, and by association a document database, a person and their addresses could look like this:

{
  "first_name": "Brian",
  "last_name": "Broderick",
  "email_address": "brian@example.com",
  "addresses": [
    {
      "address_1": "42 Everything Street",
      "address_2": "POB 21",
      "city": "Las Vegas",
      "region": "NV",
      "postal_code": "12345",
      "country_code": "usa"
    }
  ]
}

In this example, we have a person named Brian Broderick, who has one address on 42 Everything Street.

Normalized Schema in a Relational Database

In a relational database, the nouns become tables, adjectives become columns, and verbs become relationships between tables. Our 2 nouns are address and person, so they’ll be represented by 2 separate tables. ¹

In a zero to many relationship like we have between persons and addresses, the addresses table will have a foreign key to the persons table ². This allows the database to join the two together to associate the person with the right set of addresses.

We’ll get into “has one”, “has many” and “many to many” relationships later. We’ll also get into what happens when our relationship has attributes of its own.

In our example, our tables will look like this in Postgres:

CREATE TABLE persons
(
  id BIGSERIAL NOT NULL 
    CONSTRAINT users_pkey
    PRIMARY KEY,  
  first_name VARCHAR(255),
  last_name VARCHAR(255),
  email_address VARCHAR(255)
);

CREATE TABLE addresses
(
  id BIGSERIAL NOT NULL 
    CONSTRAINT addresses_pkey
      PRIMARY KEY,
  person_id BIGINT NOT NULL
    CONSTRAINT fk_person_addresses
      REFERENCES persons(id),
  address_1 VARCHAR(255),
  address_2 VARCHAR(255),
  city VARCHAR(255),
  region VARCHAR(255),
  postal_code VARCHAR(12),
  country_code CHAR(3)
);

In this case, the addresses table has a reference to the persons table via the foreign key person_id.

Schema in a Graph database

A graph database has the concept of nodes, predicates, edges, and facets instead of tables and columns. A node is like an object so nouns become nodes. Predicates are attributes of a node, so adjectives are mapped to them. Edges are the relationships between nodes, which represent the verbs. Facets are attributes for an edge.

Graph databases have a language standard, but really only Neo4j has adopted it at this point because they pushed the standard in the first place. Other databases such as DGraph make it possible to insert data via JSON or a format called RDF. For the sake of this page, the JSON example above is all we need.

Relationships

For the next few pages, we’re going to talk about relationships such as “has one”, “has many”, and “many to many” between data and how to model different types. We’ll also talk about how attributes on join tables define the relationship the same way facets work in graph databases.

Column Naming Conventions used throughout this book. ↩
The plural for person is either people or persons depending on the context. For this example, I chose to use persons because we may extend that to include businesses in the future. ↩