“Many to Many” Relationships

On the past 2 pages, we talked about when to use “has_many” and “has_one” relationships. We also went over how “has_one” relationships aren’t used nearly as often as they should though they can significantly improve database performance.

The last main category is the “many_to_many”. Because of the extra complexity in a relational database, they aren’t used quite enough; however, they’re the defacto standard for any relationship in a graph database.

What is a “Many to Many” Relationship

In our running example of a CMS, we used a “has_many” when we said that 1 template can have many web pages assigned to it though a web page may only have 1 template. For “has_one”, we defined a website as having 1, and only 1, home page.

A “many_to_many” relationship is when both sides can join into multiple things. This isn’t possible with just 2 tables in a relational system; there must be a 3rd table introduced to the schema, which is referred to as a mapping or join table.

All relationships in a graph database are called “edges”, where the entities are called “nodes”. Edges are essentially pointers between 2 nodes that can also have properties of their own called “facets”. We’ll get into this more in a bit when we talk about “Defining the Relationship”.

CMS Example Schema so far:

In the last 2 sections, we’ve started building the schema for a content management system. These are the tables we’ve talked about so far that matter for this discussion.

CREATE TABLE templates
(
  id SERIAL NOT NULL 
    CONSTRAINT templates_pkey
    PRIMARY KEY,  
  header TEXT,
  footer TEXT
);

CREATE TABLE sites
(
  id SERIAL NOT NULL 
    CONSTRAINT sites_pkey
      PRIMARY KEY,      
  url VARCHAR(255)
);

CREATE TABLE web_pages
(
  id SERIAL NOT NULL 
    CONSTRAINT web_pages_pkey
      PRIMARY KEY,
  template_id INT 
    CONSTRAINT fk_template_web_pages
      REFERENCES templates(id),
  site_id INT
    CONSTRAINT fk_site_web_pages
      REFERENCES templates(id),
  slug VARCHAR(255) NOT NULL,
  content TEXT
);

In the “has_one” section, we also started talking about a worker process that reads each page’s content and figures out which hashtags the page should belong to. Since a web page can have many hashtags, and a hashtag can have many web pages, a “many_to_many” relationship is appropriate.

Join tables

To create a “many_to_many” relationship, there needs to be a table in the middle, which has references to both of its parent tables. This table is known as a join or map table.

We already have a web_pages table, so let’s create the hashtags table.

CREATE TABLE hashtags
(
  id SERIAL NOT NULL 
    CONSTRAINT hashtags_pkey
      PRIMARY KEY,
  site_id INT
    CONSTRAINT fk_site_web_pages
      REFERENCES sites(id),
  hashtag VARCHAR(255) NOT NULL
);

Next, let’s create the join table.

CREATE TABLE hashtags_web_pages
(
  id SERIAL NOT NULL 
    CONSTRAINT hashtags_pkey
      PRIMARY KEY,
  hashtag_id INT
    CONSTRAINT fk_web_page_hashtags
      REFERENCES hashtags(id),
  web_page_id INT
    CONSTRAINT fk_hashtag_web_pages
      REFERENCES webpages(id)
);

In some ORMs, like ActiveRecord in the Ruby on Rails ecosystem, the convention is to not have a primary key and only have references to the 2 foreign keys. They’ve written code to handle “many_to_many” relationships, while hiding the complexity of managing the join table. To manage relationships, they either delete or insert records into this table. This has always felt clunky to me. In ActiveRecord, there’s no direct way to manipulate this table. Instead, it’s done with relationship methods on the parent tables. Because of this, you have to select the parent table first, even if you already know what the IDs are. This can create a lot of unnecessary SELECT statements to execute.

It also assumes that you will never have attributes on the relation itself. Later, after realizing that attributes are needed on the join table, everything has to be refactored.

I don’t recommend doing it that way. It’s fine when the database is small, but it becomes both a performance and maintenance issue as things get bigger and more complicated. Instead, treat the join table like any other ORM model. Using the ORM to magically join tables often proves ineffective, especially since it may choose to do multiple queries instead of joins.

A rule of thumb when using any ORM is to be explicit about your SQL. ORMs use dark magic. It’s an easy path at first, but it may bite you when you least expect it. To be clear, I like ORMs. I use them every day; however, understanding what it’s doing and how to bend it to your will is key.

On the next page, we’ll talk about how to model more complicated relationships in both relational and graph databases.