Relational databases consist of schemas with tables containing predefined columns. In concept, they are like a bunch of spreadsheets that can be joined together. They are the most popular type of database today1 with examples including PostgreSQL (aka Postgres), MySQL, MariaDB, MS SQL Server, and Oracle.
It’s a little confusing, but the word “relational” refers to Relational Algebra in an RDMS, which was invented by Edgar Codd in 19702. It doesn’t refer to how data can have relationships with other data.
In Relational Algebra, a relation is essentially a database table. Formulas are defined to get data in and out of relations as well as how to join them together. In other words, Relational Algebra is how the DB responds to SQL.
The following formula illustrates how tables are joined together. We won’t spend time trying to explain the math here; however, the Relational Algebra source listed below goes into more detail.
In 1969, ARPANET started researching networking and development of UNIX had begun. The first Request for Comment or RFC was published and the serial interface standard was defined.3
In 1970, when Relational Math was invented, the first dynamic RAM chip was released by Intel. It had a capacity of 1 Kbit or 1024 bits. Also, development for the first microprocessor had begun.
Why this matters
The math involved in joining tables is computationally expensive; however, the amount of data being processed was far less than what we work with today. Around the time this was invented memory usage was measured in kilobits rather than gigabytes.
Modern database models such as Graph often use pointers4 to join data together rather than relational math, making it more performant.
How data is saved to disk is important because it can speed up or slow down certain types of queries.
The most popular relational systems like MySQL or Postgres store data as tuples, which is synonymous with rows. Queries designed to return specific rows are very fast, which is most common with transactional type applications.
When a tuple is updated in Postgres, the original is marked as dirty. The entire tuple is written again including whatever was updated. This way, the database can return what the data looked like at the beginning of a transaction even though there may be writes happening concurrently. Another process, called a vacuum, runs when enough tuples are marked as dirty. This marks dirty tuples as overwritable so that the space on the hard drive can be eventually reclaimed.
By contrast, columnar database engines store columns together on disk rather than rows. Queries that return aggregate functions on specific columns is faster, but selecting entire rows is slower than their row based counterparts.
Data warehouses and other analytical systems can often benefit from this. Popular columnar data stores include Vertica, Redshift, and MariaDB’s ColumnStore data engine.
It’s very likely that your application is powered by a relational database. We’re going to open the black box and let you peer into its internals. Soon, you’ll have the knowledge to troubleshoot and diagnose performance challenges.
This book emphasises open source software. Examples will be mostly from PostgreSQL or MariaDB, but the concepts can be applied to any relational system.