Postgres stores its data as a set of files inside its data directory. While some databases make it possible to have many different data directories, PG may only have one.
The postgresql.conf
determines the location of the data file by setting the data_directory
parameter. To see the value, run the query SHOW data_directory;
to provide the value for this parameter and will return something like /var/lib/postgresql/12/main
.
Inside the directory is another directory called base
, which contains all the data for this cluster. In base
, there are a bunch of directories that are just numbers that correspond to different databases. PG refers to these numbers as an OID
and a database name as a datname
.
For example, you might see this list of directories when running ls
:
1 13266 13267
There are two ways to get the mapping between these OIDs and their names.
Run the query SELECT oid AS object_id, datname AS database_name FROM pg_database;
Run the command line tool oid2name
You’ll get something like the following:
All databases:
Oid Database Name Tablespace
-----------------------------------
13267 postgres pg_default
13266 template0 pg_default
1 template1 pg_default
Now let’s create a database by running the query CREATE DATABASE millis;
. After rerunning ls
, I now see 1 13266 13267 24625
. Of course, your specific OIDs will be different. I ran oid2name
again and now see:
All databases:
Oid Database Name Tablespace
-----------------------------------
24625 millis pg_default
13267 postgres pg_default
13266 template0 pg_default
1 template1 pg_default
Next, enter the millis directory. For me, the command looks like cd 24625
.
There are many files here. You can see how many by running ls -la | wc -l
, which returned 295 when I ran it. We haven’t created any tables yet, so what are all of these? PG uses itself to store lots of stats and other data to do its job in 2 schemas called information_schema
and pg_catalog
.
Let’s create a table.
CREATE TABLE matter(
id SERIAL PRIMARY KEY,
name VARCHAR(255),
city VARCHAR(255),
info VARCHAR(255)
);
Running ls -la | wc -l
now shows that I have 300 files. If I run ls -lo
, I get an output like this:
-rw------- 1 postgres 8192 Feb 15 13:15 3603_vm
-rw------- 1 postgres 16384 Feb 15 13:15 3604
-rw------- 1 postgres 16384 Feb 15 13:15 3605
-rw------- 1 postgres 16384 Feb 15 13:15 3606
-rw------- 1 postgres 16384 Feb 15 13:15 3607
-rw------- 1 postgres 16384 Feb 15 13:15 3608
-rw------- 1 postgres 32768 Feb 15 13:15 3609
-rw------- 1 postgres 16384 Feb 15 13:15 3712
-rw------- 1 postgres 8192 Feb 15 13:15 3764
Notice that all of the file sizes are divisible by 8192 (8KB) because when PG writes data, it does so in 8KB blocks, which they call “pages.” We’ll get into why that’s important when talking about table structures later in this book.
To find the specific files used for the table we just created, we can run a query or use the command tool similar to finding the database’s OID.
Connect to the millis database and run the query: SELECT pg_relation_filepath('matter');
Use the command-line tool: oid2name -d millis -t matter
, which will return something like:
From database "millis":
Filenode Table Name
----------------------
24638 matter
Your Filenode
value will be different. Since we haven’t inserted anything, if we look at the file via ls -lo 24638
, it’s empty:
-rw------- 1 postgres 0 Feb 15 13:27 24638
Let’s insert a row by running the query:
INSERT INTO matter (name, city, info) VALUES ('Brian', 'Salt Lake City', 'abcdefghijklmnopqrstuvwxyz');
Now the file has a page written to it. Instead of having 0 bytes, it now contains 8KB of data.
-rw------- 1 postgres 8192 Feb 15 13:27 24638
We can view the contents of this file using the command line tool hexdump
. If hexdump
is unavailable, use your package manager to install it.
By running, hexdump -C 24638
, we see the following:
00000000 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
*
00002000
But wait! That’s just a bunch of empty data. Where’s the row we inserted?
Postgres doesn’t immediately write data to these files. Instead, it waits for what it calls a checkpoint
, which happens by default every 5 minutes. So instead of waiting for PG to get around to it, we can force it to happen by simply running checkpoint
as a query.
psql
psql (11.4, server 10.10)
Type "help" for help.
postgres=# \c millis
psql (11.4, server 10.10)
You are now connected to database "millis" as user "postgres".
millis=# checkpoint;
CHECKPOINT
millis=# \q
Now by running hexdump -C 24638
, we see our data. However, if you’re on a mobile device, you may need to scroll to the right to see the text values.
00000000 00 00 00 00 a0 94 af 01 00 00 00 00 1c 00 b0 1f |................|
00000010 00 20 04 20 00 00 00 00 b0 9f 98 00 00 00 00 00 |. . ............|
00000020 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
*
00001fb0 56 10 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |V...............|
00001fc0 01 00 04 00 02 08 18 00 01 00 00 00 0d 42 72 69 |.............Bri|
00001fd0 61 6e 1f 53 61 6c 74 20 4c 61 6b 65 20 43 69 74 |an.Salt Lake Cit|
00001fe0 79 37 61 62 63 64 65 66 67 68 69 6a 6b 6c 6d 6e |y7abcdefghijklmn|
00001ff0 6f 70 71 72 73 74 75 76 77 78 79 7a 00 00 00 00 |opqrstuvwxyz....|
00002000
Next, we’ll talk about how Postgres performs updates and deletes.