In the previous section, we learned where PG stores data. We also learned how databases and tables are broken up using an internal OID to name mapping so that all of the file and directory names are simply integers on disk.
We’re going to continue with our example and show how updates and deletes are written to the file.
At this point, if you don’t have a sample database called millis with a sample table named matter, go back to the last section. You will also need to know the oid2name mapping to view the correct file via hexdump
. The previous section will give you the instructions needed to follow along.
Returning to the example in the previous section, we ran hexdump
on the “matter” table using the command hexdump -C 24638
. Your filename (i.e. 24638) will be different.
This will look something like:
00000000 00 00 00 00 58 98 b2 01 00 00 00 00 1c 00 b0 1f |....X...........|
00000010 00 20 04 20 00 00 00 00 b0 9f 98 00 00 00 00 00 |. . ............|
00000020 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
*
00001fb0 64 10 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |d...............|
00001fc0 01 00 04 00 02 09 18 00 01 00 00 00 0d 42 72 69 |.............Bri|
00001fd0 61 6e 1f 53 61 6c 74 20 4c 61 6b 65 20 43 69 74 |an.Salt Lake Cit|
00001fe0 79 37 61 62 63 64 65 66 67 68 69 6a 6b 6c 6d 6e |y7abcdefghijklmn|
00001ff0 6f 70 71 72 73 74 75 76 77 78 79 7a 00 00 00 00 |opqrstuvwxyz....|
00002000
I’m not going to pretend to know what the value of every bit means, and these could be different from version to version. However, by performing some basic actions, we can look at how the data in this file changes.
Let’s insert another row and force a checkpoint.
INSERT INTO matter (name, city, info) VALUES ('Robert', 'Paris', 'some info');
and CHECKPOINT;
Then display the contents again by running hexdump -C 24638
. Now the file looks like this:
00000000 00 00 00 00 38 9b b2 01 00 00 00 00 20 00 78 1f |....8....... .x.|
00000010 00 20 04 20 00 00 00 00 b0 9f 98 00 78 9f 66 00 |. . ........x.f.|
00000020 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
*
00001f70 00 00 00 00 00 00 00 00 65 10 00 00 00 00 00 00 |........e.......|
00001f80 00 00 00 00 00 00 00 00 02 00 04 00 02 08 18 00 |................|
00001f90 02 00 00 00 0f 52 6f 62 65 72 74 0d 50 61 72 69 |.....Robert.Pari|
00001fa0 73 15 73 6f 6d 65 20 69 6e 66 6f 00 00 00 00 00 |s.some info.....|
00001fb0 64 10 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |d...............|
00001fc0 01 00 04 00 02 09 18 00 01 00 00 00 0d 42 72 69 |.............Bri|
00001fd0 61 6e 1f 53 61 6c 74 20 4c 61 6b 65 20 43 69 74 |an.Salt Lake Cit|
00001fe0 79 37 61 62 63 64 65 66 67 68 69 6a 6b 6c 6d 6e |y7abcdefghijklmn|
00001ff0 6f 70 71 72 73 74 75 76 77 78 79 7a 00 00 00 00 |opqrstuvwxyz....|
00002000
We can see Robert’s record is now listed. We can also see that it was inserted in the same page because the data from both rows don’t exceed 8KB. This can also be deduced because both tuples are between the 00000000 and 00002000 line numbers.
Let’s insert one more row.
INSERT INTO matter (name, city, info) VALUES ('James', 'Bountiful', 'important stuff');
and CHECKPOINT;
This gives us:
00000000 00 00 00 00 58 9e b2 01 00 00 00 00 24 00 38 1f |....X.......$.8.|
00000010 00 20 04 20 00 00 00 00 b0 9f 98 00 78 9f 66 00 |. . ........x.f.|
00000020 38 9f 78 00 00 00 00 00 00 00 00 00 00 00 00 00 |8.x.............|
00000030 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
*
00001f30 00 00 00 00 00 00 00 00 66 10 00 00 00 00 00 00 |........f.......|
00001f40 00 00 00 00 00 00 00 00 03 00 04 00 02 08 18 00 |................|
00001f50 03 00 00 00 0d 4a 61 6d 65 73 15 42 6f 75 6e 74 |.....James.Bount|
00001f60 69 66 75 6c 21 69 6d 70 6f 72 74 61 6e 74 20 73 |iful!important s|
00001f70 74 75 66 66 00 00 00 00 65 10 00 00 00 00 00 00 |tuff....e.......|
00001f80 00 00 00 00 00 00 00 00 02 00 04 00 02 08 18 00 |................|
00001f90 02 00 00 00 0f 52 6f 62 65 72 74 0d 50 61 72 69 |.....Robert.Pari|
00001fa0 73 15 73 6f 6d 65 20 69 6e 66 6f 00 00 00 00 00 |s.some info.....|
00001fb0 64 10 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |d...............|
00001fc0 01 00 04 00 02 09 18 00 01 00 00 00 0d 42 72 69 |.............Bri|
00001fd0 61 6e 1f 53 61 6c 74 20 4c 61 6b 65 20 43 69 74 |an.Salt Lake Cit|
00001fe0 79 37 61 62 63 64 65 66 67 68 69 6a 6b 6c 6d 6e |y7abcdefghijklmn|
00001ff0 6f 70 71 72 73 74 75 76 77 78 79 7a 00 00 00 00 |opqrstuvwxyz....|
00002000
And selecting this data from the database gives us the info we expect:
millis=# select * from matter;
id | name | city | info
----+--------+----------------+----------------------------
1 | Brian | Salt Lake City | abcdefghijklmnopqrstuvwxyz
2 | Robert | Paris | some info
3 | James | Bountiful | important stuff
(3 rows)
Now let’s see what happens when Robert moves from Paris to Chicago.
UPDATE matter SET city = 'Chicago', info = 'moved cities' WHERE id = 2;
and CHECKPOINT;
00000000 00 00 00 00 88 a1 b2 01 00 00 00 00 28 00 00 1f |............(...|
00000010 00 20 04 20 67 10 00 00 b0 9f 98 00 78 9f 66 00 |. . g.......x.f.|
00000020 38 9f 78 00 00 9f 70 00 00 00 00 00 00 00 00 00 |8.x...p.........|
00000030 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
*
00001f00 67 10 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |g...............|
00001f10 04 00 04 80 02 28 18 00 02 00 00 00 0f 52 6f 62 |.....(.......Rob|
00001f20 65 72 74 11 43 68 69 63 61 67 6f 1b 6d 6f 76 65 |ert.Chicago.move|
00001f30 64 20 63 69 74 69 65 73 66 10 00 00 00 00 00 00 |d citiesf.......|
00001f40 00 00 00 00 00 00 00 00 03 00 04 00 02 09 18 00 |................|
00001f50 03 00 00 00 0d 4a 61 6d 65 73 15 42 6f 75 6e 74 |.....James.Bount|
00001f60 69 66 75 6c 21 69 6d 70 6f 72 74 61 6e 74 20 73 |iful!important s|
00001f70 74 75 66 66 00 00 00 00 65 10 00 00 67 10 00 00 |tuff....e...g...|
00001f80 00 00 00 00 00 00 00 00 04 00 04 40 02 01 18 00 |...........@....|
00001f90 02 00 00 00 0f 52 6f 62 65 72 74 0d 50 61 72 69 |.....Robert.Pari|
00001fa0 73 15 73 6f 6d 65 20 69 6e 66 6f 00 00 00 00 00 |s.some info.....|
00001fb0 64 10 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |d...............|
00001fc0 01 00 04 00 02 09 18 00 01 00 00 00 0d 42 72 69 |.............Bri|
00001fd0 61 6e 1f 53 61 6c 74 20 4c 61 6b 65 20 43 69 74 |an.Salt Lake Cit|
00001fe0 79 37 61 62 63 64 65 66 67 68 69 6a 6b 6c 6d 6e |y7abcdefghijklmn|
00001ff0 6f 70 71 72 73 74 75 76 77 78 79 7a 00 00 00 00 |opqrstuvwxyz....|
00002000
Robert is in there twice! If we run the select again, we only see the new changes albeit out of order according to their id values.
millis=# select * from matter;
id | name | city | info
----+--------+----------------+----------------------------
1 | Brian | Salt Lake City | abcdefghijklmnopqrstuvwxyz
3 | James | Bountiful | important stuff
2 | Robert | Chicago | moved cities
(3 rows)
Every time an update occurs, the entire tuple is written again, and the existing tuple is marked as “dead”.1 This way, any transactions that were running prior to this change will return the older version. This aids in the atomicity, consistency, and isolation parts of being ACID compliant.
The reason the select statement returned rows out of id order is because this is the new order of the clean rows in the file.
Now let’s delete a row and see what that looks like.
DELETE FROM matter where id = 1
and CHECKPOINT
.
00000000 00 00 00 00 f8 a3 b2 01 00 00 00 00 28 00 00 1f |............(...|
00000010 00 20 04 20 67 10 00 00 b0 9f 98 00 78 9f 66 00 |. . g.......x.f.|
00000020 38 9f 78 00 00 9f 70 00 00 00 00 00 00 00 00 00 |8.x...p.........|
00000030 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
*
00001f00 67 10 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |g...............|
00001f10 04 00 04 80 02 29 18 00 02 00 00 00 0f 52 6f 62 |.....).......Rob|
00001f20 65 72 74 11 43 68 69 63 61 67 6f 1b 6d 6f 76 65 |ert.Chicago.move|
00001f30 64 20 63 69 74 69 65 73 66 10 00 00 00 00 00 00 |d citiesf.......|
00001f40 00 00 00 00 00 00 00 00 03 00 04 00 02 09 18 00 |................|
00001f50 03 00 00 00 0d 4a 61 6d 65 73 15 42 6f 75 6e 74 |.....James.Bount|
00001f60 69 66 75 6c 21 69 6d 70 6f 72 74 61 6e 74 20 73 |iful!important s|
00001f70 74 75 66 66 00 00 00 00 65 10 00 00 67 10 00 00 |tuff....e...g...|
00001f80 00 00 00 00 00 00 00 00 04 00 04 40 02 05 18 00 |...........@....|
00001f90 02 00 00 00 0f 52 6f 62 65 72 74 0d 50 61 72 69 |.....Robert.Pari|
00001fa0 73 15 73 6f 6d 65 20 69 6e 66 6f 00 00 00 00 00 |s.some info.....|
00001fb0 64 10 00 00 68 10 00 00 00 00 00 00 00 00 00 00 |d...h...........|
00001fc0 01 00 04 20 02 01 18 00 01 00 00 00 0d 42 72 69 |... .........Bri|
00001fd0 61 6e 1f 53 61 6c 74 20 4c 61 6b 65 20 43 69 74 |an.Salt Lake Cit|
00001fe0 79 37 61 62 63 64 65 66 67 68 69 6a 6b 6c 6d 6e |y7abcdefghijklmn|
00001ff0 6f 70 71 72 73 74 75 76 77 78 79 7a 00 00 00 00 |opqrstuvwxyz....|
00002000
The record with the name “Brian” still exists, but there was a subtle change. This tuple is now marked as “dead” as well. Running the select again returns what we expect.
millis=# select * from matter;
id | name | city | info
----+--------+-----------+-----------------
3 | James | Bountiful | important stuff
2 | Robert | Chicago | moved cities
(2 rows)
On a table with lots of updates and deletes, the file could get really big really fast. In the next section, we’ll talk about how Postgres cleans up after itself.
Note, the terms “dead” and “dirty” are interchangeable in this context. ↩