In the previous sections we’ve learned how Postgres handles inserts, updates, and deletes in its data files. In order to maintain ACID compliance under concurrent access (its MVCC design), it doesn’t actually remove any data at the time of the query. Instead, it marks the old versions of updated and deleted rows as “dead” tuples and appends the new version of an updated row just as it would an insert.
On a busy table with a lot of updates and deletes, this file will get really big. Beyond the obvious drawback of using a lot more hard drive space than is needed, it will also slow down future queries that have to scan this file looking for the correct tuples to return.
Fortunately, PG collects a lot of stats on each table, such as how many times the table has been scanned since the statistics were last reset. It also tracks things like the number of live vs. dead tuples. (Dead tuples are sometimes also called dirty tuples; the terms mean the same thing.)
To see these stats, run the query select * from pg_stat_user_tables;
or look at specific metrics with a query such as select relname, n_live_tup, n_dead_tup from pg_stat_user_tables;
Earlier in the book, we talked about how a “relation” in Relational Algebra is a table. This is why PG uses the column “relname” in this view for the name of the table.
Running the query above gives us this result:
millis=# select relname, n_live_tup, n_dead_tup from pg_stat_user_tables;
relname | n_live_tup | n_dead_tup
---------+------------+------------
matter | 2 | 2
(1 row)
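The same view also records when each table was last cleaned up, in the last_vacuum and last_autovacuum columns, which will come in handy once we start vacuuming below: select relname, last_vacuum, last_autovacuum from pg_stat_user_tables;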
In the previous 2 sections, we ran 3 inserts, 1 update, and 1 delete. That left 2 live tuples, while 2 are now considered dead.
A vacuum is a separate built-in process that marks the space used by dead tuples as available for future reuse. How often it runs automatically is controlled in the postgresql.conf file. For example, autovacuum_vacuum_threshold
tells PG to run a vacuum automatically once a specified number of dead tuples has accumulated.
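To be a bit more precise, the trigger also scales with table size: autovacuum vacuums a table roughly when its dead tuples exceed autovacuum_vacuum_threshold plus autovacuum_vacuum_scale_factor times the number of rows. Rather than digging through postgresql.conf, you can ask a running server which values are in effect:
SHOW autovacuum_vacuum_threshold;
SHOW autovacuum_vacuum_scale_factor;
SHOW autovacuum_naptime;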
Before we go further, the existing “matter” table’s data file looks like this:
00000000 00 00 00 00 f8 46 b5 01 00 00 00 00 28 00 00 1f |.....F......(...|
00000010 00 20 04 20 73 10 00 00 b0 9f 98 00 78 9f 66 00 |. . s.......x.f.|
00000020 38 9f 78 00 00 9f 70 00 00 00 00 00 00 00 00 00 |8.x...p.........|
00000030 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
*
00001f00 73 10 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |s...............|
00001f10 04 00 04 80 02 28 18 00 02 00 00 00 0f 52 6f 62 |.....(.......Rob|
00001f20 65 72 74 11 43 68 69 63 61 67 6f 1b 6d 6f 76 65 |ert.Chicago.move|
00001f30 64 20 63 69 74 69 65 73 72 10 00 00 00 00 00 00 |d citiesr.......|
00001f40 00 00 00 00 00 00 00 00 03 00 04 00 02 08 18 00 |................|
00001f50 03 00 00 00 0d 4a 61 6d 65 73 15 42 6f 75 6e 74 |.....James.Bount|
00001f60 69 66 75 6c 21 69 6d 70 6f 72 74 61 6e 74 20 73 |iful!important s|
00001f70 74 75 66 66 00 00 00 00 71 10 00 00 73 10 00 00 |tuff....q...s...|
00001f80 00 00 00 00 00 00 00 00 04 00 04 40 02 01 18 00 |...........@....|
00001f90 02 00 00 00 0f 52 6f 62 65 72 74 0d 50 61 72 69 |.....Robert.Pari|
00001fa0 73 15 73 6f 6d 65 20 69 6e 66 6f 00 00 00 00 00 |s.some info.....|
00001fb0 70 10 00 00 74 10 00 00 00 00 00 00 00 00 00 00 |p...t...........|
00001fc0 01 00 04 20 02 01 18 00 01 00 00 00 0d 42 72 69 |... .........Bri|
00001fd0 61 6e 1f 53 61 6c 74 20 4c 61 6b 65 20 43 69 74 |an.Salt Lake Cit|
00001fe0 79 37 61 62 63 64 65 66 67 68 69 6a 6b 6c 6d 6e |y7abcdefghijklmn|
00001ff0 6f 70 71 72 73 74 75 76 77 78 79 7a 00 00 00 00 |opqrstuvwxyz....|
00002000
In addition to waiting for PG to run a vacuum automatically, we can run it explicitly. To do this for the “matter” table, run the query VACUUM matter;
and CHECKPOINT;
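If you want to see what the vacuum actually did, add the VERBOSE option, as in VACUUM VERBOSE matter;, which prints a short report including how many dead row versions were removed.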
Running stats again now shows:
millis=# select relname, n_live_tup, n_dead_tup from pg_stat_user_tables;
relname | n_live_tup | n_dead_tup
---------+------------+------------
matter | 2 | 0
(1 row)
And running hexdump -C 24648
now shows:
00000000 00 00 00 00 d0 49 b5 01 00 00 05 00 28 00 88 1f |.....I......(...|
00000010 00 20 04 20 00 00 00 00 00 00 00 00 04 00 01 00 |. . ............|
00000020 c0 9f 78 00 88 9f 70 00 00 00 00 00 00 00 00 00 |..x...p.........|
00000030 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
*
00001f00 73 10 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |s...............|
00001f10 04 00 04 80 02 29 18 00 02 00 00 00 0f 52 6f 62 |.....).......Rob|
00001f20 65 72 74 11 43 68 69 63 61 67 6f 1b 6d 6f 76 65 |ert.Chicago.move|
00001f30 64 20 63 69 74 69 65 73 72 10 00 00 00 00 00 00 |d citiesr.......|
00001f40 00 00 00 00 00 00 00 00 03 00 04 00 02 09 18 00 |................|
00001f50 03 00 00 00 0d 4a 61 6d 65 73 15 42 6f 75 6e 74 |.....James.Bount|
00001f60 69 66 75 6c 21 69 6d 70 6f 72 74 61 6e 74 20 73 |iful!important s|
00001f70 74 75 66 66 00 00 00 00 71 10 00 00 73 10 00 00 |tuff....q...s...|
00001f80 00 00 00 00 00 00 00 00 73 10 00 00 00 00 00 00 |........s.......|
00001f90 00 00 00 00 00 00 00 00 04 00 04 80 02 29 18 00 |.............)..|
00001fa0 02 00 00 00 0f 52 6f 62 65 72 74 11 43 68 69 63 |.....Robert.Chic|
00001fb0 61 67 6f 1b 6d 6f 76 65 64 20 63 69 74 69 65 73 |ago.moved cities|
00001fc0 72 10 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |r...............|
00001fd0 03 00 04 00 02 09 18 00 03 00 00 00 0d 4a 61 6d |.............Jam|
00001fe0 65 73 15 42 6f 75 6e 74 69 66 75 6c 21 69 6d 70 |es.Bountiful!imp|
00001ff0 6f 72 74 61 6e 74 20 73 74 75 66 66 00 00 00 00 |ortant stuff....|
00002000
The page has been altered. The dead tuples are no longer there and the live tuples have been moved; however, the bytes that were previously used still contain some unused data. Let’s see what happens when we insert another record.
INSERT INTO matter (name, city, info) VALUES ('William', 'San Francisco', 'really important data');
and CHECKPOINT;
00000000 00 00 00 00 c8 88 b5 01 00 00 01 00 28 00 40 1f |............(.@.|
00000010 00 20 04 20 00 00 00 00 40 9f 90 00 04 00 01 00 |. . ....@.......|
00000020 c0 9f 78 00 88 9f 70 00 00 00 00 00 00 00 00 00 |..x...p.........|
00000030 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
*
00001f00 73 10 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |s...............|
00001f10 04 00 04 80 02 29 18 00 02 00 00 00 0f 52 6f 62 |.....).......Rob|
00001f20 65 72 74 11 43 68 69 63 61 67 6f 1b 6d 6f 76 65 |ert.Chicago.move|
00001f30 64 20 63 69 74 69 65 73 72 10 00 00 00 00 00 00 |d citiesr.......|
00001f40 75 10 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |u...............|
00001f50 01 00 04 00 02 08 18 00 04 00 00 00 11 57 69 6c |.............Wil|
00001f60 6c 69 61 6d 1d 53 61 6e 20 46 72 61 6e 63 69 73 |liam.San Francis|
00001f70 63 6f 2d 72 65 61 6c 6c 79 20 69 6d 70 6f 72 74 |co-really import|
00001f80 61 6e 74 20 64 61 74 61 73 10 00 00 00 00 00 00 |ant datas.......|
00001f90 00 00 00 00 00 00 00 00 04 00 04 80 02 29 18 00 |.............)..|
00001fa0 02 00 00 00 0f 52 6f 62 65 72 74 11 43 68 69 63 |.....Robert.Chic|
00001fb0 61 67 6f 1b 6d 6f 76 65 64 20 63 69 74 69 65 73 |ago.moved cities|
00001fc0 72 10 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |r...............|
00001fd0 03 00 04 00 02 09 18 00 03 00 00 00 0d 4a 61 6d |.............Jam|
00001fe0 65 73 15 42 6f 75 6e 74 69 66 75 6c 21 69 6d 70 |es.Bountiful!imp|
00001ff0 6f 72 74 61 6e 74 20 73 74 75 66 66 00 00 00 00 |ortant stuff....|
00002000
PG looks for a reusable space big enough to hold the new tuple and inserts it there. This way, the space left behind by dead tuples is reclaimed by new writes.
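We can also watch this from SQL using the hidden ctid system column, which reports each row’s (page number, item number) position inside the table’s file; as a quick check on the side:
select ctid, name from matter;
The new William row should show up in page 0, occupying one of the item slots the vacuum freed, rather than being appended somewhere new.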
So far, we’ve been using the regular, non-blocking vacuum process. This is normal and happens all the time on a well-functioning PG server; however, because new writes are constantly layered over the top of obsolete data, it can cause fragmentation. Many small pockets of unclaimed space, each too small to fit another tuple, may collect between live tuples. This wastes space on the hard drive and makes table scans take longer.
The solution to this is running VACUUM FULL
on the table. This completely rewrites the file from scratch, removing any fragmentation. The table is exclusively locked while this happens, blocking other reads and writes against it. On a large table, this can take minutes or hours, so it’s best to run it during scheduled maintenance time.
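A rough way to gauge whether a table is bloated enough to be worth it is to compare its on-disk size before and after, for example with
select pg_size_pretty(pg_relation_size('matter'));
pg_relation_size reports the size of the table’s main data file. Our toy table occupies a single 8 kB page either way, but on a heavily updated production table the difference can be dramatic.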
Let’s see how this affects our example. Run VACUUM FULL matter;
and CHECKPOINT;
Then let’s check out the file.
hexdump -C 24648
hexdump: 24648: No such file or directory
hexdump: 24648: Bad file descriptor
Wait, where did that file go? PG literally wrote a new file from scratch and deleted the existing one, so we need to find out what the new file name is.
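One way is the oid2name utility that ships with PG; you could also ask the server directly with select relfilenode from pg_class where relname = 'matter'; or select pg_relation_filepath('matter');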
> oid2name -d millis -t matter
From database "millis":
Filenode Table Name
----------------------
24666 matter
> hexdump -C 24666
00000000 00 00 00 00 20 0d b7 01 00 00 00 00 24 00 40 1f |.... .......$.@.|
00000010 00 20 04 20 00 00 00 00 b8 9f 90 00 78 9f 78 00 |. . ........x.x.|
00000020 40 9f 70 00 00 00 00 00 00 00 00 00 00 00 00 00 |@.p.............|
00000030 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
*
00001f40 73 10 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |s...............|
00001f50 03 00 04 00 02 2b 18 00 02 00 00 00 0f 52 6f 62 |.....+.......Rob|
00001f60 65 72 74 11 43 68 69 63 61 67 6f 1b 6d 6f 76 65 |ert.Chicago.move|
00001f70 64 20 63 69 74 69 65 73 72 10 00 00 00 00 00 00 |d citiesr.......|
00001f80 00 00 00 00 00 00 00 00 02 00 04 00 02 0b 18 00 |................|
00001f90 03 00 00 00 0d 4a 61 6d 65 73 15 42 6f 75 6e 74 |.....James.Bount|
00001fa0 69 66 75 6c 21 69 6d 70 6f 72 74 61 6e 74 20 73 |iful!important s|
00001fb0 74 75 66 66 00 00 00 00 75 10 00 00 00 00 00 00 |tuff....u.......|
00001fc0 00 00 00 00 00 00 00 00 01 00 04 00 02 0b 18 00 |................|
00001fd0 04 00 00 00 11 57 69 6c 6c 69 61 6d 1d 53 61 6e |.....William.San|
00001fe0 20 46 72 61 6e 63 69 73 63 6f 2d 72 65 61 6c 6c | Francisco-reall|
00001ff0 79 20 69 6d 70 6f 72 74 61 6e 74 20 64 61 74 61 |y important data|
00002000
The file now only contains live data.
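If you’d rather not read hexdumps to check for bloat, the pgstattuple contrib extension reports this kind of information straight from SQL:
CREATE EXTENSION pgstattuple;
select tuple_count, dead_tuple_count, free_space, free_percent from pgstattuple('matter');
Right after the VACUUM FULL, dead_tuple_count should be 0; free_percent will still look high here simply because our three small rows barely fill the single 8 kB page.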