Write Ahead Log (WAL)

The Write Ahead Log or WAL is common across many database and streaming technologies such as Postgres or Kafka, respectively.

It is an append-only set of files that contains new messages in the case of Kafka, or new changes to data in the case of relational databases like Postgres. Since this chapter is all about relational databases, we’ll focus on why the WAL matters for Postgres.

The WAL is used both internally and externally. Internally, every change is written to the WAL before it reaches the data files. A CHECKPOINT later flushes the modified pages to the data files, and after a crash Postgres replays the WAL written since the last checkpoint to recover. Externally, the WAL is the log in a typical publish/subscribe model: PG publishes to the WAL, and other processes can subscribe to it to receive changes to data.
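The publish/subscribe idea can be sketched with a toy append-only log. This is only an illustration, not how Postgres stores its WAL: the AppendOnlyLog class, its methods, and the sample records are all hypothetical.

```python
class AppendOnlyLog:
    """Toy append-only log: records are only ever added at the end."""

    def __init__(self):
        self._records = []

    def append(self, record):
        # The publisher adds a change to the end of the log.
        self._records.append(record)

    def read_from(self, position):
        # A subscriber asks for everything at or after its own position.
        return self._records[position:]


log = AppendOnlyLog()
log.append("INSERT users id=1")
log.append("UPDATE users id=1")

# Each subscriber remembers how far it has read.
follower_position = 0
changes = log.read_from(follower_position)
follower_position += len(changes)

# A later change is picked up on the next read, without re-reading old ones.
log.append("DELETE users id=1")
new_changes = log.read_from(follower_position)
print(new_changes)
```

The point of the sketch is that the publisher never needs to know who is reading; each subscriber only needs its own position in the log.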

External subscribers might be Postgres follower servers or another service such as Kafka Connect, which can use the data for Change Data Capture (CDC) pipelines.

Replication Slots

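Typically, when a service subscribes to PG’s WAL, it does so by creating a Replication Slot. For a follower server this is a physical slot, created by running the query select pg_create_physical_replication_slot('slot_name'); For a CDC consumer such as Kafka Connect it is normally a logical slot, created with pg_create_logical_replication_slot, which also takes the name of an output plugin. The subscriber can then use its slot to get a feed of WAL changes.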

The replication slot keeps track of how much of the WAL its subscriber has acknowledged (ACK). Once every slot has acknowledged a particular WAL file, and a CHECKPOINT has occurred, PG is allowed to delete it.
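The deletion rule can be modeled in a few lines. This is a simplified sketch, assuming WAL files are numbered in order and that each slot records the oldest file it still needs. The function name, the slot names, and the numbers are hypothetical, and real Postgres tracks positions inside files rather than whole files.

```python
def removable_segments(segments, slot_needs, checkpoint_segment):
    """Return the WAL files that may be deleted.

    segments:           WAL file numbers currently on disk
    slot_needs:         slot name -> oldest file that slot still needs
    checkpoint_segment: oldest file still needed since the last CHECKPOINT
    """
    # A file is safe to delete only if the checkpoint AND every slot are past it.
    oldest_needed = min([checkpoint_segment, *slot_needs.values()])
    return [s for s in segments if s < oldest_needed]


segments = [1, 2, 3, 4, 5, 6]
slot_needs = {"follower_1": 5, "cdc_connector": 3}

# cdc_connector still needs file 3, so only files 1 and 2 can go.
print(removable_segments(segments, slot_needs, checkpoint_segment=6))  # [1, 2]

# Without the slowest slot, files 3 and 4 become removable too.
remaining_slots = {"follower_1": 5}
print(removable_segments(segments, remaining_slots, checkpoint_segment=6))  # [1, 2, 3, 4]
```

Note how the slowest slot alone decides how much WAL stays on disk, no matter how far ahead the other subscribers are.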

This has a drawback: if a subscriber goes offline, its slot keeps holding WAL files back, and they will not be deleted for as long as the slot exists. This can easily cause the hard drive to fill up completely. To fix it, bring the subscriber back online or delete the replication slot by running select pg_drop_replication_slot('slot_name'); It’s good to monitor the subscriber’s health as well as available hard drive space to avoid bringing the PG server down.
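One way to monitor this is to measure how much WAL each slot is holding back. The sketch below assumes you have already fetched slot_name, active, and restart_lsn from the pg_replication_slots view, plus the server's current WAL position (for example from pg_current_wal_lsn()). The rows, the threshold, and the function names are made up for illustration. Postgres can do the same subtraction on the server with pg_wal_lsn_diff.

```python
def lsn_to_int(lsn):
    """Convert a Postgres LSN such as '16/B374D848' to a byte position.

    An LSN is two hexadecimal numbers: the high and low 32 bits of a 64-bit value.
    """
    high, low = lsn.split("/")
    return (int(high, 16) << 32) + int(low, 16)


def slots_over_limit(current_lsn, slots, limit_bytes):
    """Return (slot_name, active, retained_bytes) for slots holding back too much WAL."""
    now = lsn_to_int(current_lsn)
    flagged = []
    for name, active, restart_lsn in slots:
        if restart_lsn is None:
            continue  # slot has not reserved any WAL yet
        retained = now - lsn_to_int(restart_lsn)
        if retained > limit_bytes:
            flagged.append((name, active, retained))
    return flagged


# Hypothetical rows: (slot_name, active, restart_lsn)
slots = [
    ("follower_1", True, "0/5000000"),
    ("cdc_connector", False, "0/1000000"),  # subscriber is offline
]

flagged = slots_over_limit("0/6000000", slots, limit_bytes=32 * 1024 * 1024)
print(flagged)
```

With these sample values only cdc_connector is flagged: it is inactive and holds back 80 MiB of WAL, while follower_1 is 16 MiB behind and under the 32 MiB limit. A real check would run on a schedule and raise an alert instead of printing.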