Trigram Index

A trigram is a three letter n-gram, which are contiguous sequences of n items from a string ignoring non-alphanumeric characters.

For example, for the name “Brian Broderick”, the set of trigrams are: {b,br,bri,ria,ian,an,bro,rod,ode,der,eri,ric,ick,ck} Notice the space between the words is ignored and a trigram is not created with letters from both names. (i.e. nbr)

Installing

Trigram indices in Postgres are available through an extension.

CREATE EXTENSION pg_trgm;

To test that it’s working, run the following command, which should return the results listed above.

SELECT show_trgm('Brian Broderick');

Uses

Trigram indices come with some additional functions; for example, it can return similarity between two strings. This can be used as a really simple full text search, but there are better options for this.

The real value of trigram indices is that they can be used in LIKE and ILIKE queries where there’s a left-bound wildcard. For example:

SELECT * FROM customers WHERE phone LIKE '%555%';

In this example, a b-tree wouldn’t be used, but a trigram could.

Limitations

In Postgres, version 11 or older, a trigram would not be used in conjunction with a UUID field. This was fixed in PG12.

Because each string is broken up into 3 letter sequences, the overall disk space and memory needed is up to 3 times greater than a b-tree.