A trigram is a three letter n-gram, which are contiguous sequences of n items from a string ignoring non-alphanumeric characters.
For example, for the name “Brian Broderick”, the set of trigrams are: {b,br,bri,ria,ian,an,bro,rod,ode,der,eri,ric,ick,ck}
Notice the space between the words is ignored and a trigram is not created with letters from both names. (i.e. nbr
)
Trigrams in Postgres are available through an extension, which provides a new set of functions. One of the functions makes it possible to use trigrams within GIN or GIST indices. First, install the extension by running the following SQL command.
CREATE EXTENSION pg_trgm;
To test that it’s working, run the following command, which should return the results listed above.
SELECT show_trgm('Brian Broderick');
The Trigram extension comes with some additional functions; for example, it can return similarity between two strings. This can be used as a really simple full text search, but there are better options for this.
The real value of trigram optimized GIN or GIST indices is that they can be used in LIKE
and ILIKE
queries where there’s a left-bound wildcard. For example:
SELECT * FROM customers WHERE phone LIKE '%555%';
In this example, a b-tree wouldn’t be used, but a trigram could.
To create the index, run something like:
CREATE INDEX CONCURRENTLY name_of_idx ON my_table USING GIN (my_column gin_trgm_ops);’)
or
CREATE INDEX CONCURRENTLY name_of_idx ON my_table USING GIST (my_column gist_trgm_ops);’)
The special parts are the functions gin_trgm_ops
and gist_trgm_ops
. This breaks up text blocks into trigrams.
In Postgres, version 11 or older, a trigram would not be used in conjunction with a UUID field. This was fixed in PG12.
Because each string is broken up into 3 letter sequences, the overall disk space and memory needed is up to 3 times greater than a b-tree.