Full Text Search

Full Text Search (FTS) enables efficient searching of natural language text within a Database, going far beyond simple LIKE or regex pattern matching. PostgreSQL has built-in FTS support that handles tokenization, stemming, ranking, and indexing — no external search engine required.

Why Not LIKE or Regex?

  • LIKE '%keyword%' forces a full sequential scan — cannot use B-Tree indexes
  • No understanding of language — running won’t match run
  • No relevance ranking — all matches are treated equally
  • No stop-word filtering — the, a, is pollute results

Core Concepts

tsvector

A processed document — text broken into lexemes (normalized tokens). Stop words removed, words stemmed.

SELECT to_tsvector('english', 'The quick brown foxes jumped over lazy dogs');
-- 'brown':3 'dog':9 'fox':4 'jump':5 'lazi':8 'quick':2

tsquery

A search query with boolean operators (& AND, | OR, ! NOT, <-> FOLLOWED BY).

SELECT to_tsquery('english', 'quick & fox');
-- 'quick' & 'fox'

Matching

The @@ operator matches a tsvector against a tsquery:

SELECT * FROM articles
WHERE to_tsvector('english', body) @@ to_tsquery('english', 'database & performance');

Indexing for Performance

Without an Index, FTS requires computing to_tsvector for every row on every query. Two index types support FTS:

GIN Index

The preferred FTS index. Exact matches, fast lookups. Slower to build and update.

CREATE INDEX idx_articles_fts ON articles USING GIN (to_tsvector('english', body));

GiST Index

Lossy — may return false positives that require re-checking. Faster to build, smaller, but slower queries. Better for frequently-updated tables.

Stored tsvector Column

For best performance, store the pre-computed tsvector in a dedicated column and index that:

ALTER TABLE articles ADD COLUMN search_vector tsvector;
UPDATE articles SET search_vector = to_tsvector('english', title || ' ' || body);
CREATE INDEX idx_search ON articles USING GIN (search_vector);

Keep it in sync with a trigger:

CREATE TRIGGER update_search_vector
BEFORE INSERT OR UPDATE ON articles
FOR EACH ROW EXECUTE FUNCTION
    tsvector_update_trigger(search_vector, 'pg_catalog.english', title, body);

Ranking Results

ts_rank scores results by relevance:

SELECT title, ts_rank(search_vector, query) AS rank
FROM articles, to_tsquery('english', 'database & scaling') AS query
WHERE search_vector @@ query
ORDER BY rank DESC;

The <-> operator matches words in sequence:

SELECT * FROM articles
WHERE search_vector @@ to_tsquery('english', 'database <-> management');

References