My favorite PostgreSQL 18 Feature: Virtual Generated Columns

Async I/O, UUID v7, and post-upgrade statistics will probably steal the show when it comes to Postgres 18. My personal favorite feature of the upcoming release, though, is virtual generated columns.

Generated columns are one of those features that can save you from writing a lot of boilerplate code and make your life easier. Instead of repeating the same expressions across views, queries, or triggers, you can let the database manage them for you. They encapsulate logic close to the data, reduce duplication, and make schemas easier to reason about.

Generated columns come in two flavors: stored and virtual.

Stored generated columns are computed once on insert/update and are persisted on disk (taking up storage). They can then be indexed like a “real” column. They’re great for expensive expressions you don’t want to recalculate every time.

Virtual generated columns (new in PG 18!) are not stored on disk. They’re computed on the fly when queried. They’re ideal for lightweight expressions or when you don’t want to bloat your table size.

Considering the following example:

create table users (
    id serial primary key,
    name text not null,
    -- stored generated column: persists value on disk, indexable
    name_upper text
        generated always as (upper(name)) stored,
    -- virtual generated column: computed only when queried
    name_lower text
        generated always as (lower(name)) virtual
);

insert into users (name) values ('Florents'), ('Αθηνά'), ('postgres');
select id, name, name_upper, name_lower from users;

id |   name   | name_upper | name_lower 
----+----------+------------+------------
  1 | Florents | FLORENTS   | florents
  2 | Αθηνά    | ΑΘΗΝΆ      | αθηνά
  3 | postgres | POSTGRES   | postgres

Here name_upper is stored and written to disk, but name_lower is virtual and it’s computed on the fly.

Generated columns are interesting because they emulate the style of reactive programming that made Microsoft Excel one of the most popular, effective, and enduring pieces of software ever written. You define a source column (independent variable), and some generated columns (dependent variables) are a function of that.

In general, I’ve found them to be very useful as a tool because they free some brain power from the database’s maintenance, which can instead be invested in proper database design and normalization.

A practical example is full-text search. Imagine you want to support search across multiple languages or text configurations. You could store three different generated columns, each using a different text search config:

create table docs (
    id serial primary key,
    body text not null,
    body_fts_simple tsvector
        generated always as (to_tsvector('simple', body)) stored,
    body_fts_en tsvector
        generated always as (to_tsvector('english', body)) stored,
    body_fts_el tsvector
        generated always as (to_tsvector('greek', body)) stored
);

Now you can create indexes on each tsvector column and query depending on the user’s language:

create index on docs using gin (body_fts_simple);
create index on docs using gin (body_fts_en);
create index on docs using gin (body_fts_el);

This way, you avoid triggers, keep your schema declarative, and ensure that Your FTS columns are always consistent with the base text.

One could, of course, bypass storing the data and create an index on the expression, But I’ve found that looking at individual tokens and scores is too often necessary to debug search results and pre-processing steps.

Subsequent updates on the source columns cause a new evaluation of the generated ones, which are then updated according to the function-expression.

You can implement the same logic by using triggers, but good luck maintaining your database’s integrity and your own sanity when scaling that.

Generated columns also allow you to build your tables lego-style, one column at a time. In the wild, I’ve seen generated columns used as a very popular way of “flattening” JSON documents in Postgres. Notice that stored generated columns have been available since PG 12. Since then, a lot of new features have been added to Postgres’ JSON infrastructure: from more elaborate jsonpath expressions to JSON_TABLE functions.

When such JSON facilities weren’t available, or people felt uncomfortable learning advanced jsonpath and preferred to stick to simple operations full of jsonb -> 'k1' -> 'k2', generated columns were a very convenient alternative.

With stored generated columns this was convenient but came at the cost of extra storage, since JSON documents can be quite wordy. For elements that you only ever project in a SELECT clause, this bloat doesn’t buy you much. Virtual generated columns, on the other hand, let you expose those JSON fields as regular columns without duplicating data on disk. You still get a cleaner query surface, but with zero storage overhead.

You can read the thread discussing the original patch from Peter Eisentraut for some more details on the inner workings.

One important tradeoff is performance. Stored columns add some overhead on writes, since every insert or update has to recompute and persist the value. The benefit is that reads are much faster and can leverage indexes directly. Virtual columns do the opposite: they keep writes cheap but make reads do the work at query time. Think of stored as materialized and virtual as on-demand.

Another practical consideration is schema evolution. Adding a virtual column to a large table is instant, since nothing gets written to disk. Adding a stored column, however, requires backfilling values and may trigger a full table rewrite. That makes virtual columns especially attractive when you want to experiment quickly without locking up your production tables.

There are several limitations on generated columns that are fairly obvious, in that they prevent your database from becoming a spaghetti of events. They’re all listed in the documentation.

There are also some subtle security implications with on-the-fly evaluation, which are discussed in this thread. If your database relies heavily on custom functions or user-defined types, it’s worth reading carefully. Virtual columns can surface surprising behaviors, and without proper planning, attempts to introduce them may lead to some rather spectacular hiccups.