liteJq: Bringing jq to SQLite for More Powerful Analytics

23 Feb 2024

SQLite, known for its simplicity as a self-contained, file-based database, has been a staple in applications ranging from small gadgets to massive web browsers. However, dealing with JSON data in SQLite, while possible, has always been a bit clunky and less intuitive, especially when comparing the capabilities to more complex systems like PostgreSQL with its JSONB data type.

For complex and nested documents, for example, retrieving some analytical results, one has to use a sequence of json_extract that are hard to understand and are syntactically unique to SQLite.

On the other hand, jq is a tremendously popular and powerful tool to query JSON data. In fact, it's a full-fledged DSL language in itself. I decided to bring these two closer together through liteJq

liteJq is an SQLite extension written in C, that embeds the standard jq compiler. It exposes a jq(text, text) function that can be used to query JSON data already stored in SQLite, but using the jq language instead.

Let's see it in action:

Let's start by creating a SQLite database of movie data.

curl -sL "https://raw.githubusercontent.com/Florents-Tselai/liteJQ/main/data/movies.sql" |\
 sqlite3 movies.db -

We have a very simple schema:

CREATE TABLE [movies] (
   "d" TEXT -- json
);

Column d holds json values in plain text like this.

{
    "title": "The Corn Is Green",
    "year": 1945,
    "cast": [
      "Bette Davis",
      "Joan Lorring",
      "John Dall"
    ],
    "genres": [
      "Drama"
    ],
    "href": "The_Corn_Is_Green_(1945_film)",
    "extract": "The Corn Is Green is a 1945 American drama film starring Bette Davis as a schoolteacher determined to bring education to a Welsh coal mining town despite great opposition. It was adapted from the 1938 play of the same name by Emlyn Williams, which originally starred Ethel Barrymore.",
    "thumbnail": "https://upload.wikimedia.org/wikipedia/en/thumb/b/bf/The-corn-is-green-poster.jpg/320px-The-corn-is-green-poster.jpg",
    "thumbnail_width": 320,
    "thumbnail_height": 248
  }

First, start by building liteJq by following the instructions:

In any session, you should load the extension first after, like this:

.load ./litejq

Then you can start doing jq magic. Let's see some example queries.

List all movie titles

select jq(d, '.title')
from movies;

To find movies released after a specific year, for example, 1980

select jq(d, '{title: .title, year: .year}')
from movies
where jq(d, '.year > 1980');

The above query is equivalent to this one

select jq(d, '{title: .title, year: .year}')
from movies
where jq(d, '.year') > 1980;

The difference between the two queries above, is who handles the predicate: jq or SQLite.

Extract Movies with Specific Keywords in Extract

select jq(d, '.extract')
from movies
where jq(d, '.extract | contains("silent")');

Filter movies by a specific genre (e.g., Drama)

select jq(d, '{title: .title, year: .year, genres: .genres}')
from movies
where jq(d, '.genres[] == "Drama"');

Filter movies where "Joan Lorring" and "John Dall" played together

select jq(d, '{title: .title, year: .year, cast: .cast}')
from movies
where jq(d, '.cast | contains(["Joan Lorring", "John Dall"])');

The true potential of such an approach though can only be realized when "surgical" jq filters are combined with complex SQL queries.

I believe, and I've written before that the future of databases includes having more specialized DSLs within SQL queries. In fact, that's the second attempt I've made to embed jq in a database. If this looks interesting to you, you may be interested in pgJQ: a Postgres extension that embeds jq.