skies.dev

Output Relationships as Nested JSON with PostgreSQL

4 min read

Defining the data model

Our blogging app's database defines the following tables:

  • posts contains the article
  • categories contains a blog category
  • post_categories defines the categories of each post.

The entity relationship diagram of this many-to-many relation looks as follows:

Posts can be associated with many categories. Categories can be associated with many posts.
Posts can be associated with many categories. Categories can be associated with many posts.

For our GetPosts API, we want to fetch posts with their respective categories using the following structure.

interface Category {
  id: number;
  name: string;
}

interface Post {
  id: number;
  body: string;
  categories: Category[];
}

The tricky part here is that a normalized SQL database returns flat data, but our API requires categories to be a nested property of Post.

A naive attempt to query structured data

We could get our desired JSON structure by making multiple queries and building the desired structure on the server, but this is error-prone, tedious, and less performant.

Generally speaking, we can assume the Postgres query optimizer, written in C, will outperform our application code.

Postgres aggregate function to output relationships as nested JSON

To set this up, we need to join posts and categories.

SELECT p.post_id, p.body, c.category_id, c.name
FROM posts p
LEFT JOIN post_categories pc ON p.post_id =  pc.post_id
LEFT JOIN categories c ON c.category_id = pc.category_id

If we run this, we'll get the same post for each of the post's categories. For example:

post_idcategory_id
11
12
13
21
22
31

So if a post has four categories, then this query will return four records for each of it's categories. This isn't exactly what we want because it would require us to wrangle the data server-side to get the JSON structure we want.

What we want is one row for each post, such that the categories are returned as JSON.

post_idcategory_id
1[1, 2, 3]
2[1, 2]
3[1]

For this, we're going to make use of PostgreSQL's jsonb_agg aggregate function.

We'll update our query to group by post_id and aggregate all the categories associated with that post_id, and render them as JSON using jsonb_agg.

SELECT p.post_id, p.body, jsonb_agg(c) as categories
FROM posts p
LEFT JOIN post_categories pc ON p.post_id = pc.post_id
LEFT JOIN categories c ON c.category_id = pc.category_id
GROUP BY p.post_id

When we run this query we get a new categories field with the nested JSON. ๐Ÿš€

There is a slight issue though. When a post has no categories, jsonb_agg will return the singleton array: [null].

How to fix the singleton [null] array with coalesce

For our API, we don't want to have to deal with this edge case. Instead, we'd rather return an empty list [] for posts that don't have any categories.

Here's how we'll modify our query to handle this case:

  1. Filter out any element that is null
  2. Coalesce the result to the empty array []

Here's how it'll look.

SELECT p.post_id,
       p.body,
       coalesce(jsonb_agg(c) FILTER (WHERE c.category_id IS NOT NULL), '[]') as categories
FROM posts p
LEFT JOIN post_categories pc ON p.post_id = pc.post_id
LEFT JOIN categories c ON c.category_id = pc.category_id
GROUP BY p.post_id

When we run this query, we'll get a single record per post, with each post's category as a nested JSON array. If a post doesn't have any categories, it will return [].

Next, we'll adapt this SQL query to the knex.js query builder to perform this query in our application code. Feel free to skip this section is you're not using knex.js in your app.

Use Knex.js to query relationships as structured JSON

For the most part we can use the standard query builder methods. We just need to add a raw SQL statement to do the JSON aggregation piece i.e. coalesce, jsonb_agg, etc.

import knex from 'knex';

async function getPosts() {
  return knex
    .from('posts')
    .leftJoin('post_categories', 'post.post_id', 'post_categories.post_id')
    .leftJoin(
      'categories',
      'categories.category_id',
      'post_categories.category_id',
    )
    .groupBy('post.post_id')
    .select([
      'post.post_id',
      'post.body',
      knex.raw(
        `coalesce(jsonb_agg(category) filter (where category.id is not null), '[]') as categories`,
      ),
    ]);
}

And that's all there is to it. You should now be able to:

  • Understand the pitfalls of translating normalized data to structured JSON
  • Use Postgres aggregate function jsonb_agg to query relationships as structured data
  • Use the knex.js library to perform queries with jsonb_agg

Hey, you! ๐Ÿซต

Did you know I created a YouTube channel? I'll be putting out a lot of new content on web development and software engineering so make sure to subscribe.

(clap if you liked the article)

You might also like