Here I'd like to take a look at a very simple and straightforward example of how powerful querying json objects is going to be in Postgres, some time after Postgres 10. It's a huge leap forward for querying json, and I'm very excited about it.
Imagine the following issue at hand.
Given: a table with a number column and a ranges jsonb column; ranges holds an array of json objects, where each object has two fields: min and max.
Task: select all rows such that an element with ranges[].min <= number < ranges[].max exists. In other words, ranges has an object that satisfies the condition .min <= number < .max.
Let's lay out the problem and test data in formal SQL terms.
We have a table my_table:
CREATE TABLE my_table (
title VARCHAR,
number INTEGER,
ranges JSONB
);
There's some data in the table:
INSERT INTO my_table (title, number, ranges)
VALUES
('bad', 6, '[{"min": 1, "max": 5}, {"min": 7, "max": 10}]'::JSONB),
('good', 7, '[{"min": 1, "max": 5}, {"min": 7, "max": 10}]'::JSONB);
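For reference, selecting the freshly inserted rows back should produce roughly the following (note that jsonb normalizes the key order inside objects):

 title | number |                    ranges
-------+--------+-----------------------------------------------
 bad   |      6 | [{"max": 5, "min": 1}, {"max": 10, "min": 7}]
 good  |      7 | [{"max": 5, "min": 1}, {"max": 10, "min": 7}]
(2 rows)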
The task is to write a query that selects only the row with title='good'.
An obvious solution would be the following:
SELECT title,
number,
jsonb_array_elements(ranges) as object
FROM my_table
WHERE number BETWEEN (object->>'min')::integer AND (object->>'max')::integer;
Hmm, but that doesn't seem to work:
ERROR:  column "object" does not exist
LINE 5: WHERE number BETWEEN (object->>'min')::integer AND (object->>'max')::integer;
That's because the WHERE clause is evaluated before the SELECT list, so the object alias isn't visible there. Let's try another approach and move the unnesting into a CTE:
WITH our_data AS (
SELECT title,
number,
jsonb_array_elements(ranges) as object
FROM my_table
)
SELECT *
FROM our_data
WHERE number BETWEEN (object->>'min')::integer AND (object->>'max')::integer;
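As a quick sanity check, selecting straight from the CTE shows the unfolding at work: each array element becomes a row of its own. Reconstructed from the test data above, the intermediate result should look roughly like this:

WITH our_data AS (
    SELECT title,
           number,
           jsonb_array_elements(ranges) AS object
    FROM my_table
)
SELECT * FROM our_data;

 title | number |        object
-------+--------+-----------------------
 bad   |      6 | {"max": 5, "min": 1}
 bad   |      6 | {"max": 10, "min": 7}
 good  |      7 | {"max": 5, "min": 1}
 good  |      7 | {"max": 10, "min": 7}
(4 rows)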
Okay, that was a little too verbose, but it works!
Let's take a closer look at this solution: the ranges column is unfolded, i.e. each object of the array becomes an SQL row of its own (as shown above), and the outer WHERE then filters those rows. (Strictly speaking, BETWEEN is inclusive on both ends, while the task asks for a strict upper bound; on this test data the difference doesn't matter.) It works. But is it optimal? Here's what the query plan looks like under EXPLAIN ANALYSE:
:
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------
CTE Scan on our_data (cost=441.38..3841.38 rows=9444 width=68) (actual time=0.060..0.062 rows=1 loops=1)
Filter: ((number >= ((object ->> 'min'::text))::integer) AND (number <= ((object ->> 'max'::text))::integer))
Rows Removed by Filter: 3
CTE our_data
-> Seq Scan on my_table (cost=0.00..441.38 rows=85000 width=68) (actual time=0.035..0.041 rows=4 loops=1)
Planning time: 0.082 ms
Execution time: 0.518 ms
(7 rows)
Apart from suspiciously big numbers in the EXPLAIN ANALYSE result (the planner can't estimate how many rows a set-returning function over jsonb will yield, so it falls back to default assumptions) and a gut feeling, it's hard to say confidently whether or not this approach is fast. So let's try to solve the problem differently.
Update. User simcitymayor from reddit suggested using LATERAL, so let's try that:
SELECT m.title, m.number, e.obj
FROM my_table m,
     LATERAL jsonb_array_elements(m.ranges) as e(obj)
WHERE m.number between (e.obj->>'min')::integer and (e.obj->>'max')::integer;
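Against the test data, this should return just the 'good' row, something like:

 title | number |          obj
-------+--------+-----------------------
 good  |      7 | {"max": 10, "min": 7}
(1 row)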
This works; here's what the query plan looks like:
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------
Nested Loop (cost=0.00..2662.00 rows=9444 width=68) (actual time=0.049..0.050 rows=1 loops=1)
-> Seq Scan on my_table m (cost=0.00..18.50 rows=850 width=68) (actual time=0.010..0.011 rows=2 loops=1)
-> Function Scan on jsonb_array_elements e (cost=0.00..3.00 rows=11 width=32) (actual time=0.017..0.017 rows=0 loops=2)
Filter: ((m.number >= ((obj ->> 'min'::text))::integer) AND (m.number <= ((obj ->> 'max'::text))::integer))
Rows Removed by Filter: 2
Planning time: 0.101 ms
Execution time: 0.091 ms
The estimated cost is about 30% smaller, and the actual execution is significantly faster (0.091 ms vs 0.518 ms). I
never use LATERAL
in my queries; time to take a closer look at the docs and reconsider.
I'll stick with LATERAL
while more json functions are on their way.
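For anyone else who rarely reaches for it: LATERAL allows an item in the FROM clause to reference columns of the FROM items before it, so the set-returning function is evaluated once per row of my_table. A minimal sketch, with the filter stripped away:

-- LATERAL makes m visible inside the function call;
-- one output row is produced per (table row, array element) pair
SELECT m.title, m.number, e.obj
FROM my_table m,
     LATERAL jsonb_array_elements(m.ranges) AS e(obj);
-- 4 rows for our test data, same shape as the CTE version above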
I'll use a yet-to-be-released feature of Postgres, a function called JSON_EXISTS. The function takes a column containing json (or jsonb, in our case) as its first argument, and a jsonpath expression as its second argument.
Here's what the new query looks like:
SELECT *
FROM my_table
WHERE JSON_EXISTS(ranges, '$[*] ? (@.min <= $x && $x < @.max)' PASSING number AS x);
Note the $[*] ? (@.min <= $x && $x < @.max)
expression: this is a jsonpath.
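Reading the expression piece by piece (a sketch of the semantics as I understand the SQL/JSON draft):

$[*]                        -- iterate over every element of the top-level array
? ( ... )                   -- filter: keep only elements for which the predicate holds
@                           -- the array element currently being tested
$x                          -- the variable bound via PASSING number AS x
@.min <= $x && $x < @.max   -- the range check itself, matching the task statement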
Let's see what the EXPLAIN ANALYSE result looks like:
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------
Seq Scan on my_table (cost=0.00..18.50 rows=425 width=68) (actual time=0.018..0.018 rows=1 loops=1)
Filter: JSON_EXISTS(ranges, '$.[*]?(@."min" <= $"x" && $"x" < @."max")' PASSING number AS x FALSE ON ERROR)
Rows Removed by Filter: 1
Planning time: 0.020 ms
Execution time: 0.031 ms
(5 rows)
Wow, the numbers are much smaller than before! It looks as if Postgres didn't have to allocate memory for unnested rows; instead, it performed the fairly complex check inside the json/jsonb column in a single pass, dropping the query cost significantly. (Note also the FALSE ON ERROR in the plan: that's the function's default behavior when the jsonpath evaluation fails.)
In order to play with JSON_EXISTS and jsonpath I used the PostgreSQL 10dev+SQLJSON branch on sqlfiddle.postgrespro.ru (update: no longer available). I was previously barely familiar with jsonpath, but was able to get up to speed quickly using an online REPL, jsonpath.com.
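For example, a roughly equivalent filter can be tried on jsonpath.com against our test array; note that it uses the Goessner-style $[?(...)] filter syntax rather than SQL/JSON's $[*] ? (...), and the PASSING variable has to be inlined by hand:

Input:      [{"min": 1, "max": 5}, {"min": 7, "max": 10}]
Expression: $[?(@.min <= 7 && 7 < @.max)]
Result:     [{"min": 7, "max": 10}]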
The ability to use jsonpath in Postgres appears to be far superior to what is currently available for querying json objects. What seemed hard to do previously now becomes possible. From what I can tell, using JSON_EXISTS is much faster than the existing workarounds for achieving the same result.
The JSON_EXISTS function is only the tip of the iceberg. There are many more additions to json handling in the newest SQL standard, SQL:2016, which is a huge step forward for json adoption in SQL! The standard's SQL/JSON section is available for download from https://standards.iso.org, here: c067367_ISO_IEC_TR_19075-6_2017.zip.
Unfortunately, neither Postgres 9.6 nor the upcoming Postgres 10 release includes an implementation of the new standard's json functions. A patch containing a subset of the standard's json functions was submitted right before the merge window closed back in February, but it was too late: the patch was too big to be properly reviewed and included in Postgres 10.
According to Oleg Bartunov's slides from PGConf.US in March, the plan is to push SQL/JSON into Postgres 11. Meanwhile, the desired json functionality is expected to land in the next major release of Postgres Pro (a fork maintained by the company of the same name).
The slides from Oleg's talk contain tons of info and examples about SQL/JSON; I highly recommend them to anyone interested in working with json in SQL.
It's heartening to see the folks behind the SQL standard working on json adoption. And of course, huge THANKS to the patch authors from the Postgres Pro team for implementing the SQL/JSON standard in Postgres!
I hope that the "SQL/JSON support in PostgreSQL" patch, now moved to the next Commitfest, will be reviewed and accepted into Postgres core.