PostgreSQL - jsonb index is not being applied

58 views Asked by At

I have a table called users that contains over 2 millions records with the following structure:

create table users { 
   id integer not null,
   langs jsonb
}

The column langs contains the following data in rows:

    {"default_lang": "en", "custom_langs": ["en", "de"]}
    {"default_lang": "fr", "custom_langs": ["en", "de"]}
    {"default_lang": "pt", "custom_langs": ["en", "de", "fr"]}

The idea is to select only rows whose default_lang is not contained in custom_langs. So, I am able to achieve it like so:

SELECT id, langs
FROM  users
WHERE deleted=false and (langs->'default_lang' <@ (langs->'custom_langs') = false)

I have also created the following index:

    CREATE INDEX IF NOT EXISTS users_langs ON users 
    USING btree ((langs->'default_lang'), (langs->'custom_langs'))
    where langs->'default_lang' <@ (langs->'custom_langs') = false;

But the index users_langs is not being applied. Any ideas on how to fix this. thanks in advance.

Here the query plan result:

 HashAggregate  (cost=957627.32..1002810.29 rows=2536588 width=8) (actual time=17095.651..17096.210 rows=2233 loops=1)
   Output: id
   Group Key: users.id
   Planned Partitions: 64  Batches: 1  Memory Usage: 913kB
   Buffers: shared hit=47315 read=762287
   I/O Timings: read=11079.509
   ->  Seq Scan on public.users  (cost=0.00..821285.71 rows=2536588 width=8) (actual time=17.969..17086.833 rows=2233 loops=1)
         Output: id
         Filter: ((NOT users.deleted) AND (NOT ((users.langs -> 'default_langs'::text) <@ (users.langs -> 'custom_langs'::text))))
         Rows Removed by Filter: 2774667
         Buffers: shared hit=47315 read=762287
         I/O Timings: read=11079.509
 Settings: effective_cache_size = '8041608kB', jit = 'off'
 Query Identifier: -9027923201169468774
 Planning:
   Buffers: shared hit=427 read=32
   I/O Timings: read=4.276
 Planning Time: 5.737 ms
 Execution Time: 17096.749 ms

1

There are 1 answers

0
jjanes On

The partial index is usable, and will be used if you disable seq scans (set enable_seqscan=off), but it won't be used normally just because it is believed to be slower than the full table scan would, because the row estimates are so wrong.

Unfortunately, I can't see how to get the estimates correct, not even by using custom statistics on the expression. I don't know why the custom statistics don't work. You could use the 3rd party extension pg_hint_plan to force an index scan.

After playing with this a bit more, the problem is the = false. The partial index code knows how to deal with that, but apparently the custom statistics code does not. If you create the stats and analyze the table:

create statistics lsdkfj on (langs->'default_lang' <@ (langs->'custom_langs')) 
from users;
analyze users;

then the stats will be right but only if the query is written with is false not = false. Then an index will be used (but only if it too is changed to is false).