Optimal plan for PostgreSQL with ORDER BY for two column from same index

46 views Asked by At

I have a table with columns: id, user_id, parent_id, date_m, media_type, name

Also we have indexes:

  • (user_id, parent_id, date_m, media_type, name) - index1
  • (user_id, parent_id, name) - index2

Now the query, that we had at first:

SELECT *
FROM table
WHERE user_id = 2
    AND parent_id = 1
    AND date_m > '2018-09-01T11:41:24'::timestamp
ORDER BY date_m
LIMIT 100

This query used index1 and everything was fast. But we found that this ORDER BY clause doesn't create unique order -> it can skip some rows when there're rows with same date_m.

Tried to use ORDER BY date_m, id - scheduler started using index2 and became slow because now we filter each row with clause date_m > '2018-09-01T11:41:24'::timestamp

Then we tried to use such query

SELECT *
FROM table
WHERE user_id = 2
    AND parent_id = 1
    AND date_m > '2018-09-01T11:41:24'::timestamp
ORDER BY date_m, name
LIMIT 100

It was OK for this task but not the best decision in general, cause for other cases name could be also the same. Nevertheless scheduler again used bad index2 and this time it was strange because in index1 it should be stored in the same order as we define in ORDER BY clause, also all columns from WHERE statement are in this index. You can see it with the very first query

Update: here is DDL

create table if not exists table_name
(
    user_id               bigint                                              not null,
    parent_id             uuid                                                not null,
    id                    bytea                                               not null,
    name                  text                                                not null,
    date_m                timestamp with time zone default now()              not null,
    date_removed          timestamp with time zone,
    media_type            media_type,
    
    constraint uk_table_name_user_id_parent_id
        unique (user_id, parent_id, name)
);

create index if not exists ix_table_name_user_id_parent_id_date_m_media_type_name
    on table_name (user_id, parent_id, date_m, media_type, name);

create type media_type as enum ('image', 'video');

For statement with ORDER BY date_m plan is

Limit  (cost=15.79..30.17 rows=1 width=1680) (actual time=2.450..2.450 rows=0 loops=1)
   ->  Nested Loop  (cost=1.41..15.79 rows=1 width=1680) (actual time=2.449..2.450 rows=1 loops=1)
         ->  Index Scan using ix_table_name_user_id_parent_id_date_m_media_type_name on files f  (cost=0.83..2.85 rows=1 width=737) (actual time=0.867..0.868 rows=1 loops=1)
               Index Cond: ((user_id = <some_uuid>) AND (parent_id = <some_uuid>::uuid) AND (date_m >= <some_date>::timestamp without time zone))
 Planning time: 0.310 ms
 Execution time: 2.516 ms

And for statement ORDER BY date_m, name plan is:

Limit  (cost=15.78..26.14 rows=1 width=1680) (actual time=2458.236..2458.236 rows=0 loops=1)
   ->  Result  (cost=5.42..15.78 rows=1 width=1680) (actual time=2458.235..2458.236 rows=1 loops=1)
         ->  Sort  (cost=5.42..5.43 rows=1 width=1672) (actual time=2457.913..2457.913 rows=1 loops=1)
               Sort Key: date_m, name
               Sort Method: quicksort  Memory: 26kB
               ->  Nested Loop  (cost=1.28..5.41 rows=1 width=1672) (actual time=2408.711..2457.906 rows=1 loops=1)
                     ->  Index Scan using uk_table_name_user_id_parent_id on files f  (cost=0.70..2.73 rows=1 width=737) (actual time=2408.666..2457.861 rows=1 loops=1)
                           Index Cond: ((user_id = <some_uuid>) AND (parent_id = <some_uuid>::uuid))
                           Filter: (date_m >= <some_date>::timestamp without time zone)
                           Rows Removed by Filter: 23179
 Planning time: 0.298 ms
 Execution time: 2458.311 ms
0

There are 0 answers