I'm trying to use a compound (multicolumn) index on a table to assist in creating daily report counts. I am using Postgres 13, and my table looks like this:
CREATE TABLE inquiries (
id bigint NOT NULL,
identity_id bigint NOT NULL,
received_at timestamp(0) without time zone NOT NULL,
purpose_id bigint NOT NULL,
location_id bigint NOT NULL
);
CREATE INDEX "inquiries_DATE_index" ON inquiries USING btree
(date(received_at), location_id, purpose_id, identity_id);
My query looks like this:
SELECT DATE(received_at), location_id, purpose_id, COUNT(DISTINCT identity_id)
FROM inquiries
WHERE (DATE(received_at) >= $1)
AND (DATE(received_at) <= $2)
GROUP BY 1, 2, 3
Explain output looks like this:
GroupAggregate (cost=43703.28..45785.49 rows=10950 width=19)
Group Key: (date(received_at)), location_id, purpose_id
-> Sort (cost=43703.28..44092.34 rows=155627 width=16)
Sort Key: (date(received_at)), location_id, purpose_id
-> Bitmap Heap Scan on inquiries (cost=5243.60..27622.21 rows=155627 width=16)
Recheck Cond: ((date(received_at) >= '2023-11-01'::date) AND (date(received_at) <= '2023-11-30'::date))
-> Bitmap Index Scan on "inquiries_DATE_index" (cost=0.00..5204.70 rows=155627 width=0)
Index Cond: ((date(received_at) >= '2023-11-01'::date) AND (date(received_at) <= '2023-11-30'::date))
The index doesn't seem to help; the query takes a long time to execute. If I add a date column to the table and use that instead of date(received_at) (sketched below), the query performs better and the query plan changes to:
GroupAggregate (cost=0.43..85199.58 rows=10980 width=19)
Group Key: pacific_date, location_id, purpose_id
-> Index Only Scan using inquiries_pacific_date_index on inquiries (cost=0.43..77813.12 rows=727666 width=16)
Index Cond: ((pacific_date >= '2023-11-01'::date) AND (pacific_date <= '2023-11-30'::date))
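For reference, the new column and index look roughly like this (how pacific_date is populated from received_at is omitted here):
ALTER TABLE inquiries ADD COLUMN pacific_date date;  -- derived from received_at (Pacific time)
CREATE INDEX inquiries_pacific_date_index ON inquiries USING btree
(pacific_date, location_id, purpose_id, identity_id);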
I guess I can do this if I can't find a better way, but it seems redundant. Is there a way I can write my original query so it makes better use of the index?
Drop-in fix
Like Laurenz explained, index-only scans currently (as of Postgres 16) suffer from a corner-case limitation with expression indexes; the manual has more details. One workaround is to "include" the column itself in the index (replacing the old one):
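A sketch, with an illustrative index name; INCLUDE appends received_at as a non-key payload column, which is what lets the planner consider index-only scans:
CREATE INDEX inquiries_date_covering_idx ON inquiries  -- replaces "inquiries_DATE_index"
(date(received_at), location_id, purpose_id, identity_id)
INCLUDE (received_at);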
This allows index-only scans for your original query. But it also increases the size of the index, by 8 bytes per row in your case.
fiddle
Better
Create an index on bare columns, without expressions:
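A minimal sketch, with an illustrative index name:
CREATE INDEX inquiries_plain_idx ON inquiries
(received_at, location_id, purpose_id, identity_id);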
(A plain index is often useful for other queries, too.)
And adjust your query slightly, to be exactly equivalent:
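A sketch of the adjusted query, assuming $1 and $2 are of type date (see Input below):
SELECT received_at::date, location_id, purpose_id, count(DISTINCT identity_id)
FROM   inquiries
WHERE  received_at >= $1      -- a date is coerced to timestamp at 00:00
AND    received_at <  $2 + 1  -- date + integer adds days: exclusive upper bound
GROUP  BY 1, 2, 3;
received_at >= $1 matches date(received_at) >= $1, and received_at < $2 + 1 matches date(received_at) <= $2, but both predicates are now sargable on the plain received_at column.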
Input
$1 and $2 must be type date, and received_at timestamp, as indicated in the question.

count(DISTINCT col) is typically slow in my experience. This may be faster, yet:
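A sketch, assuming the same date parameters; the subquery folds duplicates first, so the outer count(*) produces the same result as count(DISTINCT identity_id):
SELECT the_date, location_id, purpose_id, count(*) AS distinct_identities
FROM  (
   SELECT DISTINCT received_at::date AS the_date, location_id, purpose_id, identity_id
   FROM   inquiries
   WHERE  received_at >= $1
   AND    received_at <  $2 + 1
   ) sub
GROUP  BY 1, 2, 3;
fiddle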
And if there are many duplicates per (received_at::date, location_id, purpose_id, identity_id), an emulated index-skip scan may be much faster, yet; see the sketch below.
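A sketch of an emulated index-skip scan with a recursive CTE, assuming the expression index on (date(received_at), location_id, purpose_id, identity_id) from above; it hops from one distinct combination to the next instead of reading every duplicate:
WITH RECURSIVE skip AS (
   (  -- first combination in the date range
   SELECT date(received_at) AS the_date, location_id, purpose_id, identity_id
   FROM   inquiries
   WHERE  date(received_at) >= $1
   AND    date(received_at) <= $2
   ORDER  BY 1, 2, 3, 4
   LIMIT  1
   )
   UNION ALL
   SELECT n.*
   FROM   skip s
   CROSS  JOIN LATERAL (
      -- next combination after the current one, via row comparison on the index
      SELECT date(received_at) AS the_date, location_id, purpose_id, identity_id
      FROM   inquiries
      WHERE  (date(received_at), location_id, purpose_id, identity_id)
           > (s.the_date, s.location_id, s.purpose_id, s.identity_id)
      AND    date(received_at) <= $2
      ORDER  BY 1, 2, 3, 4
      LIMIT  1
      ) n
   )
SELECT the_date, location_id, purpose_id, count(*) AS distinct_identities
FROM   skip
GROUP  BY 1, 2, 3;
Upgrade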
Postgres has improved performance for big data with every major version over the past years. Consider upgrading to the latest version, Postgres 16 (at the time of writing). That should give you an immediate, additional boost.