I'm trying to use a compound (multicolumn) index on a table to assist in creating daily report counts. I am using Postgres 13, and my table looks like this:
CREATE TABLE inquiries (
id bigint NOT NULL,
identity_id bigint NOT NULL,
received_at timestamp(0) without time zone NOT NULL,
purpose_id bigint NOT NULL,
location_id bigint NOT NULL
);
CREATE INDEX "inquiries_DATE_index" ON inquiries USING btree
(date(received_at), location_id, purpose_id, identity_id);
My query looks like this:
SELECT DATE(received_at), location_id, purpose_id, COUNT(DISTINCT identity_id)
FROM inquiries
WHERE (DATE(received_at) >= $1)
AND (DATE(received_at) <= $2)
GROUP BY 1, 2, 3
Explain output looks like this:
GroupAggregate (cost=43703.28..45785.49 rows=10950 width=19)
Group Key: (date(received_at)), location_id, purpose_id
-> Sort (cost=43703.28..44092.34 rows=155627 width=16)
Sort Key: (date(received_at)), location_id, purpose_id
-> Bitmap Heap Scan on inquiries (cost=5243.60..27622.21 rows=155627 width=16)
Recheck Cond: ((date(received_at) >= '2023-11-01'::date) AND (date(received_at) <= '2023-11-30'::date))
-> Bitmap Index Scan on "inquiries_DATE_index" (cost=0.00..5204.70 rows=155627 width=0)
Index Cond: ((date(received_at) >= '2023-11-01'::date) AND (date(received_at) <= '2023-11-30'::date))
The index doesn't seem to help; the query takes a long time to execute. If I add a date column to the table and use that instead of date(received_at) (sketched below), the query performs better and the query plan changes to:
GroupAggregate (cost=0.43..85199.58 rows=10980 width=19)
Group Key: pacific_date, location_id, purpose_id
-> Index Only Scan using inquiries_pacific_date_index on inquiries (cost=0.43..77813.12 rows=727666 width=16)
Index Cond: ((pacific_date >= '2023-11-01'::date) AND (pacific_date <= '2023-11-30'::date))
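For reference, the new column and index look roughly like this (how pacific_date is populated from received_at is omitted here):
ALTER TABLE inquiries ADD COLUMN pacific_date date;  -- derived from received_at (Pacific time)
CREATE INDEX inquiries_pacific_date_index ON inquiries USING btree
(pacific_date, location_id, purpose_id, identity_id);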
I guess I can do this if I can't find a better way, but it seems redundant. Is there a way I can write my original query so it makes better use of the index?
Drop-in fix
Like Laurenz explained, index-only scans currently (as of Postgres 16) suffer from a corner-case limitation with expression indexes; the manual has more details. One workaround is to "include" the column itself in the index (replacing the old one):
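A sketch, with an illustrative index name; INCLUDE appends received_at as a non-key payload column, which is what lets the planner consider index-only scans:
CREATE INDEX inquiries_date_covering_idx ON inquiries  -- replaces "inquiries_DATE_index"
(date(received_at), location_id, purpose_id, identity_id)
INCLUDE (received_at);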
This allows index-only scans for your original query. But it also increases the size of the index, by 8 bytes per row in your case.
fiddle
Better
Create an index on bare columns, without expressions:
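A minimal sketch, with an illustrative index name:
CREATE INDEX inquiries_plain_idx ON inquiries
(received_at, location_id, purpose_id, identity_id);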
(A plain index is often useful for other queries, too.)
And adjust your query slightly, to be exactly equivalent:
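A sketch of the adjusted query, assuming $1 and $2 are of type date (see Input below):
SELECT received_at::date, location_id, purpose_id, count(DISTINCT identity_id)
FROM   inquiries
WHERE  received_at >= $1      -- a date is coerced to timestamp at 00:00
AND    received_at <  $2 + 1  -- date + integer adds days: exclusive upper bound
GROUP  BY 1, 2, 3;
received_at >= $1 matches date(received_at) >= $1, and received_at < $2 + 1 matches date(received_at) <= $2, but both predicates are now sargable on the plain received_at column.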
Input
$1 and $2 must be type date, and received_at timestamp, as indicated in the question.

count(DISTINCT col) is typically slow in my experience. This may be faster, yet:
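A sketch, assuming the same date parameters; the subquery folds duplicates first, so the outer count(*) produces the same result as count(DISTINCT identity_id):
SELECT the_date, location_id, purpose_id, count(*) AS distinct_identities
FROM  (
   SELECT DISTINCT received_at::date AS the_date, location_id, purpose_id, identity_id
   FROM   inquiries
   WHERE  received_at >= $1
   AND    received_at <  $2 + 1
   ) sub
GROUP  BY 1, 2, 3;
fiddle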
And if there are many duplicates per (received_at::date, location_id, purpose_id, identity_id), an emulated index-skip scan may be much faster, yet; see the sketch below.
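A sketch of an emulated index-skip scan with a recursive CTE, assuming the expression index on (date(received_at), location_id, purpose_id, identity_id) from above; it hops from one distinct combination to the next instead of reading every duplicate:
WITH RECURSIVE skip AS (
   (  -- first combination in the date range
   SELECT date(received_at) AS the_date, location_id, purpose_id, identity_id
   FROM   inquiries
   WHERE  date(received_at) >= $1
   AND    date(received_at) <= $2
   ORDER  BY 1, 2, 3, 4
   LIMIT  1
   )
   UNION ALL
   SELECT n.*
   FROM   skip s
   CROSS  JOIN LATERAL (
      -- next combination after the current one, via row comparison on the index
      SELECT date(received_at) AS the_date, location_id, purpose_id, identity_id
      FROM   inquiries
      WHERE  (date(received_at), location_id, purpose_id, identity_id)
           > (s.the_date, s.location_id, s.purpose_id, s.identity_id)
      AND    date(received_at) <= $2
      ORDER  BY 1, 2, 3, 4
      LIMIT  1
      ) n
   )
SELECT the_date, location_id, purpose_id, count(*) AS distinct_identities
FROM   skip
GROUP  BY 1, 2, 3;
Upgrade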
Postgres has improved performance for big data with every major version over the past years. Consider upgrading to the latest version, Postgres 16 (at the time of writing). That should give you an immediate, additional boost.