I'm using 15.0 version of PostgreSQL. I am facing a strange issue where duplicate data is getting inserted in table violating the primary key. Where as not all the query is showing the duplicate records. But after doing one VACUUM (FULL) on the table the duplicate row started showing in all the queries.
I'm seeing this 2nd time here. Previously txndatetime was a part of composite primary key. I thought this may be due to the fact that Postgres might be storing the timestamp as epoch which is a real number in Postgres. We changed it as a long column dt_epoch. Still we are seeing this. BTW the table is a hash partitioned table .
The version:
select version();
| version |
|---|
| PostgreSQL 15.0 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-44), 64-bit |
The table structure:
\d my_table
Partitioned table "my_table"
| Column | Type | Collation | Nullable | Default |
|---|---|---|---|---|
| grp_id | character varying(100) | not null | ||
| txn_id | character varying(100) | not null | ||
| rec_dt | date | not null | ||
| creation_ts | timestamp without time zone | not null | ||
| created_by_user_id | character varying(50) | not null | ||
| crtd_by_client_id | character varying(50) | not null | ||
| txndatetime | timestamp with time zone | not null | ||
| rrn | character varying(150) | |||
| dt_epoch | bigint | not null |
Partition key: HASH (grp_id) Indexes: "my_table_pkey" PRIMARY KEY, btree (grp_id, txn_id, dt_epoch) "my_table_grp_id_idx" btree (grp_id) "my_table_creation_ts_idx" btree (creation_ts) "my_table_txn_id_idx" btree (txn_id) Number of partitions: 8 (Use \d+ to list them.)
The record that showing the duplicate results:
select grp_id, txn_id, txndatetime, dt_epoch, rec_dt
from my_table
where rec_dt = '2024-02-21';
| grp_id | txn_id | txndatetime | dt_epoch | rec_dt |
|---|---|---|---|---|
| XYZ210220241522100001 | 2698e117-1cd1-4705-97cb-8a0fc65907d | 2024-02-21 15:21:13+05:30 | 1708509073000 | 2024-02-21 |
| XYZ210220241522100001 | 07da7715-e67e-4ff2-a000-9f2e999fbd8 | 2024-02-21 15:21:13+05:30 | 1708509073000 | 2024-02-21 |
| XYZ210220241532100001 | 7b68ca16-c955-4526-9a40-124b1b215b2 | 2024-02-21 15:31:39+05:30 | 1708509699000 | 2024-02-21 |
| XYZ210220241532100001 | 7b68ca16-c955-4526-9a40-124b1b215b2 | 2024-02-21 15:31:39+05:30 | 1708509699000 | 2024-02-21 |
| XYZ210220241532100001 | 9cc5e877-27ff-49ee-812c-1b0b48e8418 | 2024-02-21 15:31:39+05:30 | 1708509699000 | 2024-02-21 |
| XYZ210220241246100001 | 7f105788-89f3-4742-a642-43b551ab6e0 | 2024-02-21 12:45:33+05:30 | 1708499733000 | 2024-02-21 |
Whereas after limiting it with rec_dt, grp_id and txn_id it shows only one row:
select * from my_table
where rec_dt = '2024-02-21'
and grp_id='XYZ210220241532100001'
and txn_id='7b68ca16-c955-4526-9a40-124b1b215b2';
| grp_id | txn_id | rec_dt | creation_ts | created_by_user_id | created_by_org_id | txndatetime | rrn | dt_epoch |
|---|---|---|---|---|---|---|---|---|
| XYZ210220241532100001 | 7b68ca16-c955-4526-9a40-124b1b215b2 | 2024-02-21 | 2024-02-21 15:32:31.465 | 1056832970 | NPCI | 2024-02-21 15:31:39+05:30 | 405215100006 | 1708509699000 |
Again with grp_id search it is not showing the said row multiple times:
select grp_id, txn_id, txndatetime, dt_epoch, rec_dt
from my_table
where grp_id='XYZ210220241532100001';
| grp_id | txn_id | txndatetime | dt_epoch | rec_dt |
|---|---|---|---|---|
| XYZ210220241532100001 | 7b68ca16-c955-4526-9a40-124b1b215b2 | 2024-02-21 15:31:39+05:30 | 1708509699000 | 2024-02-21 |
| XYZ210220241532100001 | 9cc5e877-27ff-49ee-812c-1b0b48e8418 | 2024-02-21 15:31:39+05:30 | 1708509699000 | 2024-02-21 |
Now after doing a VACUUM (FULL) on that table all the query started showing the duplicate rows:
DGschema=# VACUUM (FULL) DGschema.my_table;
VACUUM
select grp_id, txn_id, txndatetime, dt_epoch, rec_dt
from my_table
where rec_dt = '2024-02-21';
| grp_id | txn_id | txndatetime | dt_epoch | rec_dt |
|---|---|---|---|---|
| XYZ210220241522100001 | 2698e117-1cd1-4705-97cb-8a0fc65907d | 2024-02-21 15:21:13+05:30 | 1708509073000 | 2024-02-21 |
| XYZ210220241522100001 | 07da7715-e67e-4ff2-a000-9f2e999fbd8 | 2024-02-21 15:21:13+05:30 | 1708509073000 | 2024-02-21 |
| XYZ210220241532100001 | 7b68ca16-c955-4526-9a40-124b1b215b2 | 2024-02-21 15:31:39+05:30 | 1708509699000 | 2024-02-21 |
| XYZ210220241532100001 | 7b68ca16-c955-4526-9a40-124b1b215b2 | 2024-02-21 15:31:39+05:30 | 1708509699000 | 2024-02-21 |
| XYZ210220241532100001 | 9cc5e877-27ff-49ee-812c-1b0b48e8418 | 2024-02-21 15:31:39+05:30 | 1708509699000 | 2024-02-21 |
| ABC210220241246100001 | 7f105788-89f3-4742-a642-43b551ab6e0 | 2024-02-21 12:45:33+05:30 | 1708499733000 | 2024-02-21 |
select grp_id, txn_id, txndatetime, dt_epoch, rec_dt, creation_ts
from my_table
where grp_id='XYZ210220241532100001';
| grp_id | txn_id | txndatetime | dt_epoch | rec_dt | creation_ts |
|---|---|---|---|---|---|
| XYZ210220241532100001 | 7b68ca16-c955-4526-9a40-124b1b215b2 | 2024-02-21 15:31:39+05:30 | 1708509699000 | 2024-02-21 | 2024-02-21 15:32:31.465 |
| XYZ210220241532100001 | 7b68ca16-c955-4526-9a40-124b1b215b2 | 2024-02-21 15:31:39+05:30 | 1708509699000 | 2024-02-21 | 2024-02-21 15:32:36.511 |
| XYZ210220241532100001 | 9cc5e877-27ff-49ee-812c-1b0b48e8418 | 2024-02-21 15:31:39+05:30 | 1708509699000 | 2024-02-21 | 2024-02-21 15:32:50.487 |
This is dangerous bug in Postgres which is defying the basics and making PostgreSQL unreliable. Is it a bug in this PostgreSQL version?