Collectiong Anti-Join Results via Window Function?

40 views Asked by At

I have a table:

CREATE TABLE t_table (
    c_id int4 NOT NULL,
    c_date_called int4 NOT NULL,
    CONSTRAINT t_table_un UNIQUE (c_id, c_date_called)
);

that stores distinct snapshots of data, with data as such:

INSERT INTO t_table (c_id, c_date_called) 
   VALUES
(1,9),
(2,9),
(3,9),
(4,9),
(5,9),
(1,12),
(2,12),
(3,12),
(5,12),
(1,17),
(3,17)
;

Against this table I can run an anti-join, comparing the current snapshot and one previous snapshot:

--EXPLAIN ANALYSE VERBOSE
SELECT prev.*
    FROM t_table AS prev
    LEFT JOIN t_table AS cur ON (prev.c_id = cur.c_id) AND cur.c_date_called = 17
    WHERE prev.c_date_called = 9
        AND cur.c_id IS NULL
        ;

returns the data I expect when finding the IDs not present in the current c_date_called:

c_id|c_date_called|
----+-------------+
   2|            9|
   4|            9|

But how do I apply the anti-join across multiple distinct c_date_called collecting the results and merging them as compared against the current c_date_called?

Well. Window functions with anti-joins.. yeah need help.

2

There are 2 answers

1
Marmite Bomber On BEST ANSWER

Well to report the complete state of the iss in the snapshots, i.e. a snapshot with a new id not present in the previous snapshot resp. the id removed, i.e. not present in the next snapshot, you do not need an anti-join. Which you do not use in your example anyway.

First define the sequence of the snapshots using integers and flag the current snapshot

with snap as (
select distinct c_date_called from t_table),
snap2 as (
select  
  c_date_called,
  row_number() over(order by c_date_called) as snap_idx,
  c_date_called = max(c_date_called) over() is_current
from snap)
select * from snap2;

c_date_called|snap_idx|is_current|
-------------+--------+----------+
            9|       1|false     |
           12|       2|false     |
           17|       3|true      |

Than join this supporting information to your main table and add two attribuites based on the lag and lead window function of the snapshot index that identify if the previous / next snapshot with the given id is consecutive or if there is a gap. The logic should be pretty self-explained.

with snap as (
select distinct c_date_called from t_table),
snap2 as (
select  
  c_date_called,
  row_number() over(order by c_date_called) as snap_idx,
  c_date_called = max(c_date_called) over() is_current
from snap)
select 
 a.c_id, a.c_date_called, b.snap_idx, b.is_current,
 snap_idx != (1 + lag(snap_idx,1,0) over(partition by a.c_id order by snap_idx)) is_new,
 not (is_current) and snap_idx + 1 !=lead(snap_idx,1,0) over(partition by a.c_id order by snap_idx) is_removed
from t_table a
join snap2 b on a.c_date_called = b.c_date_called
order by 1,3;

c_id|c_date_called|snap_idx|is_current|is_new|is_removed|
----+-------------+--------+----------+------+----------+
   1|            9|       1|false     |false |false     |
   1|           12|       2|false     |false |false     |
   1|           17|       3|true      |false |false     |
   2|            9|       1|false     |false |false     |
   2|           12|       2|false     |false |true      |
   3|            9|       1|false     |false |false     |
   3|           12|       2|false     |false |false     |
   3|           17|       3|true      |false |false     |
   4|            9|       1|false     |false |true      |
   5|            9|       1|false     |false |false     |
   5|           12|       2|false     |false |true      |
   6|           12|       2|false     |true  |false     |
   6|           17|       3|true      |false |false     |

Note that I added id 6 that was introduced in the second snapshot to demonstrated this use case.

3
Bert-Jan Stroop On

I read this problem as:

I have a table t_table(c_id, c_date_called). In this we have sets of data (c_id) per snapshot date (c_date_called). The current snapshot which is the snapshot where c_date_called is highest.

Please give me an overview of all previous data (c_id) which was present in previous snapshots, but not in the current snapshot.

If so, this can be done simply without window functions:

select distinct 
    prev.c_id, 
    prev.c_date_called 
from 
    (select * from t_table where c_date_called not in (
        select max(c_date_called) from t_table)
    ) prev 
left join 
    (select * from t_table where c_date_called in (
        select max(c_date_called) from t_table)
    ) cur 
on 
    cur.c_id = prev.c_id 
where 
    cur.c_id isnull

Which gives the result:

enter image description here