Reconciling two streaming delta live tables

41 views Asked by At

I have two delta live tables (events and sales). These are streaming tables having a common field (sale_id). I need to reconcile these tables to ensure that each sale_id in the events table has a match in the sales table or not. All the records from the events table that has no match in the sales table needs to be routed to an error table. The error table will then be used the next day in the reconciliation process.

So the process would look like.

select *, CASE 
WHEN sales.sale_num IS NOT NULL  OR error.sale_num IS NOT NULL THEN 1
ELSE 0
END AS zero_or_one from events
left outer join sales
on events.sale_id = sales.sale_id
left outer join error
on event.sale_id = error.sale_id

But there are two issues I am facing

  1. Left outer join is not possible between two stream sources unless a watermark column is provided. The datasets do not have proper watermark columns. I am planning to overcome this issue by doing a series of inner joins.
  2. The error table does not exist initially and this needs to be created as part of the pipeline. Can the error table be created initially in the DLT pipeline and then updated with the new error records identified as part of the reconciliation process. When I tried to do this, I got an error saying that the error table cannot be defined twice in the pipeline.

Could anyone please help me with suggestions on the issues I am facing.

0

There are 0 answers