Algorithm to cover time periods

91 views Asked by At

I have a table with data that covers time periods. The periods overlap.

I want to purge the data in order to save space, but without losing days of data.

Here is an example table:

Id | Start      | End
 1 | 2023.01.01 | 2023.01.05
 2 | 2023.01.03 | 2023.01.07
 3 | 2023.01.04 | 2023.01.09
 4 | 2023.01.07 | 2023.01.13
 5 | 2023.01.09 | 2023.01.14
 6 | 2023.01.12 | 2023.01.15

Original Data

In this example, I have several solutions :

 1. Clean 2 and 5
 2. Clean 3 and 5

Data for solution 1 Data for solution 2

The best solution is the second, because 2 and 3 don't have the same weight.

 - Weight of Id=2 : 5 days 
 - Weight of Id=3 : 6 days

So I will gain more space if I took the second solution.

I have millions of rows, so I need an efficient algorithm.

Note that sometimes I have intervals with no data.

Do you have a name of algorithm to solve this problem please ?

1

There are 1 answers

0
MBo On

If we separate all N time intervals like [1..2],[3..3],[4..5]..[12..13],[14..14],[15..15] for the first picture, we can build a net (kind of directed graph with source and sink), where edges from source to intervals above have capacity 1, edges from intervals to time periods (covering intervals) have capacity 1, edges from time periods to sink have large enough (N) capacity, and cost of the last edges correspond to period weight.

So Minimum-cost flow problem providing amount of flow N