Problem Description
I have a table (#tmstmp) with 2 columns dt (DATETIME) and payload (INT). Eventually I want to sum payload for each 5 minute interval there is.
Code
Setup
DECLARE @start DATETIME = N'2024-1-1 12:00:00';
DROP TABLE IF EXISTS #tmstmp
, #numbers;
CREATE TABLE #tmstmp (
dt DATETIME PRIMARY KEY
, payload INT NOT NULL
);
CREATE TABLE #numbers (
n INT PRIMARY KEY
);
WITH numbers (n) AS (
SELECT 0 AS n
UNION ALL
SELECT n + 1 AS n
FROM numbers
WHERE n < 100
)
INSERT
INTO #numbers
SELECT n
FROM numbers;
WITH rnd (mins, secs) AS (
SELECT n2.n AS mins
, CAST(ABS(CHECKSUM(NEWID())) % 60 AS INT) AS mins
FROM #numbers AS n1
, #numbers as n2
WHERE n1.n < 5
AND n2.n < 15
), tmstmp (dt) AS (
SELECT DATEADD(SECOND, secs, DATEADD(MINUTE, mins, @start)) AS dt
FROM rnd
)
INSERT
INTO #tmstmp
SELECT DISTINCT dt
, -1 AS payload
FROM tmstmp
ORDER BY dt;
UPDATE #tmstmp
SET payload = CAST(ABS(CHECKSUM(NEWID())) % 10 AS INT);
GO
Non overlapping timeslots are easy
DECLARE @start DATETIME = N'2024-1-1 12:00:00';
DECLARE @slotDuration INT = 5;
WITH agg (slot, sum_payload) AS (
SELECT DATEDIFF(MINUTE, @start, dt) / @slotDuration AS slot
, SUM(payload) AS sum_payload
FROM #tmstmp
GROUP BY DATEDIFF(MINUTE, @start, dt) / @slotDuration
)
SELECT DATEADD(MINUTE, slot * @slotDuration, @start) AS [from]
, DATEADD(MINUTE, (slot + 1) * @slotDuration, @start) AS [to]
, sum_payload
FROM agg;
| from | to | sum_payload |
|---|---|---|
| 2024-01-01 12:00:00 | 2024-01-01 12:05:00 | 124 |
| 2024-01-01 12:05:00 | 2024-01-01 12:10:00 | 106 |
| 2024-01-01 12:10:00 | 2024-01-01 12:15:00 | 95 |
Ultimate Goal: get running timeslots
I want, however, to have an entry for each interval in the range, that is from 12:00-12:05, 12:01-12:06, 12:02-12:07 etc. until the last timeslot.
I can construct the limits in the whole range before and use that in a JOIN like this:
DECLARE @start DATETIME = N'2024-1-1 12:00:00';
DECLARE @slotDuration INT = 5;
DECLARE @intervals INT = (SELECT DATEDIFF(MINUTE, @start, MAX(dt)) FROM #tmstmp);
WITH ranges ([from], [to], slot) AS (
SELECT DATEADD(MINUTE, n, @start) AS [from]
, DATEADD(MINUTE, n + @slotDuration, @start) AS [to]
, n AS slot
FROM #numbers
WHERE n <= @intervals
), tm_mult (slot, [from], [to], dt, payload) AS (
SELECT slot
, [from]
, [to]
, dt
, payload
FROM #tmstmp
INNER JOIN ranges
ON [from] <= dt
AND dt < [to]
)
SELECT MIN([from]) AS [from]
, MAX([to]) AS [to]
, SUM(payload) AS sum_payload
FROM tm_mult
GROUP BY slot
ORDER BY slot;
| from | to | sum_payload |
|---|---|---|
| 2024-01-01 12:00:00 | 2024-01-01 12:05:00 | 124 |
| 2024-01-01 12:01:00 | 2024-01-01 12:06:00 | 120 |
| 2024-01-01 12:02:00 | 2024-01-01 12:07:00 | 125 |
| ... | ... | ... |
| 2024-01-01 12:14:00 | 2024-01-01 12:19:00 | 19 |
While this works in this toy example I have hundreds of thousands of timestamps in my real data and worst of all I little impact on the indices. My gut feeling tells me that I will create quite some duplication with my inequality JOIN and I was wondering whether this is anyways the canonical way of doing it or whether there is a more SQL-onic way of doing it? (like the pythonistas like to call certain code pythonic if it uses the language inherent concepts rather than trying to solve it with general tools).
Window functions in sql (WINDOW - microsoft.com / OVER - microsoft.com) are a great asset to add to your SQL toolbelt. Also particularly canonical; windows have been in since SQL Server 2005.
Below is an example:
I'd like to draw attention to both the
4 PRECEDINGandDATEADD(MINUTE, DATEDIFF(MINUTE, 0, dt), 0). As the later practically floors the datetime to the minute,2024-01-01 12:04:00.000is inclusive up to2024-01-01 12:04:59.999, but doesn't include2024-01-01 12:05:00.000. Hopefully that's the functionality you are looking for.Here is a fiddle