Group by rolling 30 minute timeinterval using python

218 views Asked by At

I am quite stuck.

I have a set of data looking like this: enter image description here

I have a date and 0 is the number of baggage for that specific time interval.

I would like to create a 30 minute forward looking rolling time window, and for each time period I need the flight name within this time window along with the number of baggage.

I want the output to be in a dictionary or data frame where keys is the time period.

I have the following code where I tried some different things. But I can't get the result I want. And the following code is not forward looking.

# rolling sum
df_temp = df_conflict[['sta+time_to_infeed_left',0]].rolling('1800s', on='sta+time_to_infeed_left',min_periods=0).sum().reset_index()
df_temp.sort_values('sta+time_to_infeed_left').head(45)

Is this something you can help with? :-)

Example of data

    ArrivalFlightName   sta sta+time_to_infeed_left sta+time_to_infeed_right    idx 0
18  Flightname402   2019-12-09 20:00:00 2019-12-09 20:01:05.663474700   2019-12-09 20:03:55.006119960   21  1
24  Flightname411   2019-12-10 08:20:00 2019-12-10 08:22:28.606811160   2019-12-10 08:25:00.669594660   27  8
11  Flightname376   2019-12-10 09:25:00 2019-12-10 09:27:26.878824960   2019-12-10 09:30:16.221470220   14  1
19  Flightname404   2019-12-10 10:10:00 2019-12-10 10:12:00.095039220   2019-12-10 10:14:49.437684480   22  1
5   Flightname362   2019-12-10 10:10:00 2019-12-10 10:12:36.382748940   2019-12-10 10:15:25.725394200   8   3
2   Flightname349   2019-12-10 10:15:00 2019-12-10 10:17:05.278997760   2019-12-10 10:19:54.621643020   3   4
8   Flightname368   2019-12-10 10:30:00 2019-12-10 10:32:19.966880280   2019-12-10 10:35:09.309525540   11  1
23  Flightname410   2019-12-10 11:20:00 2019-12-10 11:22:26.878824960   2019-12-10 11:25:16.221470220   26  3
17  Flightname401   2019-12-10 11:55:00 2019-12-10 11:56:05.663474700   2019-12-10 11:58:55.006119960   20  19
15  Flightname391   2019-12-10 12:20:00 2019-12-10 12:21:21.215350260   2019-12-10 12:21:36.767225880   18  10
4   Flightname359   2019-12-10 12:40:00 2019-12-10 12:42:26.878824960   2019-12-10 12:45:16.221470220   7   2
7   Flightname364   2019-12-10 13:00:00 2019-12-10 13:02:36.382748940   2019-12-10 13:05:25.725394200   10  

Example of output dictionary. Where the the flightindex's that are in several periods are flights that are in the rolling time window.

{'period_1': {'Flightname idx': [177, 181], 'Baggage': [103, 87]},
 'period_2': {'Flightname  idx': [177, 102, 115, 164],
  'Baggage': [103, 59, 16, 8]},
 'period_3': {'Flightname  idx': [164, 173, 174], 'Baggage': [27, 50, 8]},
 'period_4': {'Flightname idx': [70, 77, 118], 'Baggage': [19, 3, 24]},
 'period_5': {'Flightname idx': [70, 98], 'Baggage': [19, 4]},
1

There are 1 answers

2
tturbo On

I try to give you some hints. However my code is not tested, as i don't have your entire input to test (and also no time to do so ;).

For that you want window with length 30min and step size 10min, you must set these values in the rolling function. Also, your interested not only in row 0 (not a very clear name), but also in the 'ArrivalFlightName'

# set datetime index and selecting rows you are interested in
df_temp = df_conflict.set_index('sta+time_to_infeed_left')['ArrivalFlightName', 0]

# convert in rolling window of 30min and step size 10min
df_temp.rolling('30m', step='10m') # not yet aggregated...

# next, you want to aggregate the values into lists
df_temp = df_temp.agg({'ArrivalFlightName': lambda x: list(x), {0: lambda x: list(x)})

I hope this helps... More details on aggregating into list