I have a dataframe:
| sensor_id | timestamp | current | apparent_power |
|---|---|---|---|
| 0 | 2024-02-01 00:00:00.944369920 | 1.550 | 101.5 |
| 1 | 2024-02-01 00:00:00.959425024 | 0.284 | 20.4 |
| 0 | 2024-02-01 00:00:01.945563136 | 1.549 | 101.6 |
| 1 | 2024-02-01 00:00:01.950393856 | 0.286 | 20.2 |
| 0 | 2024-02-01 00:00:02.944880896 | 1.547 | 100.6 |
| 1 | 2024-02-01 00:00:02.949997056 | 0.290 | 21.6 |
| 0 | 2024-02-01 00:00:03.944605184 | 1.547 | 100.9 |
| 1 | 2024-02-01 00:00:03.949342976 | 0.290 | 21.8 |
| 0 | 2024-02-01 00:00:04.944967936 | 1.548 | 100.9 |
| 1 | 2024-02-01 00:00:04.950715136 | 0.285 | 20.5 |
| 0 | 2024-02-01 00:00:05.944571904 | 1.547 | 100.7 |
| 1 | 2024-02-01 00:00:05.952841984 | 0.284 | 20.2 |
| 0 | 2024-02-01 00:00:06.945287936 | 1.548 | 100.8 |
| 1 | 2024-02-01 00:00:06.950289920 | 0.287 | 20.8 |
| 0 | 2024-02-01 00:00:07.944841984 | 1.547 | 100.7 |
| 1 | 2024-02-01 00:00:07.962969088 | 0.290 | 21.8 |
| 0 | 2024-02-01 00:00:08.945434112 | 1.550 | 103.1 |
| 1 | 2024-02-01 00:00:08.950361088 | 0.281 | 20.1 |
| 0 | 2024-02-01 00:00:09.944396032 | 1.551 | 103.8 |
| 1 | 2024-02-01 00:00:09.949807872 | 0.285 | 21.1 |
My end aim is to push each sensor 1 row onto the matching sensor 0 row, summing the readings of the pair to create a single total row:
| timestamp_0 | timestamp_1 | total_current | total_apparent_power |
|---|---|---|---|
| 2024-02-01 00:00:00.944369920 | 2024-02-01 00:00:00.959425024 | 1.834 | 121.9 |
| 2024-02-01 00:00:01.945563136 | 2024-02-01 00:00:01.950393856 | 1.835 | 121.6 |
| 2024-02-01 00:00:02.944880896 | 2024-02-01 00:00:02.949997056 | xxx | xxx |
| 2024-02-01 00:00:03.944605184 | 2024-02-01 00:00:03.949342976 | xxx | xxx |
| 2024-02-01 00:00:04.944967936 | 2024-02-01 00:00:04.950715136 | xxx | xxx |
| 2024-02-01 00:00:05.944571904 | 2024-02-01 00:00:05.952841984 | xxx | xxx |
| 2024-02-01 00:00:06.945287936 | 2024-02-01 00:00:06.950289920 | xxx | xxx |
| 2024-02-01 00:00:07.944841984 | 2024-02-01 00:00:07.962969088 | xxx | xxx |
| 2024-02-01 00:00:08.945434112 | 2024-02-01 00:00:08.950361088 | xxx | xxx |
| 2024-02-01 00:00:09.944396032 | 2024-02-01 00:00:09.949807872 | xxx | xxx |
I have taken the timestamps and added `seconds` and `hour` columns to produce the table below (a sketch of how these columns can be derived follows it):
| sensor_id | timestamp | current | apparent_power | seconds | hour |
|---|---|---|---|---|---|
| 0 | 2024-02-01 00:00:00.944369920 | 1.550 | 101.5 | 0 | 0 |
| 1 | 2024-02-01 00:00:00.959425024 | 0.284 | 20.4 | 0 | 0 |
| 0 | 2024-02-01 00:00:01.945563136 | 1.549 | 101.6 | 1 | 0 |
| 1 | 2024-02-01 00:00:01.950393856 | 0.286 | 20.2 | 1 | 0 |
| 0 | 2024-02-01 00:00:02.944880896 | 1.547 | 100.6 | 2 | 0 |
| 1 | 2024-02-01 00:00:02.949997056 | 0.290 | 21.6 | 2 | 0 |
| 0 | 2024-02-01 00:00:03.944605184 | 1.547 | 100.9 | 3 | 0 |
| 1 | 2024-02-01 00:00:03.949342976 | 0.290 | 21.8 | 3 | 0 |
| 0 | 2024-02-01 00:00:04.944967936 | 1.548 | 100.9 | 4 | 0 |
| 1 | 2024-02-01 00:00:04.950715136 | 0.285 | 20.5 | 4 | 0 |
| 0 | 2024-02-01 00:00:05.944571904 | 1.547 | 100.7 | 5 | 0 |
| 1 | 2024-02-01 00:00:05.952841984 | 0.284 | 20.2 | 5 | 0 |
| 0 | 2024-02-01 00:00:06.945287936 | 1.548 | 100.8 | 6 | 0 |
| 1 | 2024-02-01 00:00:06.950289920 | 0.287 | 20.8 | 6 | 0 |
| 0 | 2024-02-01 00:00:07.944841984 | 1.547 | 100.7 | 7 | 0 |
| 1 | 2024-02-01 00:00:07.962969088 | 0.290 | 21.8 | 7 | 0 |
| 0 | 2024-02-01 00:00:08.945434112 | 1.550 | 103.1 | 8 | 0 |
| 1 | 2024-02-01 00:00:08.950361088 | 0.281 | 20.1 | 8 | 0 |
| 0 | 2024-02-01 00:00:09.944396032 | 1.551 | 103.8 | 9 | 0 |
| 1 | 2024-02-01 00:00:09.949807872 | 0.285 | 21.1 | 9 | 0 |
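For reference, a minimal sketch of how those two helper columns can be derived (assuming `test_file_df` holds the raw data and `timestamp` is, or can be parsed to, a datetime column):

```python
import pandas as pd

# Assumes test_file_df["timestamp"] is (or can be parsed to) datetime64.
ts = pd.to_datetime(test_file_df["timestamp"])

test_file_df["seconds"] = ts.dt.second  # second within the minute
test_file_df["hour"] = ts.dt.hour       # hour of day
```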
I have something that starts to do it, but falls over if the data has a missing sensor reading:
```python
second_tracker = 0
a_row = 0
b_row = 0
for i, row in test_file_df.iterrows():
    if row['seconds'] != second_tracker:
        second_tracker += 1
        # store totals
        a_cur = a_row['current']
        b_cur = b_row['current']
        total_current = a_cur + b_cur
        a_app_power = a_row['apparent_power']
        b_app_power = b_row['apparent_power']
        total_app_power = a_app_power + b_app_power
        new_row = {'timestamp_0': a_row['timestamp'],
                   'timestamp_1': b_row['timestamp'],
                   'total_current': total_current,
                   'total_app_power': total_app_power}
        print(new_row)
        power_sum_df.loc[len(power_sum_df)] = new_row
        a_row = 0
        b_row = 0
    if row['sensor_id'] == 0:
        a_row = row
    else:
        b_row = row
```
It stops when there is no matching sensor row because a reading went missing - i.e. the current row is sensor 0 and the next row is also sensor 0 because the sensor 1 reading in between was lost.
I have tried splitting into two DataFrames and summing row n of one with row n of the other, but could not work out how to combine the two rows into a new DataFrame, summing only the two value columns while keeping both timestamps.
Can anyone help - even with a better way of pulling the alternating rows out into one row? I tried unstack, groupby and agg to no avail.
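For what it's worth, this is roughly the split-and-sum shape I was attempting - a minimal sketch that assumes exactly one sensor 0 and one sensor 1 reading per second, so it also breaks down on a missing reading:

```python
import pandas as pd

# Split per sensor and align the pairs by position (fails if a reading is lost).
s0 = test_file_df[test_file_df["sensor_id"] == 0].reset_index(drop=True)
s1 = test_file_df[test_file_df["sensor_id"] == 1].reset_index(drop=True)

power_sum_df = pd.DataFrame({
    "timestamp_0": s0["timestamp"],
    "timestamp_1": s1["timestamp"],
    "total_current": s0["current"] + s1["current"],
    "total_apparent_power": s0["apparent_power"] + s1["apparent_power"],
})
```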
**Update:** the data shows irregular records within a single second:
| line no | sensor_id | timestamp | current | apparent_power |
|---|---|---|---|---|
| 4 | 0 | 2024-01-25 00:00:02.977937920 | 1.502 | 370.9 |
| 5 | 1 | 2024-01-25 00:00:02.983158016 | 0.293 | 72.4 |
| 6 | 0 | 2024-01-25 00:00:03.977388032 | 1.503 | 371.1 |
| 7 | 1 | 2024-01-25 00:00:03.995464960 | 0.284 | 70.3 |
| 8 | 0 | 2024-01-25 00:00:04.978688000 | 1.500 | 370.6 |
| 9 | 1 | 2024-01-25 00:00:05.025767168 | 0.287 | 70.9 |
| 10 | 0 | 2024-01-25 00:00:05.980681984 | 1.495 | 369.0 |
| 11 | 1 | 2024-01-25 00:00:06.031785984 | 0.286 | 70.7 |
| 12 | 0 | 2024-01-25 00:00:06.977396992 | 1.497 | 369.7 |
| 13 | 1 | 2024-01-25 00:00:06.984870912 | 0.280 | 69.3 |
| 14 | 0 | 2024-01-25 00:00:07.979064832 | 1.495 | 369.1 |
| 15 | 1 | 2024-01-25 00:00:07.983921920 | 0.282 | 69.6 |
Desired result: where there are more than 2 records in a single second, always take one sensor 0 and one sensor 1 reading. If there is only a single sensor reading within the second, use the corresponding sensor's value from the previous second for the sum. (A rough sketch of this rule follows the table below.)
| lines joined | sensor_0 | sensor_1 | timestamp_0 | timestamp_1 | total_current | total_apparent_power | Comment |
|---|---|---|---|---|---|---|---|
| 4 & 5 | 0 | 1 | 2024-01-25 00:00:02.977937920 | 2024-01-25 00:00:02.983158016 | 1.795 | 443.2 | Normal |
| 6 & 7 | 0 | 1 | 2024-01-25 00:00:03.977388032 | 2024-01-25 00:00:03.995464960 | 1.787 | 441.4 | Normal |
| 8 & 7 | 0 | 1 | 2024-01-25 00:00:04.978688000 | use line 7 sensor 1 as no sensor 1 in this second | 1.784 | 440.9 | use previous sensor 1 |
| 10 & 9 | 0 | 1 | 2024-01-25 00:00:05.980681984 | 2024-01-25 00:00:05.025767168 | 1.782 | 439.9 | swap so ts_0 is sensor_0 |
| 12 & 13 | 0 | 1 | 2024-01-25 00:00:06.977396992 | 2024-01-25 00:00:06.984870912 | 1.777 | 439.0 | use last sens0, and last sens1 of second |
| 14 & 15 | 0 | 1 | 2024-01-25 00:00:07.979064832 | 2024-01-25 00:00:07.983921920 | 1.777 | 438.7 | Normal |
I have also modified the groupby to use hour, minute & second, which puts it in the grouping I am after:

```python
ts = test_file_df["timestamp"]  # the datetime column used for grouping

power_agg_df = (test_file_df
                .groupby([ts.dt.hour, ts.dt.minute, ts.dt.second], sort=False)
                .agg(**{
                    "sensor_0": ("sensor_id", "first"),
                    "sensor_1": ("sensor_id", "last"),
                    "timestamp_0": ("timestamp", "first"),
                    "timestamp_1": ("timestamp", "last"),
                    "total_current": ("current", "sum"),
                    "total_apparent_power": ("apparent_power", "sum"),
                }))
```
Instead of a loop, you can use the (hour/second) as a grouper to `agg` the operations you need:
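Presumably something along these lines - my own sketch, grouping on the timestamp floored to the second rather than on separate hour/second columns:

```python
grouper = test_file_df["timestamp"].dt.floor("s")

power_agg_df = (test_file_df
                .groupby(grouper, sort=False)
                .agg(timestamp_0=("timestamp", "first"),
                     timestamp_1=("timestamp", "last"),
                     total_current=("current", "sum"),
                     total_apparent_power=("apparent_power", "sum"))
                .reset_index())
```

Note that a plain per-second `sum` only matches the desired totals when both sensors report within that second; the missing-reading rule from the update still needs the forward-fill style handling sketched after the update table.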