I'm working on an anomaly detection model for CPU data and my current dataframes are structured like this (with 10k+ servers and 168 hours):
| Server | Hour1 | Hour2 | Hour3 |
|---|---|---|---|
| server1 | val1.1 | val1.2 | val1.3 |
| server2 | val2.1 | val2.2 | val2.3 |
I need it to be in a structure like this:
| Server | Time | Value |
|---|---|---|
| server1 | 0 | val1.1 |
| server1 | 1 | val1.2 |
| server2 | 0 | val2.1 |
| server2 | 1 | val2.2 |
The problem is there are 10k+ servers and 168 hourly values for each, so it's taking an eternity to iterate over. Is there a more efficient way to do this transformation?
My current attempt is creating a new df with nested for loops like so:
```python
# First pass: duplicate each server's row once per hour and record the hour index
for index, row in df.iterrows():
    for n in range(0, 167):
        new_df.loc[len(new_df.index)] = row
        new_df.iat[len(new_df.index) - 1, 2] = n

# Second pass: look up the matching hourly value in the wide frame for each new row
for index, row in new_df.iterrows():
    for i, r in df.iterrows():
        new_df_ts = row[2]
        if row[0] == r[0]:
            new_df.iat[index, 3] = df.iat[i, 2 + new_df_ts]
```
Use `pd.wide_to_long`, which collapses all of the hourly columns into a single long column in one vectorised call, with no Python-level loops.
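A minimal sketch, assuming the wide columns are literally named `Hour1`, `Hour2`, ... as in the question's table and that `Server` is an ordinary column; a tiny three-hour frame with made-up numbers stands in for the real 10k-server data:

```python
import pandas as pd

# Small stand-in for the real 10k-server / 168-hour frame
df = pd.DataFrame({
    "Server": ["server1", "server2"],
    "Hour1": [0.51, 0.62],
    "Hour2": [0.48, 0.71],
    "Hour3": [0.55, 0.69],
})

long_df = (
    pd.wide_to_long(df, stubnames="Hour", i="Server", j="Time")
      .rename(columns={"Hour": "Value"})  # the stub name becomes the value column
      .reset_index()
)
long_df["Time"] -= 1                      # column suffix is 1-based; shift to 0-based
print(long_df.sort_values(["Server", "Time"]))
```

`wide_to_long` keeps `Server` and `Time` in the index until `reset_index()` is called, and the new value column takes the stub name, hence the rename.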
Alternatively, use `melt` and then strip the `Hour` text out of the `Time` column. Then do whatever additional processing is required on the `Time` and `Value` columns, such as subtracting 1 from the time period. Use `df.replace` or `df["Value"].str.replace` to alter values if needed. A sketch of this route follows.
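A minimal sketch of the `melt` route, under the same assumptions about column names (the three-hour stand-in frame from above):

```python
import pandas as pd

# Same small stand-in for the real wide frame
df = pd.DataFrame({
    "Server": ["server1", "server2"],
    "Hour1": [0.51, 0.62],
    "Hour2": [0.48, 0.71],
    "Hour3": [0.55, 0.69],
})

long_df = df.melt(id_vars="Server", var_name="Time", value_name="Value")
# Strip the "Hour" prefix and shift to a 0-based time index
long_df["Time"] = long_df["Time"].str.replace("Hour", "", regex=False).astype(int) - 1
long_df = long_df.sort_values(["Server", "Time"]).reset_index(drop=True)
print(long_df)
```

Either approach reshapes the whole frame in one pass, so it scales to 10k+ servers times 168 hours without iterating row by row.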