I have a list of form [[[key1,value1],[key2,value2],...,500],...,n] - aka L2 orderbook data with 500 levels. I want to translate this list into a dataframe, containing columns key_1,...,key_500 and value_1,...,value_500. - A dataframe that has a price + volume column for each level (e.g. level_1_volume, level_2_volume, ..., level_500_volume)
My current solution is simply looping through the existing dataframe (which simply has this list as a column) with df.iterrows(), extracting the values into separate lists, then creating dataframes using these lists and merging these as columns. Doesn't feel terribly efficient compared to other operations on my data set, is there some way to do this with built-in methods?
List1 = [[key1.1,val1.1],[key2.1,val2.1],...]
List2 = [[key1.2,val1.2],[key2.2,val2.2],...]
key1_column, val1_column, key2_column, val2_column
Row_List1 key1.1 val1.1 key2.1 val2.1
Row_List2 key1.2 val1.2 key2.2 val2.2
Current Solution ["bids"] and ["asks"] simply contain dictionary/Json object of form {key1:value1; key2:value2} for 500 pairs.
# Initialize empty lists for prices and volumes
prices_bids = [[] for _ in range(500)]
volumes_bids = [[] for _ in range(500)]
prices_asks = [[] for _ in range(500)]
volumes_asks = [[] for _ in range(500)]
# Iterate through each row
for index, row in df.iterrows():
attributes = row["bids"]
result_bids = [[key,attributes[key]] for key in sorted(attributes.keys(), reverse=True)]
attributes = row["asks"]
result_asks = [[key,attributes[key]] for key in sorted(attributes.keys(), reverse=False)]
for i in range(500):
prices_bids[i].append(np.float64(result_bids[i][0]))
volumes_bids[i].append(np.float64(result_bids[i][1]))
prices_asks[i].append(np.float64(result_asks[i][0]))
volumes_asks[i].append(np.float64(result_asks[i][1]))
# Create DataFrame from lists
for i in range(500):
# Expressing prices as spreads
df[f"bid_level_{i}_price"] = pd.Series((df["mid_price"]/pd.Series(prices_bids[i],dtype='float64')-1)*10000,dtype="float64")
df[f"bid_level_{i}_volume"] = pd.Series(volumes_bids[i],dtype='float64')
df[f"ask_level_{i}_price"] = pd.Series((df["mid_price"]/pd.Series(prices_asks[i],dtype='float64')-1)*10000,dtype="float64")
df[f"ask_level_{i}_volume"] = pd.Series(volumes_asks[i],dtype='float64')
Try executing the below mentioned code, It should work import pandas as pd