What's the best way to apply a custom function to multiple columns in Polars? Specifically, I need the function to reference another column in the dataframe. Say I have the following:
import polars as pl

df = pl.DataFrame({
    'group': [1, 1, 2, 2],
    'other': ['a', 'b', 'a', 'b'],
    'num_obs': [10, 5, 20, 10],
    'x': [1, 2, 3, 4],
    'y': [5, 6, 7, 8],
})
I want to group by 'group' and calculate the average of x and y, weighted by num_obs. I can do something like this:
variables = ['x', 'y']
df.group_by('group').agg((pl.col(var) * pl.col('num_obs')).sum()/pl.col('num_obs').sum() for var in variables)
but I'm wondering if there's a better way. Also, I don't know how to add other aggregations to this approach. Is there a way I could also add pl.sum('num_obs')? Thanks!
You can just pass a list of columns into pl.col():