Polars apply same custom function to multiple columns in group by


What's the best way to apply a custom function to multiple columns in Polars? Specifically, I need the function to reference another column in the dataframe. Say I have the following:

import polars as pl

df = pl.DataFrame({
    'group': [1, 1, 2, 2],
    'other': ['a', 'b', 'a', 'b'],
    'num_obs': [10, 5, 20, 10],
    'x': [1, 2, 3, 4],
    'y': [5, 6, 7, 8],
})

I want to group by 'group' and calculate the average of x and y, weighted by num_obs. I can do something like this:

variables = ['x', 'y']
df.group_by('group').agg(
    (pl.col(var) * pl.col('num_obs')).sum() / pl.col('num_obs').sum()
    for var in variables
)

but I'm wondering if there's a better way. Also, I don't see how to add other aggregations to this approach: is there a way to also include pl.sum('num_obs')? Thanks!
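For reference, the quantity being computed for each group g is the ordinary weighted mean, with weights w_i taken from num_obs (shown for x; y is analogous). Group 1 is worked out as a check:

$$
\bar{x}_g = \frac{\sum_{i \in g} w_i \, x_i}{\sum_{i \in g} w_i},
\qquad
\bar{x}_1 = \frac{10 \cdot 1 + 5 \cdot 2}{10 + 5} = \frac{20}{15} \approx 1.333
$$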

Accepted answer by Roman Pekar:

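You can pass several column names to pl.col(). The expression is then expanded into one expression per column, each keeping its column's name, and any further aggregations can simply be listed as additional arguments to agg():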

df.group_by('group').agg(
    (pl.col('x','y') * pl.col('num_obs')).sum() / pl.col('num_obs').sum(),
    pl.col('num_obs').sum()
)

┌───────┬──────────┬──────────┬─────────┐
│ group ┆ x        ┆ y        ┆ num_obs │
│ ---   ┆ ---      ┆ ---      ┆ ---     │
│ i64   ┆ f64      ┆ f64      ┆ i64     │
╞═══════╪══════════╪══════════╪═════════╡
│ 1     ┆ 1.333333 ┆ 5.333333 ┆ 15      │
│ 2     ┆ 3.333333 ┆ 7.333333 ┆ 30      │
└───────┴──────────┴──────────┴─────────┘