I am reading a CSV file and need to normalize the column names as part of a larger function chaining operation. I want to do everything with function chaining.
When I use the recommended name.map function to replace characters in the column names, like this:
import polars as pl

df = pl.DataFrame(
    {"A (%)": [1, 2, 3], "B": [4, 5, 6], "C (Euro)": ["abc", "def", "ghi"]}
).with_columns(
    pl.all().name.map(
        lambda c: c.replace(" ", "_")
        .replace("(%)", "pct")
        .replace("(Euro)", "euro")
        .lower()
    )
)
df.head()
I get
shape: (3, 6)
┌───────┬─────┬──────────┬───────┬─────┬────────┐
│ A (%) ┆ B   ┆ C (Euro) ┆ a_pct ┆ b   ┆ c_euro │
│ ---   ┆ --- ┆ ---      ┆ ---   ┆ --- ┆ ---    │
│ i64   ┆ i64 ┆ str      ┆ i64   ┆ i64 ┆ str    │
╞═══════╪═════╪══════════╪═══════╪═════╪════════╡
│ 1     ┆ 4   ┆ "abc"    ┆ 1     ┆ 4   ┆ "abc"  │
│ 2     ┆ 5   ┆ "def"    ┆ 2     ┆ 5   ┆ "def"  │
│ 3     ┆ 6   ┆ "ghi"    ┆ 3     ┆ 6   ┆ "ghi"  │
└───────┴─────┴──────────┴───────┴─────┴────────┘
instead of the expected
shape: (3, 3)
┌───────┬─────┬────────┐
│ a_pct ┆ b   ┆ c_euro │
│ ---   ┆ --- ┆ ---    │
│ i64   ┆ i64 ┆ str    │
╞═══════╪═════╪════════╡
│ 1     ┆ 4   ┆ "abc"  │
│ 2     ┆ 5   ┆ "def"  │
│ 3     ┆ 6   ┆ "ghi"  │
└───────┴─────┴────────┘
How can I replace specific characters in the existing column names with function chaining, without creating new columns?
You could simply replace DataFrame.with_columns() with the DataFrame.select() method.

It is important to point out (as Dean MacGregor mentioned in the comments) that DataFrame.with_columns() always adds columns to the dataframe. The new column names might be the same as ones in the original dataframe, but in that case the original columns are replaced by the new ones; this is described in the DataFrame.with_columns() documentation. DataFrame.select(), on the other hand, returns only the columns produced by the expressions passed to it, so the renamed columns take the place of the originals instead of being appended to them.

Additionally, if you just want to rename all the columns, it's probably more natural to use DataFrame.rename() instead.