Replace chars in existing column names without creating new columns

Question


Asked by Vega on 26 March 2024 at 12:40

I am reading a CSV file and need to normalize the column names as part of a larger method-chaining operation. I want to do everything with method chaining.

When I use the recommended name.map expression to replace characters in the column names, like this:

import polars as pl

df = pl.DataFrame(
    {"A (%)": [1, 2, 3], "B": [4, 5, 6], "C (Euro)": ["abc", "def", "ghi"]}
).with_columns(
    pl.all().name.map(
        lambda c: c.replace(" ", "_")
        .replace("(%)", "pct")
        .replace("(Euro)", "euro")
        .lower()
    )
)
df.head()

I get

shape: (3, 6)
┌───────┬─────┬──────────┬───────┬─────┬────────┐
│ A (%) ┆ B   ┆ C (Euro) ┆ a_pct ┆ b   ┆ c_euro │
│ ---   ┆ --- ┆ ---      ┆ ---   ┆ --- ┆ ---    │
│ i64   ┆ i64 ┆ str      ┆ i64   ┆ i64 ┆ str    │
╞═══════╪═════╪══════════╪═══════╪═════╪════════╡
│ 1     ┆ 4   ┆ "abc"    ┆ 1     ┆ 4   ┆ "abc"  │
│ 2     ┆ 5   ┆ "def"    ┆ 2     ┆ 5   ┆ "def"  │
│ 3     ┆ 6   ┆ "ghi"    ┆ 3     ┆ 6   ┆ "ghi"  │
└───────┴─────┴──────────┴───────┴─────┴────────┘

instead of the expected

shape: (3, 3)
┌───────┬─────┬────────┐
│ a_pct ┆ b   ┆ c_euro │
│ ---   ┆ --- ┆ ---    │
│ i64   ┆ i64 ┆ str    │
╞═══════╪═════╪════════╡
│ 1     ┆ 4   ┆ "abc"  │
│ 2     ┆ 5   ┆ "def"  │
│ 3     ┆ 6   ┆ "ghi"  │
└───────┴─────┴────────┘


How can I replace specific characters in existing column names with method chaining, without creating new columns?


**Roman Pekar** · Accepted Answer · 2024-03-26T12:43:02+00:00

You can simply replace DataFrame.with_columns() with DataFrame.select():

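import polars as pl

df = pl.DataFrame(
    {"A (%)": [1, 2, 3], "B": [4, 5, 6], "C (Euro)": ["abc", "def", "ghi"]}
).select(
    # select() keeps only the renamed outputs, so the original columns are dropped
    pl.all().name.map(
        lambda c: c.replace(" ", "_")
        .replace("(%)", "pct")
        .replace("(Euro)", "euro")
        .lower()
    )
)
print(df)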

┌───────┬─────┬────────┐
│ a_pct ┆ b   ┆ c_euro │
│ ---   ┆ --- ┆ ---    │
│ i64   ┆ i64 ┆ str    │
╞═══════╪═════╪════════╡
│ 1     ┆ 4   ┆ abc    │
│ 2     ┆ 5   ┆ def    │
│ 3     ┆ 6   ┆ ghi    │
└───────┴─────┴────────┘

It is worth pointing out (as Dean MacGregor mentioned in the comments) that DataFrame.with_columns() always adds columns to the DataFrame. If an added column has the same name as an existing one, the existing column is replaced; otherwise the new column is appended next to the originals. The documentation says:

Add columns to this DataFrame.

Added columns will replace existing columns with the same name.

DataFrame.select(), on the other hand, returns only the columns produced by the expressions you pass to it, so the original columns are not carried over.

Additionally, if you just want to rename all the columns, it is probably more natural to use DataFrame.rename(), which also accepts a function that is applied to every column name:

...
.rename(
    lambda c: c.replace(" ", "_")
        .replace("(%)", "pct")
        .replace("(Euro)", "euro")
        .lower()
)

┌───────┬─────┬────────┐
│ a_pct ┆ b   ┆ c_euro │
│ ---   ┆ --- ┆ ---    │
│ i64   ┆ i64 ┆ str    │
╞═══════╪═════╪════════╡
│ 1     ┆ 4   ┆ abc    │
│ 2     ┆ 5   ┆ def    │
│ 3     ┆ 6   ┆ ghi    │
└───────┴─────┴────────┘
