Python: replace case insensitive flag doesn't work

219 views Asked by At

In my dataframe I want to replace different ways of representing something with a single consistent string. Examples:

  1. Replace [COM, COMMERCIAL] with "Commercial".
  2. Replace [FALSE, False, false, NO, No, N] with N and so on.

The list of values to be replaced and the replacement value come from another dataframe and will change as I run through each column in my main dataframe.

The ideal solution that should work is:

dfPA[col] = dfPA[col].replace(f'(?i){valold}', key)

where
valold = ['COM', 'COMMERCIAL']
key = 'Commercial'

This doesn't work. Maybe because valold is a list. So I tried:

for val in valold:
    dfPA[col] = dfPA[col].replace(f'(?i){val}', key)

It still doesn't work. Any thoughts?

Note: I CANNOT use dfPA[col] = dfPA[col].str.replace(valold, key, case=False, regex=False) because as explained here it will replace substrings too. And I then instead of 'Commercial' I see 'ComCom...Commercial'

1

There are 1 answers

2
Henry Ecker On BEST ANSWER

Either str.replace or replace can be used. Just make sure the pattern matches the start (^) and end ($) of the string for whole cell matches.

str.replace:

for val in valold:
    dfPA[col] = dfPA[col].str.replace(rf'^{val}$', key, case=False, regex=True)

replace:

for val in valold:
    dfPA[col] = dfPA[col].replace(rf'(?i)^{val}$', key, regex=True)

*regex=False by default for replace so the regex case insensitivity modifier will not work for replace without setting regex=True as it will literally match the characters "(?i)".


Sample Data and Output:

import pandas as pd

dfPA = pd.DataFrame({
    'col': ['COM', 'COMMERCIAL', 'COmMErCIaL', 'Something else',
            'comical']
})

valold = ['COM', 'COMMERCIAL']
key = 'Commercial'
col = 'col'
for val in valold:
    dfPA[col] = dfPA[col].str.replace(rf'^{val}$', key, case=False, regex=True)

print(dfPA)
              col
0      Commercial
1      Commercial
2      Commercial
3  Something else
4         comical