How do I change or remove 'non-printable' characters (e.g. Â) from df.columns values, alongside the replace statements already in place?


I have tried the following with no success. Note: this is specific to the text of the column headings, not the column values.

df.columns = [x.lower().replace(" ","").replace("?","").replace("_","").replace( "Â" , "") for x in df.columns]

I expected this to remove the non-printable character, but it did not.

Can anyone help?

CSV export after applying the suggested solution


There is 1 answer

Pawel Kam

First of all, please remember that replace is case sensitive. Also, when chaining functions, the order is important.

"Â".lower().replace("Â", "") # "â"
"Â".replace("Â", "").lower() # ""
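Applied to the column headings from the question, that ordering point could look like the sketch below. The helper name clean_column_name and the sample headers are hypothetical, and the extra isprintable() filter is an assumption about what "non-printable" means here (it drops characters such as a non-breaking space, which replace(" ", "") does not match).

```python
def clean_column_name(name: str) -> str:
    # Remove unwanted characters while the name is still in its original case,
    # because str.replace is case sensitive ("Â" would become "â" after lower()).
    for unwanted in (" ", "?", "_", "Â"):
        name = name.replace(unwanted, "")
    # Drop anything Python treats as non-printable, e.g. a non-breaking space
    # (U+00A0) or control characters.
    name = "".join(ch for ch in name if ch.isprintable())
    # Lowercase last, once the case-sensitive replacements are done.
    return name.lower()


# Hypothetical headers, for illustration only.
columns = ["Â Order_ID", "Unit Price?", "Qty\u00a0"]
cleaned = [clean_column_name(c) for c in columns]
print(cleaned)
```

With a DataFrame, the same helper should slot into the original pattern as df.columns = [clean_column_name(x) for x in df.columns].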

If the underlying cause is mojibake (text decoded with the wrong encoding), you can try a quick fix with the ftfy library. You can use it in conjunction with the rename function.

import ftfy

def _change_column_name(val):
    # fix mojibake
    val = ftfy.fix_text(val)
    # whatever data processing you need
    return val.replace("Â", "").lower()

df.rename(columns=_change_column_name, inplace=True)

@tripleee is right, though. Instead of a quick fix, you may want to fix the encoding/decoding errors in your source data.
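To illustrate what fixing it at the source might mean, here is a minimal sketch using only the standard library. It assumes (this is a guess, not something confirmed in the question) that the file is UTF-8 but was decoded as cp1252, which is a common way for a non-breaking space to show up with a stray "Â" in front of it. The sample bytes are made up.

```python
# Hypothetical CSV content: a header containing a non-breaking space (U+00A0),
# stored on disk as UTF-8.
raw = "Order\u00a0ID,Price\n1,2\n".encode("utf-8")

# Decoding with the wrong codec produces the stray "Â" (mojibake).
wrong_header = raw.decode("cp1252").splitlines()[0].split(",")

# Decoding with the codec the file was actually written in does not.
right_header = raw.decode("utf-8").splitlines()[0].split(",")

# Already-garbled text can sometimes be repaired by reversing the wrong step,
# provided every character survives the round trip.
repaired = wrong_header[0].encode("cp1252").decode("utf-8")

print(wrong_header, right_header, repaired)
```

With pandas, the equivalent would be passing the file's real encoding to the reader, for example pd.read_csv(path, encoding="utf-8"), where the right value depends on how the file was produced.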