replace ^M(control M character) in a text file in python

Question

439 views Asked by Arthur At 22 January 2023 at 18:08

The file is like this:

This line has control character ^M this is bad
I will try it

I want to remove control M characters in the file, and create a new file like this using Python

This line has control character  this is bad
I will try it

I tried a couple of approaches:

line.replace("\r", "")

and

line.replace("\r\n", "")

Here is part of the code snippet:

with open(file_path, "r") as input_file:
    lines = input_file.readlines()

new_lines = []
for line in lines:
    new_line = line.replace("\r", "")
    new_lines.append(new_line)

new_file_name = "replace_control_char.dat"
new_file_path = os.path.join(here, data_dir, new_file_name)
with open(new_file_path, "w") as output_file:
    for line in new_lines:
        output_file.write(line)

However, the new file I got is:

This line has control character
 this is bad
I will try it

"This line has control character" and " this is bad" are not on the same line. I expected that removing the control M character would put these two phrases on the same line. Can someone help me solve this issue?

Thanks, Arthur

There is 1 answer

**Jean-François Fabre** · Accepted Answer · 2023-01-22T18:22:32+00:00

You cannot rely on text mode in that case.

Windows treats a lone \r as a line break (even though the "official" line terminator is \r\n), and on classic Macintosh the line terminator can be \r alone. Text mode converts a lone \r to \n and collapses \r\n into a single \n, so it destroys the information you need.

Because universal newlines are enabled by default, this code also fails on Unix/Linux: Python behaves the same on all platforms.

Python doesn’t depend on the underlying operating system’s notion of text files; all the processing is done by Python itself, and is therefore platform-independent.
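
To see the translation described above, here is a minimal sketch that writes a small sample file and reads it back in text mode and in binary mode. The file name `sample.dat` and the sample bytes are assumptions made up for the illustration; they are not taken from the question's data.

```python
import os
import tempfile

# Hypothetical sample: a stray \r inside the first line, plain \n line endings.
raw = b"This line has control character \r this is bad\nI will try it\n"

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "sample.dat")
    with open(path, "wb") as f:
        f.write(raw)

    # Text mode with the default newline handling: the lone \r is
    # translated to \n while reading, so the first line is split in two.
    with open(path, "r") as f:
        text_lines = f.readlines()

    # Binary mode: the bytes come back untouched, so the \r is still there.
    with open(path, "rb") as f:
        binary_contents = f.read()

print(text_lines)
print(binary_contents)
```

In text mode the list has three entries and none of them contains "\r", which is why `line.replace("\r", "")` has nothing left to remove.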

If you want to remove those, you have to use binary mode.

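# Read the raw bytes so Python does not translate any line endings.
with open(file_path, "rb") as input_file:
    contents = input_file.read().replace(b"\r", b"")
# Note: this overwrites the original file in place.
with open(file_path, "wb") as output_file:
    output_file.write(contents)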

That code will remove all \r characters (including those that are part of \r\n line terminators). That works, but if your aim is just to remove stray \r characters and preserve line endings, another method is required.

One way to do it is to use a regular expression, which can accept binary (bytes) as well:

import re
contents = re.sub(rb"\r([^\n])", rb"\1", contents)

That regular expression removes \r chars only if not followed by \n chars, effectively preserving CR+LF Windows end-of-line sequences.
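
The same idea can be sketched as a small self-contained function. This is a variant, not the pattern from the answer: it assumes a negative lookahead, `\r(?!\n)`, in place of the `[^\n]` capture group, so that a \r at the very end of the data or one followed by another \r is also removed. The name `strip_stray_cr` and the sample bytes are hypothetical.

```python
import re

# Matches a \r that is NOT immediately followed by \n (negative lookahead).
STRAY_CR = re.compile(rb"\r(?!\n)")


def strip_stray_cr(data: bytes) -> bytes:
    """Remove lone \\r bytes while keeping \\r\\n pairs intact."""
    return STRAY_CR.sub(b"", data)


# Hypothetical sample with one stray \r and Windows-style line endings.
sample = b"This line has control character \r this is bad\r\nI will try it\r\n"
cleaned = strip_stray_cr(sample)
print(cleaned)
```

To apply it to a file, read the contents with `open(file_path, "rb")`, pass them through the function, and write the result with `"wb"`, as in the binary-mode code above.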
