Change binary data like "111" into "001" in Python using if/else or regex

I have a file that contains binary data, for example:

0100010000101111101101001011010000101110111101111110111110111111111000000000001

I want to convert 11 into 01, so that only the last 1 of each run of 1s is kept.

For example, 01 stays 01, 011 becomes 001, 0111 becomes 0001, and 01111 becomes 00001.

So after conversion, the data above becomes:

0100010000100000100101001001010000100010000100000010000010000000001000000000001

I am currently using the find-and-replace method multiple times:

import os

# First pass: read 1.txt, replace, and write the result to 2.txt
with open("1.txt", "rt") as fin:
    data = fin.read()

data = data.replace('011', '001')

with open("2.txt", "wt") as fout:
    fout.write(data)

os.remove('1.txt')

# Second pass: the same replacement again, written back to 1.txt
with open("2.txt", "rt") as fin:
    data = fin.read()

data = data.replace('011', '001')

with open("1.txt", "wt") as fout:
    fout.write(data)

os.remove('2.txt')

Can someone guide me on how to do this in a single pass using regex or an if/else statement? I just started learning Python.

There are 2 answers

Nick (best answer)

You can simply replace a 1 which is followed by another 1 with a 0:

import re

data = "0100010000101111101101001011010000101110111101111110111110111111111000000000001"
data = re.sub(r'1(?=1)', '0', data)
print(data)

Output:

0100010000100000100101001001010000100010000100000010000010000000001000000000001
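
If you would rather avoid regex, the same rule can probably be written as a plain loop with if/else. This is only a minimal sketch: the function name keep_last_one is made up for illustration, and it assumes the input contains nothing but the characters 0 and 1.

```python
def keep_last_one(bits: str) -> str:
    """Turn every 1 that is directly followed by another 1 into a 0."""
    out = []
    for i, ch in enumerate(bits):
        # Look one character ahead; the last character has no successor.
        nxt = bits[i + 1] if i + 1 < len(bits) else ''
        if ch == '1' and nxt == '1':
            out.append('0')
        else:
            out.append(ch)
    return ''.join(out)


print(keep_last_one("0111011"))  # 0001001
```

The loop mirrors the lookahead in the regex: a character is only changed when it is a 1 and the next character is also a 1.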

Casimir et Hippolyte

A way to do it using bitwise operators:

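# data is the bit string from the question
dataFormat = f'0{len(data)}b'   # zero-padded binary, same width as the input
value = int(data, 2)

# XOR with the left-shifted value, then AND with the original value
result = f'{(value ^ (value << 1)) & value:{dataFormat}}'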

Details:

First we XOR the value with a copy of itself shifted left by one bit. The goal is to remove every 1 in a run of 1s except the last one.

       0100010000101111101101001011010000101110111101111110111110111111111000000000001  
 XOR  01000100001011111011010010110100001011101111011111101111101111111110000000000010
--------------------------------------------------------------------------------------- 
  =   01100110001110000110111011101110001110011000110000011000011000000001000000000011 

This works, but now each 0 immediately to the left of a run of 1s (in the original value) has become a 1.

To fix that, you only need to AND the result with the original value:

      01100110001110000110111011101110001110011000110000011000011000000001000000000011
 AND   0100010000101111101101001011010000101110111101111110111110111111111000000000001
---------------------------------------------------------------------------------------
  =    0100010000100000100101001001010000100010000100000010000010000000001000000000001
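
To make the two steps easier to follow, here is a small sketch that prints each intermediate value for a much shorter input. The sample string "0111011" and the variable names are chosen only for illustration.

```python
data = "0111011"           # short sample input, chosen for illustration
width = len(data)
wide = width + 1           # the shifted value needs one extra bit
value = int(data, 2)

shifted = value << 1
xored = value ^ shifted    # 1 wherever two neighbouring bits differ
result = xored & value     # keep only the marks that sit on an original 1

print(f'{value:0{wide}b}')     # 00111011
print(f'{shifted:0{wide}b}')   # 01110110
print(f'{xored:0{wide}b}')     # 01001101
print(f'{result:0{width}b}')   # 0001001
```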

Another way, using three successive replacements:

  • the first protects the '10's
  • the second replaces all 1s by 0s
  • the third restores the '10's

f'{data}0'.replace('10', 'X').replace('1', '0').replace('X', '10')[:-1]
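
The one-liner above can be unrolled to show what each replacement does. This is just an illustrative sketch on a short sample string; the intermediate variable names are made up, and it assumes the placeholder X never occurs in the data.

```python
data = "0111011"                       # short sample input, chosen for illustration

padded = data + '0'                    # 01110110  (so a trailing 1 also forms a '10')
protected = padded.replace('10', 'X')  # 011X1X    (protect the last 1 of each run)
zeroed = protected.replace('1', '0')   # 000X0X    (clear every remaining 1)
restored = zeroed.replace('X', '10')   # 00010010  (bring the protected '10's back)
result = restored[:-1]                 # 0001001   (drop the padding 0)

print(result)
```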