Convert undelimited bytes to pandas DataFrame

I am sorry if this is a duplicate, but I didn't find a suitable answer for this problem.

I have a bytes object in Python, like this:

b'\n\x00\x00\x00\x01\x00\x00\x00TEST\xa2~\x08A\x83\x11\xe3@\x05\x00\x00\x00\x03\x00\x00\x00TEST\x91\x9b\xd1?\x1c\xaa,@'

It first contains a certain number of integers (4 bytes each), then a string of 4 characters, and then a certain number of floats (4 bytes each). This pattern is repeated a certain number of times, and each repetition corresponds to a new row of data. The format of each row is the same and is known in advance. In the example there are 2 rows, each with 2 integers, 1 string and 2 floats.

My question is whether there is a way to convert this kind of data to a pandas DataFrame directly.

My current approach is to first read all values (e.g. with struct.Struct.unpack) and place them in a list of lists. This, however, seems rather slow, especially for a large number of rows.
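
For reference, the list-of-lists approach described above might look roughly like the sketch below. The format string '<2i4s2f' and the name row_format are assumptions: they presume little-endian data with no padding between fields, which matches the example bytes but is not stated explicitly.

```python
import struct

data = b'\n\x00\x00\x00\x01\x00\x00\x00TEST\xa2~\x08A\x83\x11\xe3@\x05\x00\x00\x00\x03\x00\x00\x00TEST\x91\x9b\xd1?\x1c\xaa,@'

# Assumed row layout: little-endian ("<"), no padding,
# 2 x int32, one 4-byte string, 2 x float32 = 20 bytes per row.
row_format = struct.Struct('<2i4s2f')

# iter_unpack walks the buffer one fixed-size record at a time;
# every value becomes a separate Python object, which is where the
# per-row overhead comes from on large inputs.
rows = [list(row) for row in row_format.iter_unpack(data)]

print(rows)
```

Each row is unpacked in a Python-level loop, so the cost grows with the number of rows.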

There is 1 answer
Mistraleuh On BEST ANSWER

This works fine for me:

import numpy as np
import pandas as pd

data = b'\n\x00\x00\x00\x01\x00\x00\x00TEST\xa2~\x08A\x83\x11\xe3@\x05\x00\x00\x00\x03\x00\x00\x00TEST\x91\x9b\xd1?\x1c\xaa,@'

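# Describe one row as a structured dtype: 2 int32, a 4-byte string, 2 float32
dtype = np.dtype([
    ('int1', np.int32),
    ('int2', np.int32),
    ('string', 'S4'),
    ('float1', np.float32),
    ('float2', np.float32),
])

# Interpret the raw bytes as an array of records without copying
structured_array = np.frombuffer(data, dtype=dtype)

df = pd.DataFrame(structured_array)

# The 'S4' field holds bytes objects; decode them to str
df['string'] = df['string'].str.decode('utf-8')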

print(df)

It gives me the following output:

   int1  int2 string    float1    float2
0    10     1   TEST  8.530916  7.095888
1     5     3   TEST  1.637560  2.697883
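
One caveat: np.int32 and np.float32 use the byte order of the machine running the code, while the example bytes look little-endian. A hedged variant with the byte order spelled out is sketched below. The name record_dtype, the explicit length check and the conversion to native byte order are illustrative additions, not part of the code above.

```python
import numpy as np
import pandas as pd

data = b'\n\x00\x00\x00\x01\x00\x00\x00TEST\xa2~\x08A\x83\x11\xe3@\x05\x00\x00\x00\x03\x00\x00\x00TEST\x91\x9b\xd1?\x1c\xaa,@'

# Same layout, but with explicit little-endian ("<") field types
# (assumption: the data was written little-endian).
record_dtype = np.dtype([
    ('int1', '<i4'),
    ('int2', '<i4'),
    ('string', 'S4'),
    ('float1', '<f4'),
    ('float2', '<f4'),
])

# Fail early with a clear message if the buffer is not a whole number of rows.
if len(data) % record_dtype.itemsize != 0:
    raise ValueError('buffer length is not a multiple of the row size')

records = np.frombuffer(data, dtype=record_dtype)

# Copy into native byte order; the copy is also writable,
# unlike the read-only view returned by frombuffer.
native = records.astype(record_dtype.newbyteorder('='))

df = pd.DataFrame(native)
df['string'] = df['string'].str.decode('ascii')

print(df)
```

On a little-endian machine this should produce the same DataFrame as above. The difference only matters if the code runs on a big-endian host or the file format is defined independently of the machine.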