How can I import multiple columns of a file to the same array in Python?


I have a .txt file of the following form:

header line 1
header line 2
x1  y1  x4  y4  x7  y7
x2  y2  x5  y5  x8  y8
x3  y3  x6  y6  x9  y9
footer line

The x and y values are separated by a tab and are, in my case, numbers of the form "2,9 " (including the trailing space). Example:

header line 1
header line 2
1,0     1,5     4,0     4,5     7,0     7,5 
2,0     2,5     5,0     5,5     8,0     8,5 
3,0     3,5     6,0     6,5     9,0     9,5 
footer line

The file is encoded in latin-1. I'm looking for an easy way to get numpy arrays of my x and y converted to float, that is:

array([1.0,2.0,3.0,4.0,5.0,6.0,7.0,8.0,9.0])

and similarly for y.

I first created a function to replace "," with "." and remove trailing spaces:

import numpy as np

def ctf(valstr):
    return float(valstr.replace(',','.').replace(" ",""))
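For example (my illustration, not from the original post):

ctf("2,9 ")  # returns 2.9: the comma becomes a decimal point and the space is stripped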

then defined a function that builds a converter dictionary of variable length to use later:

def dic(length):
    dic={}
    for i in range(0,length):
        dic[i]=ctf
    return dic

I could then "manually" read in the columns and join them together:

xval1,yval1,xval2,yval2,xval3,yval3=np.genfromtxt("file.txt",delimiter="",unpack=True,skip_header=2,skip_footer=1,encoding="latin-1",converters=dic(6))

xvalues=np.concatenate((xval1,xval2,xval3))
yvalues=np.concatenate((yval1,yval2,yval3))

It works, but it isn't exactly pretty, especially if I have even more columns. What I would like is a method where I only have to specify the total number of columns (6 in the case above) and the number of arrays I want to get (2 in my example).

Note: I don't think the converter/dictionary part is actually relevant to my problem. I included it because any alternative solution needs to be able to use converters, or to achieve the same result in some other way.


There are 2 answers

e-motta (accepted answer)

You can use Pandas to import only the relevant lines from the file, then flatten the columns into numpy arrays:

import pandas as pd

df = pd.read_csv("file.txt", header=None, skiprows=2, skipfooter=1, sep=r"\s+", engine="python")  # skipfooter requires the python parser engine

df = df.replace(",", ".", regex=True).astype(float)

n = 2

arrays = [df.iloc[:, i::n].to_numpy().flatten(order="F") for i in range(n)]
This gives:

[array([1., 2., 3., 4., 5., 6., 7., 8., 9.]),
 array([1.5, 2.5, 3.5, 4.5, 5.5, 6.5, 7.5, 8.5, 9.5])]
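A possible simplification (my sketch, not part of the accepted answer): read_csv can parse the comma decimals itself via its decimal parameter, which removes the need for the replace/astype step; the slicing and flattening stay exactly the same.

import pandas as pd

# Sketch: let read_csv handle the "2,9"-style values directly.
# engine="python" because skipfooter is not supported by the C parser.
df = pd.read_csv("file.txt", header=None, skiprows=2, skipfooter=1,
                 sep=r"\s+", decimal=",", engine="python")

n = 2  # number of interleaved series (x and y)
arrays = [df.iloc[:, i::n].to_numpy().flatten(order="F") for i in range(n)]
xvalues, yvalues = arrays  # variable names are mine, matching the question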
hpaulj

Your text sample:

In [33]: txt='''header line 1
    ...: header line 2
    ...: 1,0     1,5     4,0     4,5     7,0     7,5 
    ...: 2,0     2,5     5,0     5,5     8,0     8,5 
    ...: 3,0     3,5     6,0     6,5     9,0     9,5 
    ...: footer line'''.splitlines()

For float conversion you don't need to worry about trailing blanks:

In [34]: def ctf(valstr):
    ...:     return float(valstr.replace(',','.'))
    ...:         
In [35]: def dic(length):
    ...:     dic={}
    ...:     for i in range(0,length):
    ...:         dic[i]=ctf
    ...:     return dic
    ...:     

Load into one 2d array:

In [36]: data=np.genfromtxt(txt,skip_header=2,skip_footer=1,encoding="latin-1",converters=dic(6))

In [37]: data
Out[37]: 
array([[1. , 1.5, 4. , 4.5, 7. , 7.5],
       [2. , 2.5, 5. , 5.5, 8. , 8.5],
       [3. , 3.5, 6. , 6.5, 9. , 9.5]])

Then split on every other column:

In [38]: data[:,::2]
Out[38]: 
array([[1., 4., 7.],
       [2., 5., 8.],
       [3., 6., 9.]])

In [39]: data[:,::2].ravel(order='F')
Out[39]: array([1., 2., 3., 4., 5., 6., 7., 8., 9.])

In [40]: data[:,1::2].ravel(order='F')
Out[40]: array([1.5, 2.5, 3.5, 4.5, 5.5, 6.5, 7.5, 8.5, 9.5])

I'm using order='F', the same as in the other answer. My data is similar to their to_numpy() array.
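For comparison (my addition, not part of the original answer): ravelling in the default C order walks across each row first, so the values come out row by row rather than in the 1..9 column order the question asks for:

data[:, ::2].ravel()   # default order='C'
# array([1., 4., 7., 2., 5., 8., 3., 6., 9.])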

If I keep unpack=True, as in your code, data is the transpose, from which we can select every other row:

In [41]: data=np.genfromtxt(txt,skip_header=2,skip_footer=1,encoding="latin-1",converters=dic(6),unpack=True)

In [42]: data
Out[42]: 
array([[1. , 2. , 3. ],
       [1.5, 2.5, 3.5],
       [4. , 5. , 6. ],
       [4.5, 5.5, 6.5],
       [7. , 8. , 9. ],
       [7.5, 8.5, 9.5]])

In [43]: data[::2,:]
Out[43]: 
array([[1., 2., 3.],
       [4., 5., 6.],
       [7., 8., 9.]])

In [44]: data[::2,:].ravel()
Out[44]: array([1., 2., 3., 4., 5., 6., 7., 8., 9.])

I could also get data as a list of arrays, and concatenate every other one:

In [49]: [*data]=np.genfromtxt(txt,skip_header=2,skip_footer=1,encoding="latin-1",converters=dic(6),unpack=True)

In [50]: data
Out[50]: 
[array([1., 2., 3.]),
 array([1.5, 2.5, 3.5]),
 array([4., 5., 6.]),
 array([4.5, 5.5, 6.5]),
 array([7., 8., 9.]),
 array([7.5, 8.5, 9.5])]

In [51]: data[::2]
Out[51]: [array([1., 2., 3.]), array([4., 5., 6.]), array([7., 8., 9.])]

In [52]: np.concatenate(data[::2])
Out[52]: array([1., 2., 3., 4., 5., 6., 7., 8., 9.])

This is just a streamlining of your manual unpacking.
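To get the "just tell it how many arrays you want" interface the question asks for, the slicing can be wrapped in a small helper (my sketch; the function name is not from either answer):

import numpy as np

def split_interleaved(data, n_arrays):
    # data[:, i::n_arrays] selects every n_arrays-th column starting at column i;
    # ravel(order='F') then reads down each selected column in turn.
    return [data[:, i::n_arrays].ravel(order="F") for i in range(n_arrays)]

# With the 3x6 data array loaded above:
# xvalues, yvalues = split_interleaved(data, 2)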