I have a csv-file containing a lot of data that I want to read as a masked array. I've done so using the following:
data=np.recfromcsv(filename,case_sensitive=True,usemask=True)
which works just fine. However, my problem is that the data are either strings, integers, or floats. What I want to do now is convert all the integers into floats, i.e. turn all the "1"s into "1.0"s etc. while preserving everything else.
Additionally, I am looking for a generic solution. So simply specifying the desired types manually won't do since the csv-file (including the number of columns) changes.
I've tried astype but since the array also has string-entries that won't work, or am I missing something?
Thanks.
I haven't used recfromcsv, but looking at its code I see it uses np.genfromtxt, followed by a masked records construction.

I'd suggest giving a small sample csv text (3 or so lines), and showing the resulting data. We need to see the dtype in particular.

It may also be useful to start with genfromtxt, skipping the masked array stuff for now. I don't think that's where the sticky point is in converting dtypes in structured arrays.

In any case, we need something more concrete to explore.
You can't change the dtype of structured fields in-place. You have to make a new array with a new dtype, and copy values from the old to the new.

numpy.lib.recfunctions has some functions that can help in changing structured arrays.
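To make the "new array, new dtype, copy values" step concrete, here is a minimal sketch. The array `old` and its field names are made up for illustration; they stand in for whatever genfromtxt returns for your file.

```python
import numpy as np

# Hypothetical structured array standing in for the result of genfromtxt.
old = np.array(
    [("a", 1, 1.5), ("b", 2, 2.5)],
    dtype=[("name", "U3"), ("count", "i8"), ("value", "f8")],
)

# Build a new dtype with the same field names; integer fields become float64,
# every other field keeps its original dtype.
new_dt = np.dtype([
    (name, np.float64 if np.issubdtype(old.dtype[name], np.integer) else old.dtype[name])
    for name in old.dtype.names
])

# Allocate the new array and copy the values over field by field.
new = np.zeros(old.shape, dtype=new_dt)
for name in old.dtype.names:
    new[name] = old[name]

print(new.dtype)
print(new)
```

Copying by field name keeps the string and float columns untouched; only the integer columns are converted on assignment.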
===========
I suspect that it will be simpler to spell out the dtypes when calling genfromtxt than to change dtypes in an existing array.

You could try one read with dtype=None and a limited number of lines to get the column count and base dtype. Then edit that, substituting floats for ints as needed. Now read the whole file with the new dtype. Look in the recfunctions code if you need ideas on how to edit dtypes.

For example:
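A sketch of the first, limited read, assuming a small made-up csv text (the column names and values here are invented, not taken from your file). `max_rows` limits how many data rows are parsed:

```python
import numpy as np

# Made-up sample standing in for the real csv file.
txt = """name,count,value,flag
a,1,1.5,10
b,2,2.5,20
c,3,3.5,30""".splitlines()

# Let genfromtxt infer a dtype per column from the first two data rows only.
sample = np.genfromtxt(
    txt, dtype=None, delimiter=",", names=True, max_rows=2, encoding="utf-8"
)

print(sample.dtype.names)  # column names taken from the header line
print(sample.dtype)        # inferred per-column dtypes
```

One thing to watch for: string widths are inferred from the rows that were actually read, so a short sample may give a string field that is too narrow for later rows. If that is a concern, do the first read over the whole file.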
A crude dtype editor:
And applying this to default dtype:
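The original code for these two steps is not reproduced here, so what follows is only a sketch of what such an editor and its use might look like. The function name `ints_to_floats` and the sample text are my own inventions. It only handles flat (non-nested) structured dtypes.

```python
import numpy as np

def ints_to_floats(dt):
    """Return a copy of the structured dtype `dt` with integer fields replaced by float64."""
    fields = []
    for name in dt.names:
        field_dt = dt[name]
        if np.issubdtype(field_dt, np.integer):
            field_dt = np.dtype(np.float64)
        fields.append((name, field_dt))
    return np.dtype(fields)

# Made-up sample standing in for the real csv file.
txt = """name,count,value,flag
a,1,1.5,10
b,2,2.5,20
c,3,3.5,30""".splitlines()

# First pass: let genfromtxt deduce the default dtype.
first = np.genfromtxt(txt, dtype=None, delimiter=",", names=True, encoding="utf-8")
new_dt = ints_to_floats(first.dtype)

# Second pass: read again with the edited dtype. The field names now come from
# the dtype, so the header line is skipped instead of parsed.
data = np.genfromtxt(txt, dtype=new_dt, delimiter=",", skip_header=1, encoding="utf-8")

print(first.dtype)
print(data.dtype)
print(data)
```

Since nothing here depends on the number or order of columns, the same two passes should work when the csv layout changes.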
=====================
astype works if the target dtype matches. For example, if I read the txt with dtype=None, and then use the derived dt, it works:

Same for arr.astype('U3,int,float,int'), which also has 4 compatible fields.
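A sketch of the astype route, again with a made-up sample. The target dtype `dt` is derived from the array's own dtype, so the field count and names line up:

```python
import numpy as np

# Made-up sample standing in for the real csv file.
txt = """name,count,value,flag
a,1,1.5,10
b,2,2.5,20
c,3,3.5,30""".splitlines()

arr = np.genfromtxt(txt, dtype=None, delimiter=",", names=True, encoding="utf-8")

# Derived dtype: same names, integer fields swapped for float64.
dt = np.dtype([
    (name, np.float64 if np.issubdtype(arr.dtype[name], np.integer) else arr.dtype[name])
    for name in arr.dtype.names
])

# astype returns a new array; arr itself is left unchanged.
converted = arr.astype(dt)
print(converted.dtype)
print(converted)
```

This avoids the second read of the file, at the cost of holding both arrays in memory during the conversion.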