Why is h5py cutting strings short when using null-termination?

I want to create a .h5 file that contains some data and is read by a piece of software. I managed to recreate a file with exactly the same structure and data types that the software expects, except for the datasets containing strings. I believe the only remaining difference is how the h5py module creates and handles strings, because one character appears to be lost when the data is stored.

More specifically, I am trying to store the strings with a data type of

String, length = 3, padding = H5T_STR_NULLTERM, cset = H5T_CSET_ASCII

so that the software can read the data (the specific length is arbitrary). By default, h5py creates fixed-length string datasets with H5T_STR_NULLPAD padding. When I create a null-terminated string data type and store data in a dataset of that type, for example with:

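```python
import h5py

# Start from the space-padded Fortran string type and turn it into a
# 3-byte, null-terminated string
tid = h5py.h5t.FORTRAN_S1.copy()
tid.set_size(3)
tid.set_strpad(h5py.h5t.STR_NULLTERM)

# Wrap the low-level TypeID so create_dataset uses it as the dataset's type
dtype = h5py.Datatype(tid)

with h5py.File('example.h5', 'w') as f:
    dataset = f.create_dataset('string_dataset', shape=(), dtype=dtype, data=['1346'])
```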

the dataset in the output .h5 file has a string type of size 3, as specified, but the stored value contains only 2 characters ('13' in this example).
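As far as I understand HDF5's string conversion rules, whenever data is converted into an H5T_STR_NULLTERM type, the last byte of the fixed size is reserved for the terminator, so a NULLTERM type of size 3 can hold at most 2 characters, while a NULLPAD type can use all 3 bytes. The sketch below is a simplified pure-Python model of that rule, not HDF5's actual conversion code; the function `convert_fixed_string` and the padding labels `"NULLTERM"` and `"NULLPAD"` are hypothetical names chosen for illustration (SPACEPAD is left out).

```python
def convert_fixed_string(src: bytes, dst_size: int, dst_pad: str) -> bytes:
    """Hypothetical, simplified model of converting a fixed-length string
    into a destination type of size dst_size with the given padding."""
    # The source characters end at the first NUL byte (or at the buffer end)
    chars = src.split(b"\x00", 1)[0]
    if dst_pad == "NULLTERM":
        # The last byte is reserved for the terminator: only dst_size - 1 fit
        chars = chars[: dst_size - 1]
    elif dst_pad == "NULLPAD":
        # Every byte can hold a character; unused bytes are zero-filled
        chars = chars[:dst_size]
    else:
        raise ValueError(f"unsupported padding: {dst_pad}")
    return chars.ljust(dst_size, b"\x00")


nullterm_3 = convert_fixed_string(b"1346", 3, "NULLTERM")
nullpad_3 = convert_fixed_string(b"1346", 3, "NULLPAD")
nullterm_4 = convert_fixed_string(b"1346", 4, "NULLTERM")
print(nullterm_3, nullpad_3, nullterm_4)
```

Running it prints `b'13\x00' b'134' b'134\x00'`, which mirrors the '13' versus '134' results described in the question.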

When instead creating a data type with

```python
tid = h5py.h5t.FORTRAN_S1.copy()
tid.set_size(3)
tid.set_strpad(h5py.h5t.STR_NULLPAD)
```

the string is stored correctly with all three characters, i.e., '134'.
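If the software could accept a different size, one detail that seems relevant is that the size of a NULLTERM type includes the terminator byte, so three characters need a size of 4. A minimal h5py sketch under that assumption; it uses an in-memory file (`driver="core"`, `backing_store=False`) so nothing is written to disk, and passes the data as a NumPy `S4` array instead of a list:

```python
import h5py
import numpy as np

# NULLTERM ASCII string type whose size counts the terminator byte
tid = h5py.h5t.FORTRAN_S1.copy()
tid.set_size(4)                        # 3 characters + 1 byte for '\0'
tid.set_strpad(h5py.h5t.STR_NULLTERM)
tid.set_cset(h5py.h5t.CSET_ASCII)

# In-memory HDF5 file: the core driver without a backing store
with h5py.File("example.h5", "w", driver="core", backing_store=False) as f:
    ds = f.create_dataset("string_dataset", shape=(), dtype=h5py.Datatype(tid),
                          data=np.array(b"134", dtype="S4"))
    # Inspect the type that actually ended up in the file
    ftype = ds.id.get_type()
    stored_size = ftype.get_size()
    stored_pad = ftype.get_strpad()
    value = ds[()]

print(stored_size, stored_pad == h5py.h5t.STR_NULLTERM, value)
```

This keeps all three characters under NULLTERM, at the cost of a type size of 4 instead of 3.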

Is there a way to store the string with NULLTERM padding so that all of its characters end up in the dataset, while keeping the size of the string data type the same?
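One possible workaround that keeps the size at 3, sketched here under the assumption (which I can't verify) that the software tolerates a NULLTERM field with no terminator byte: write through the low-level `DatasetID.write` and pass the file's own string type as the memory type, so HDF5 has no conversion to perform and copies the three bytes verbatim. The stored value is then not null-terminated, which strictly does not match what H5T_STR_NULLTERM promises, so readers that rely on the terminator could misbehave.

```python
import h5py
import numpy as np

# The type from the question: 3 bytes, NULLTERM (ASCII by default)
tid = h5py.h5t.FORTRAN_S1.copy()
tid.set_size(3)
tid.set_strpad(h5py.h5t.STR_NULLTERM)

raw = np.array(b"134", dtype="S3")  # three characters, no room for '\0'

with h5py.File("example.h5", "w", driver="core", backing_store=False) as f:
    ds = f.create_dataset("string_dataset", shape=(), dtype=h5py.Datatype(tid))
    # Assumption: with memory type == file type, HDF5 skips string
    # conversion and stores the 3 bytes as-is (no terminator is forced)
    ds.id.write(h5py.h5s.ALL, h5py.h5s.ALL, raw, mtype=tid)
    stored_size = ds.id.get_type().get_size()
    value = ds[()]

print(stored_size, value)
```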

Thanks in advance!
