TensorFlow Keras: using a generator with an .h5 file


I have an .h5 file with about 2 million 256x256 images. The data doesn't fit into memory, which is why I am using a generator. I am wondering whether I am iterating over the .h5 file correctly (I am currently using the h5py package).

(I ask because the model trains very slowly, at 680 ms/step, although that could also have other causes.)
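One way to tell whether the slowness comes from reading the file or from the model itself is to time the generator on its own, without TensorFlow in the loop. The following is only a minimal sketch: `time_generator` is a hypothetical helper name, not part of h5py or TensorFlow, and it assumes the generator is a zero-argument callable like the `Generator` instance in this question.

```python
import itertools
import time


def time_generator(generator_fn, n_samples=100):
    """Return the average seconds per sample over the first n_samples items.

    generator_fn is any zero-argument callable that returns an iterator,
    for example an instance of the Generator class from the question.
    """
    start = time.perf_counter()
    count = 0
    # islice stops after n_samples items, so the full file is never traversed.
    for _ in itertools.islice(generator_fn(), n_samples):
        count += 1
    elapsed = time.perf_counter() - start
    # Return NaN rather than dividing by zero if the generator yielded nothing.
    return elapsed / count if count else float("nan")
```

Calling something like `time_generator(Generator(file_path), 200)` and multiplying the result by the batch size gives a rough per-step read cost to compare against the 680 ms/step figure. If the two numbers are close, the reads are probably the bottleneck. If the read cost is much smaller, the time is more likely spent elsewhere.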


The code for the generator:

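import h5py
import numpy as np

# normalization_factor is assumed to be defined elsewhere; it is not shown in the question.

class Generator:
    def __init__(self, file_path):
        #self.data = h5py.File(file_path, 'r')
        self.file_path = file_path

    def __call__(self):
        # Open the file once per pass and walk the top-level groups one sample at a time.
        with h5py.File(self.file_path, 'r') as data:
            for key in data.keys():
                obj = data[key]
                #X = np.array(obj['X'][()])
                # Read the whole 'Y' dataset and cast to float32 to match the output signature.
                Y = (np.array(obj['Y'][()]) * normalization_factor).astype(np.float32)
                Y = Y.reshape(256, 256, 1)
                yield (Y, Y)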

The code for creating the dataset with tf.data.Dataset.from_generator():

import tensorflow as tf

def dataset(path_to_data, batch_size):
    # Each element is an (input, target) pair of identical 256x256x1 float32 images.
    return tf.data.Dataset.from_generator(
        Generator(path_to_data),
        output_signature=(
            tf.TensorSpec(shape=(256, 256, 1), dtype=tf.float32),
            tf.TensorSpec(shape=(256, 256, 1), dtype=tf.float32),
        )
    ).batch(batch_size)
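The pipeline above has no prefetching, so each batch is only read after the previous training step has finished. A possible variation, shown here as a sketch rather than a confirmed fix for the slowdown, appends `prefetch(tf.data.AUTOTUNE)` so that reading can overlap with training. The names `make_dataset` and `fake_generator` are hypothetical; `fake_generator` only stands in for `Generator(path_to_data)` so the example runs without an .h5 file.

```python
import numpy as np
import tensorflow as tf


def make_dataset(generator_fn, batch_size):
    """Build a batched, prefetched dataset from a zero-argument generator callable."""
    spec = tf.TensorSpec(shape=(256, 256, 1), dtype=tf.float32)
    ds = tf.data.Dataset.from_generator(
        generator_fn,
        output_signature=(spec, spec),
    )
    ds = ds.batch(batch_size)
    # Prefetching lets the next batch be prepared while the current one is in use.
    return ds.prefetch(tf.data.AUTOTUNE)


def fake_generator():
    # Hypothetical stand-in for Generator(path_to_data): yields 5 random images.
    rng = np.random.default_rng(0)
    for _ in range(5):
        img = rng.random((256, 256, 1), dtype=np.float32)
        yield img, img
```

With the real data this would be called as `make_dataset(Generator(path_to_data), batch_size)`. Prefetching only hides read latency behind compute. If the generator is slower than the model, the step time will still be bounded by the reads.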
