I'm writing code for a CNN in TensorFlow that I'm trying to parallelize across multiple nodes and multiple GPUs using Horovod.
To generate the image batches, I have:
from tensorflow.keras.preprocessing.image import ImageDataGenerator
# preprocess_input is the one matching the backbone I use, e.g. tf.keras.applications.resnet50.preprocess_input

train_generator = ImageDataGenerator(rescale=1./255,
                                     preprocessing_function=preprocess_input)
train_iterator = train_generator.flow_from_directory(train_loc,
                                                     seed=SEED,
                                                     target_size=(224, 224))
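For context, the rest of my setup follows the standard Horovod Keras pattern from the Horovod examples (a rough sketch; the learning rate value is just a placeholder, and the model itself is the CNN I build elsewhere):

import tensorflow as tf
import horovod.tensorflow.keras as hvd

hvd.init()

# Pin each process to a single local GPU
gpus = tf.config.experimental.list_physical_devices('GPU')
for gpu in gpus:
    tf.config.experimental.set_memory_growth(gpu, True)
if gpus:
    tf.config.experimental.set_visible_devices(gpus[hvd.local_rank()], 'GPU')

# Scale the learning rate by the number of workers and wrap the optimizer
opt = tf.keras.optimizers.SGD(learning_rate=0.01 * hvd.size())
opt = hvd.DistributedOptimizer(opt)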
In Python, because of the GIL, to parallelize code each node/process must execute a copy of the exact same script, right? So doesn't that mean each node will produce the same dataset iterator? And in that case, won't each GPU be receiving the exact same batch of images?
I'm sure I'm wrong in saying that all the GPUs are training on the same batches, but I just don't see how Horovod handles distributing the batches.
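The only thing I can think of is sharding manually by rank, something like the sketch below (based on hvd.rank()/hvd.size(); this is just my guess, not something I've seen documented as required), but I don't know whether that's what I'm supposed to do or whether fit() already does something equivalent:

# Option A: give every worker a different shuffling seed, so the batches differ per rank
train_iterator = train_generator.flow_from_directory(train_loc,
                                                     seed=SEED + hvd.rank(),
                                                     target_size=(224, 224))

# Option B: keep the shared iterator but only run 1/N of the steps on each of the N workers
steps_per_epoch = len(train_iterator) // hvd.size()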
Just like there's hvd.DistributedOptimizer to parallelize backprop, what is in charge of distributing the images when we call the following?
model.fit(train_iterator,
          epochs=epochs,
          callbacks=callbacks)
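(In case it matters, callbacks is just the usual list from the Horovod Keras examples, roughly:

callbacks = [
    hvd.callbacks.BroadcastGlobalVariablesCallback(0),  # start every worker from the same initial weights
    hvd.callbacks.MetricAverageCallback(),              # average metrics across workers at epoch end
]
)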
Thanks.
Liam