According to the recently published white paper and the RFC on GitHub, TensorFlow eager execution currently supports distributed execution. They mention that, as in graph mode, we can run an operation eagerly on a remote device by setting the device name to, for example, "/job:training/task:2/device:GPU:0". However, I can't find any code examples or tutorials on how to do it.
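To make the question concrete, here is roughly what I am picturing, pieced together from the TF 2.x API docs. The address, job name, and device string are placeholders, and I don't know whether `tf.config.experimental_connect_to_host` is actually the intended entry point:

```python
import tensorflow as tf

# Placeholder address and job name -- I'm not sure this is the right call.
tf.config.experimental_connect_to_host("10.0.0.2:2222", job_name="training")

# Then, as the paper suggests, place an op on the remote device by name,
# just as one would in graph mode (task:0 since only one host is connected).
with tf.device("/job:training/task:0/device:GPU:0"):
    x = tf.constant([[1.0, 2.0], [3.0, 4.0]])
    y = tf.matmul(x, x)  # hopefully executed eagerly on the remote GPU

print(y)  # presumably the result is copied back to the local client
```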
I do note that there are plenty of tutorials on tf.distribute, a high-level API for distributed training that supports both graph and eager mode. However, I am more interested in how tf.distribute works under the hood in eager mode. Specifically, I would like to know:
How does a client connect to a remote server in eager mode?
When and how is the cluster definition specified in eager mode? (My graph-mode understanding is sketched below.)
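For reference, this is how I would define the cluster and start a server in graph mode. I assume each remote task in eager mode still runs something like this, but I can't confirm where (or whether) the eager client itself sees the cluster definition:

```python
import tensorflow as tf

# Graph-mode style cluster definition (addresses are placeholders).
cluster = tf.train.ClusterSpec({
    "training": ["10.0.0.1:2222", "10.0.0.2:2222", "10.0.0.3:2222"],
})

# Start the server for task 2 of the "training" job, matching the device
# string "/job:training/task:2/device:GPU:0" from the paper.
server = tf.distribute.Server(cluster, job_name="training", task_index=2)
server.join()  # block forever, serving requests from remote clients
```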
I would be grateful if someone could provide answers to those questions. Thanks!