TensorFlow Cross-Device Communication

As described in the TensorFlow paper, TensorFlow's cross-device communication is achieved by adding "receive nodes" and "send nodes" to the devices.

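As I understand it, a device (please assume only CPU devices are involved) is responsible for performing the computation of an operation. However, the data (for example, a tensor produced by an operation, or a variable's buffer) resides in memory. I don't know how the transfer of data from one device to another is physically carried out. My guess is that it is done through shared memory. Is that right?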

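I would appreciate any explanation, or pointers to the corresponding code, of how the data transfer is implemented. PS: TensorFlow paper link; the cross-device communication mechanism is shown in Figure 4.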

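In TensorFlow, cross-device communication is implemented using the Rendezvous interface, which has several different implementations depending on the deployment. The comment on that interface describes the general idea: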

// A Rendezvous is an abstraction for passing a Tensor
// from a producer to a consumer, where the consumer may safely
// request the Tensor before or after it has been produced.  A
// producer never blocks when using a Rendezvous.  A consumer has the
// choice of making a blocking call or providing a callback: in either
// case, the consumer receives the Tensor as soon as it is available.
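The contract in that comment can be illustrated with a small in-process model. This is only a sketch in Python, not TensorFlow code: the class `ToyRendezvous` and its methods `send`, `recv_async`, and `recv` are names made up for illustration, and ordinary Python objects stand in for tensors.

```python
import threading


class ToyRendezvous:
    """Toy model of the producer/consumer contract (hypothetical, not TensorFlow)."""

    def __init__(self):
        self._lock = threading.Lock()
        self._ready = {}    # key -> value sent before any consumer asked for it
        self._waiting = {}  # key -> callback registered before the value arrived

    def send(self, key, value):
        """Producer side: never blocks. Hands off to a waiting consumer or parks the value."""
        with self._lock:
            callback = self._waiting.pop(key, None)
            if callback is None:
                self._ready[key] = value
                return
        callback(value)  # called outside the lock

    def recv_async(self, key, callback):
        """Consumer side, callback style: fires as soon as the value is available."""
        with self._lock:
            if key in self._ready:
                value = self._ready.pop(key)
            else:
                self._waiting[key] = callback
                return
        callback(value)

    def recv(self, key, timeout=None):
        """Consumer side, blocking style, built on top of recv_async."""
        done = threading.Event()
        box = []

        def on_value(value):
            box.append(value)
            done.set()

        self.recv_async(key, on_value)
        if not done.wait(timeout):
            raise TimeoutError(f"no value for key {key!r}")
        return box[0]


rdv = ToyRendezvous()

# Case 1: the consumer asks first, the producer sends later.
received = []
rdv.recv_async("edge_0", received.append)
rdv.send("edge_0", [1.0, 2.0])

# Case 2: the producer sends first, the consumer makes a blocking call later.
rdv.send("edge_1", [3.0])
value = rdv.recv("edge_1")
```

In both orderings the consumer ends up with the value, and `send` returns without waiting for a consumer, which is the property the comment emphasizes.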

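As you noted in your question, TensorFlow represents communication in the dataflow graph using Send and Recv ops, which are added to the graph automatically when the graph is partitioned across devices. For each edge whose source and destination are on different devices, the graph partitioner inserts a pair of Send and Recv ops that share the same "rendezvous key" (an automatically generated string name that is used as a key in the rendezvous' index of pending tensors to be communicated). The implementation of the Send op is simple: it calls Rendezvous::Send(), passing in its rendezvous key and single input tensor, then returns immediately without blocking. The implementation of the Recv op is slightly more complicated: it registers a callback to be called when the tensor with the given key becomes available. That callback is responsible for "producing" the output of the Recv op and unblocking subsequent computation.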
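The partitioning step can be sketched as follows. This is a simplified illustration in Python, not the TensorFlow partitioner: the function `partition`, the edge and placement representations, the `send:`/`recv:` node names, and the key format are all assumptions made for the example (TensorFlow generates its own key strings).

```python
def partition(edges, placement):
    """Split a graph by device, replacing each cross-device edge with a send/recv pair.

    edges:     list of (src_node, dst_node) pairs
    placement: dict mapping node name -> device name
    Returns a dict mapping device name -> list of edges local to that device.
    """
    per_device = {device: [] for device in set(placement.values())}
    for src, dst in edges:
        src_dev, dst_dev = placement[src], placement[dst]
        if src_dev == dst_dev:
            # Both endpoints live on the same device: keep the edge as-is.
            per_device[src_dev].append((src, dst))
            continue
        # Hypothetical key format; both halves of the pair share the same key.
        key = f"{src_dev};{dst_dev};{src}"
        per_device[src_dev].append((src, f"send:{key}"))
        per_device[dst_dev].append((f"recv:{key}", dst))
    return per_device


edges = [("a", "b"), ("b", "c")]
placement = {"a": "cpu:0", "b": "cpu:0", "c": "gpu:0"}
subgraphs = partition(edges, placement)
```

Here the edge from `b` to `c` crosses devices, so the `cpu:0` subgraph gains a send node and the `gpu:0` subgraph gains a recv node carrying the same key, while the `a` to `b` edge stays untouched.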

The Rendezvous implementations perform the actual work of transferring the data:

  • IntraProcessRendezvous handles the transfer of data between devices in the same process. In the (unlikely) event that the transfer is between two CPU devices in the same process, the transfer can be achieved by a simple Tensor assignment. Otherwise, TensorFlow kicks off a device-specific DMA routine to transfer data between a CPU and a GPU device.

  • The BaseRemoteRendezvous class and its subclasses handle cross-device communication in the case that the sender and receiver can be in different processes. The main implementation of this class is RpcRemoteRendezvous, which uses gRPC to handle the remote transfers.
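The three cases above can be mimicked with a toy dispatch function. This is a rough sketch under loud assumptions, not how TensorFlow is structured internally: `transfer` and the `(process_id, device_type)` tuples are hypothetical, a `bytearray` stands in for a tensor buffer, a plain copy stands in for the DMA routine, and a `pickle` round trip stands in for the gRPC transfer.

```python
import pickle


def transfer(buffer, src, dst):
    """Return the buffer as seen by the destination device.

    src and dst are hypothetical (process_id, device_type) tuples.
    """
    src_proc, src_type = src
    dst_proc, dst_type = dst
    if src_proc != dst_proc:
        # Stand-in for a remote transfer: serialize, "ship", deserialize.
        return pickle.loads(pickle.dumps(buffer))
    if src_type == "CPU" and dst_type == "CPU":
        # Same process, both CPU: hand over a reference to the same buffer.
        return buffer
    # Stand-in for a device-specific copy between CPU and GPU memory.
    return bytearray(buffer)


data = bytearray(b"\x01\x02\x03")
same_cpu = transfer(data, (0, "CPU"), (0, "CPU"))
to_gpu = transfer(data, (0, "CPU"), (0, "GPU"))
remote = transfer(data, (0, "CPU"), (1, "CPU"))
```

In this model only the CPU-to-CPU case within one process leaves sender and receiver looking at the same memory, which is the closest analogue to the "shared memory" guess in the question; the other two cases produce a separate copy of the bytes.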