TensorFlow Setup for Distributed Computing
Can anyone provide guidance on how to set up TensorFlow to work across multiple CPUs on a network? All of the examples I have found so far use at most a single local machine with multiple GPUs. I see that I can pass a list of targets in session_opts, but I'm not sure how to set up TensorFlow on each machine to listen for networked nodes/tasks. Any example would be greatly appreciated!
The open-source release of TensorFlow (currently 0.6.0) supports single-process execution only: in particular, the only valid target in a tensorflow::SessionOptions is the empty string, which means "current process."
The TensorFlow whitepaper describes the structure of the distributed implementation (see Figure 3) that we use inside Google. The basic idea is that the Session interface can be implemented using RPC to a master, and the master can partition the computation across a set of devices in multiple worker processes, which also communicate using RPC. Alas, the current version depends heavily on Google-internal technologies (like Borg), so a lot of work remains to make it ready for external consumption. We are currently working on this, and you can follow the progress on this GitHub issue.
Edit (February 26, 2016): Today we released an initial version of the distributed runtime to GitHub. It supports multiple machines and multiple GPUs.
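To give a flavor of how the distributed runtime is driven, here is a minimal sketch of the cluster-setup pattern, written against the `tf.train.ClusterSpec` / `tf.train.Server` API as it later stabilized in the 1.x line (the very first Feb 2016 release differed slightly in naming). The hostnames, ports, and job layout below are placeholders, not part of the original answer:

```python
import tensorflow as tf

# Describe every process in the cluster. The addresses here are
# hypothetical; in practice each entry is a reachable host:port.
cluster = tf.train.ClusterSpec({
    "worker": ["worker0.example.com:2222", "worker1.example.com:2222"],
    "ps": ["ps0.example.com:2222"],
})

# Each machine runs one server process, identified by its job name
# and task index within that job.
server = tf.train.Server(cluster, job_name="worker", task_index=0)

# A client session can then connect to the server's gRPC target
# (e.g. "grpc://worker0.example.com:2222") instead of the empty
# string used for in-process execution.
with tf.Session(server.target) as sess:
    pass  # build and run the replicated graph here

# Parameter-server tasks typically just block, serving variables:
server.join()
```

This mirrors the whitepaper's structure: the client's Session talks to a master over RPC, and the master partitions the graph across the worker and ps devices, which exchange tensors over RPC as well.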