What are collective ops in Tensorflow?

tensorflow
tensorflow-estimator

The CollectiveAllReduce documentation mentions 'collective ops':

It is similar to the MirroredStrategy but it uses collective ops for reduction.

The question is simple: what are they?

Although this is a somewhat old question, I thought I might as well answer it.

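When it comes to mirroring strategies, Tensorflow (2.0) has two types: MirroredStrategy and MultiWorkerMirroredStrategy. MirroredStrategy mirrors the variables on every replica, where one replica is created for each GPU on the machine. MultiWorkerMirroredStrategy, on the other hand, replicates the variables across all workers in the cluster. This is why the multi-worker strategy requires the TF_CONFIG environment variable to be set: each worker needs to know the cluster layout and its own role in it.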
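As a sketch of what setting TF_CONFIG looks like, assuming a hypothetical two-worker cluster (the hostnames and ports below are placeholders, and `set_tf_config` is a helper made up for this example): every worker gets the same `cluster` description but its own `task` index, and the variable should be set before the strategy is constructed.

```python
import json
import os

# Hypothetical two-worker cluster; replace the host:port entries with your own.
cluster = {
    "worker": ["worker0.example.com:12345", "worker1.example.com:23456"],
}


def set_tf_config(task_index):
    """Hypothetical helper: set TF_CONFIG for the worker at cluster["worker"][task_index]."""
    tf_config = {
        "cluster": cluster,                                # identical on every worker
        "task": {"type": "worker", "index": task_index},   # differs per worker
    }
    os.environ["TF_CONFIG"] = json.dumps(tf_config)
    return tf_config


# On the first machine; the second machine would call set_tf_config(1) instead.
set_tf_config(0)
print(os.environ["TF_CONFIG"])
```

Worker 0 is conventionally treated as the "chief", which takes on extra duties such as saving checkpoints.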

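According to the documentation, CollectiveOps help keep variables in sync across devices. These ops are executed jointly by all participating workers and implement operations such as gather, broadcast, reduce, and others.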
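To make the semantics concrete, here is a plain-Python simulation of three of these collectives over a list of per-worker values. It is only a conceptual sketch: the function names are hypothetical and are not TensorFlow APIs, and the real ops run on tensors across devices and machines, where TensorFlow may use bandwidth-efficient algorithms such as ring all-reduce or NCCL rather than this naive approach.

```python
from typing import List

# Stand-in for one worker's tensor (a simplification for this sketch).
Tensor = List[float]


def all_reduce(per_worker: List[Tensor]) -> List[Tensor]:
    """Every worker ends up with the element-wise sum of all workers' tensors."""
    summed = [sum(values) for values in zip(*per_worker)]
    return [list(summed) for _ in per_worker]


def broadcast(per_worker: List[Tensor], root: int) -> List[Tensor]:
    """Every worker ends up with a copy of the root worker's tensor."""
    return [list(per_worker[root]) for _ in per_worker]


def all_gather(per_worker: List[Tensor]) -> List[Tensor]:
    """Every worker ends up with all tensors concatenated in worker order."""
    gathered = [x for tensor in per_worker for x in tensor]
    return [list(gathered) for _ in per_worker]


# Gradients computed locally on three hypothetical workers.
grads = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
print(all_reduce(grads))         # → [[9.0, 12.0], [9.0, 12.0], [9.0, 12.0]]
print(broadcast(grads, root=0))  # → [[1.0, 2.0], [1.0, 2.0], [1.0, 2.0]]
print(all_gather(grads))         # → three copies of [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
```

All-reduce is the collective that mirrored strategies lean on most: after each step the replicas' gradients are summed, so every copy of the variables receives the same update and the copies stay in sync.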