What is the correct value of 'rescale_grad' on a multi-GPU machine?
My batch size is 512 and I have 8 GPUs.
Should I define:
rescale_grad = 1. / 512 or rescale_grad = 1. / (8*512)
Thanks!
The batch size is per machine, not per GPU. To quote (from here):
Workload Partitioning
By default, MXNet partitions a data batch evenly among the available
GPUs. Assume a batch size b and assume there are k GPUs, then in one
iteration each GPU will perform forward and backward on b/k examples.
The gradients are then summed over all GPUs before updating the model.
In your case b is 512, and the summed gradient covers all 512 examples, so you should use rescale_grad = 1. / 512
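To make the arithmetic concrete, here is a minimal sketch in plain Python that imitates the partitioning described in the quote. It does not call MXNet; the per-example gradients are made-up scalars and every variable name is only illustrative. It checks that summing the per-GPU gradients and multiplying by 1/b gives the mean gradient over the whole batch.

```python
# Illustrative simulation only: no MXNet calls, gradients are made-up scalars.
batch_size = 512  # b: the batch size for the whole machine
num_gpus = 8      # k: number of GPUs

# Hypothetical per-example gradients, one scalar per example.
per_example_grads = [float(i % 7) for i in range(batch_size)]

# The batch is split evenly, so each GPU handles b/k examples.
shard = batch_size // num_gpus
per_gpu_sums = [
    sum(per_example_grads[g * shard:(g + 1) * shard])
    for g in range(num_gpus)
]

# The gradients are summed over all GPUs before the update.
total_grad = sum(per_gpu_sums)

# Rescaling by 1/b turns that sum into the mean over the batch.
rescale_grad = 1.0 / batch_size
mean_grad = total_grad * rescale_grad

# Reference value: the mean computed directly over all b examples.
expected_mean = sum(per_example_grads) / batch_size
```

Using 1. / (8*512) here would instead shrink the update by a further factor of 8, because the sum already contains exactly 512 per-example gradients, not 8*512.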