What's the difference between the attributes 'trainable' and 'training' in the BatchNormalization layer in Keras TensorFlow?
According to the official TensorFlow documentation:
About setting `layer.trainable = False` on a `BatchNormalization` layer:
The meaning of setting layer.trainable = False is to freeze the layer, i.e. its internal state will not change during training: its trainable weights will not be updated during fit() or train_on_batch(), and its state updates will not be run.
Usually, this does not necessarily mean that the layer is run in inference mode (which is normally controlled by the training argument that can be passed when calling a layer). "Frozen state" and "inference mode" are two separate concepts.
However, in the case of the BatchNormalization layer, setting trainable = False on the layer means that the layer will be subsequently run in inference mode (meaning that it will use the moving mean and the moving variance to normalize the current batch, rather than using the mean and variance of the current batch).
This behavior has been introduced in TensorFlow 2.0, in order to enable layer.trainable = False to produce the most commonly expected behavior in the convnet fine-tuning use case.
I don't quite understand the concepts of 'frozen state' and 'inference mode' here. I tried fine-tuning by setting trainable to False, but I found that the moving mean and moving variance were not updated.
So I have the following questions:
- What's the difference between the two attributes training and trainable?
- Are gamma and beta updated during training if trainable is set to False?
- Why is it necessary to set trainable to False when fine-tuning?
What's the difference between the two attributes training and trainable?
trainable: (if True) this basically means that the "trainable" weights of the layer will be updated during backpropagation.
training: some layers behave differently during the training and inference (or testing) steps, e.g. the Dropout layer and the Batch Normalization layer. This attribute tells the layer which mode it should run in.
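A minimal sketch of the distinction, assuming TensorFlow 2.x (the shapes and data here are arbitrary):

```python
import numpy as np
import tensorflow as tf

# A BatchNormalization layer keeps two kinds of state:
# - trainable weights: gamma, beta (updated by backprop)
# - non-trainable state: moving_mean, moving_variance (updated
#   only when the layer runs in training mode)
bn = tf.keras.layers.BatchNormalization()
x = tf.constant(np.random.randn(8, 4).astype("float32"))

bn(x, training=False)            # inference mode: moving stats are NOT updated
mean_before = bn.moving_mean.numpy().copy()

bn(x, training=True)             # training mode: moving stats ARE updated
mean_after = bn.moving_mean.numpy().copy()

print(np.allclose(mean_before, mean_after))   # False: training=True updated the stats

# In TF >= 2.0, trainable=False also forces the BN layer into inference mode,
# even if training=True is passed at call time:
bn.trainable = False
bn(x, training=True)             # trainable=False wins: moving stats stay frozen
print(np.allclose(mean_after, bn.moving_mean.numpy()))  # True
```

The last call is exactly the special-case behavior the quoted docs describe: for BatchNormalization, "frozen state" implies "inference mode".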
Are gamma and beta updated during training if trainable is set to False?
Since gamma and beta are the "trainable" parameters of the BN layer, they will not be updated during training if trainable is set to False.
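You can check this directly by looking at which weights Keras still considers trainable after freezing the layer (a small sketch, assuming TensorFlow 2.x):

```python
import tensorflow as tf

bn = tf.keras.layers.BatchNormalization()
bn(tf.random.normal((8, 4)))     # call once so the layer builds its weights

bn.trainable = False
print(len(bn.trainable_weights))      # 0: gamma and beta are no longer trainable
print(len(bn.non_trainable_weights))  # 4: gamma, beta, moving_mean, moving_variance
```

Since the optimizer only receives gradients for `trainable_weights`, gamma and beta cannot be updated while the layer is frozen.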
Why is it necessary to set trainable to False when fine-tuning?
When fine-tuning, we first add our own classification FC head on top. It is randomly initialized, whereas our "pretrained" model is already (somewhat) calibrated for the task.
Think of it with this analogy:
You have a number line from 0 to 10, where "0" represents a completely random model and "10" represents a perfect model. Our pretrained model sits at around 5, 6, or 7, i.e. it is most likely better than a random model. The FC head we add on top starts at "0" because it is random at the beginning.
We set trainable = False on the pretrained model so that the FC head can quickly catch up to the pretrained model's level, i.e. we can use a higher learning rate. If we did not set trainable = False on the pretrained model while using that higher learning rate, it would wreak havoc on the pretrained weights.
So at first, we set trainable = False on the pretrained model, use a higher learning rate, and train only the FC head. After that, we unfreeze the pretrained model and train the whole thing with a very low learning rate.
Feel free to ask for more clarification if needed, and upvote if you found this helpful.