Why is the difference before and after dropout not equal to the dropout rate?
I'm specifying a network with dropout regularization, but I can't understand how the dropout is being handled here. Specifically, why isn't the difference in the proportion of zeros before and after applying dropout exactly equal to the dropout rate?
import tensorflow as tf
from tensorflow.keras.layers import Dense

class DropoutDenseNetwork(tf.Module):
    def __init__(self, name=None):
        super(DropoutDenseNetwork, self).__init__(name=name)
        self.dense_layer1 = Dense(32)
        self.dropout = tf.keras.layers.Dropout(0.2)
        self.dense_layer2 = Dense(10, activation=tf.identity)

    @tf.function
    def __call__(self, x, is_training):
        embed = self.dense_layer1(x)
        # Proportion of exact zeros before and after the dropout layer.
        propn_zero_before = tf.reduce_mean(tf.cast(tf.equal(embed, 0.), tf.float32))
        embed = self.dropout(embed, is_training)
        propn_zero_after = tf.reduce_mean(tf.cast(tf.equal(embed, 0.), tf.float32))
        tf.print('Zeros before and after:', propn_zero_before, "and", propn_zero_after)
        output = self.dense_layer2(embed)
        return output

if 'drop_dense_net' not in locals():
    drop_dense_net = DropoutDenseNetwork()
drop_dense_net(tf.ones([1, 100]), tf.constant(True))
Because rate is just the probability that any given neuron is dropped during training. It won't always land on exactly 0.2 of the units, especially when there are only 32 of them. If you increase the number of units (say, to 100,000), you'll see the observed proportion get much closer to the rate of 0.2:
import tensorflow as tf
from tensorflow.keras.layers import *

class DropoutDenseNetwork(tf.Module):
    def __init__(self, name=None):
        super(DropoutDenseNetwork, self).__init__(name=name)
        self.dense_layer1 = Dense(100000)
        self.dropout = tf.keras.layers.Dropout(0.2)
        self.dense_layer2 = Dense(1)

    @tf.function
    def __call__(self, x, is_training):
        embed = self.dense_layer1(x)
        propn_zero_before = tf.reduce_mean(tf.cast(tf.equal(embed, 0.), tf.float32))
        embed = self.dropout(embed, is_training)
        propn_zero_after = tf.reduce_mean(tf.cast(tf.equal(embed, 0.), tf.float32))
        tf.print('Zeros before and after:', propn_zero_before, "and", propn_zero_after)

drop_dense_net = DropoutDenseNetwork()
drop_dense_net(tf.ones([1, 10]), tf.constant(True))
Zeros before and after: 0 and 0.19954
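To see why the observed proportion fluctuates, note that each unit is dropped independently with probability rate, so the count of zeros follows a Binomial(n, rate) distribution, and the standard deviation of the observed proportion is sqrt(rate * (1 - rate) / n). Here is a minimal sketch of that calculation plus an empirical check (the helper name zero_proportion_std is mine, not from TensorFlow):

import tensorflow as tf

# Each of n units is dropped independently with probability rate, so the
# zero count is Binomial(n, rate). The standard deviation of the observed
# *proportion* of zeros is sqrt(rate * (1 - rate) / n).
def zero_proportion_std(n, rate=0.2):
    return (rate * (1 - rate) / n) ** 0.5

print(zero_proportion_std(32))      # ~0.071: large swings around 0.2
print(zero_proportion_std(100000))  # ~0.0013: very close to 0.2

# Empirical check: apply dropout to a vector of ones and count the zeros.
dropout = tf.keras.layers.Dropout(0.2)
for n in (32, 100000):
    out = dropout(tf.ones([1, n]), training=True)
    propn = tf.reduce_mean(tf.cast(tf.equal(out, 0.), tf.float32))
    tf.print(n, "units -> proportion of zeros:", propn)

With 32 units a swing of several percentage points around 0.2 is entirely expected; with 100,000 units the proportion concentrates tightly around the rate.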
Under the hood, tf.keras.layers.Dropout uses tf.nn.dropout, whose documentation explains:

rate: The probability that each element is dropped

In the source code you can see that it draws a random value for every element of the input and keeps the elements whose random value lands at or above rate. Naturally, the fraction of random values that falls below rate will not be exactly 0.2 on every draw:
random_tensor = random_ops.random_uniform(
    noise_shape, seed=seed, dtype=x_dtype)
keep_mask = random_tensor >= rate
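For intuition, here is a simplified sketch of that masking logic in plain TensorFlow; it is not the actual implementation (which also handles noise shapes, seeds, and dtype edge cases), but it shows the same mechanism, including the 1/(1 - rate) rescaling of the kept values that tf.nn.dropout documents:

import tensorflow as tf

def naive_dropout(x, rate=0.2, seed=None):
    # Draw one uniform random value per element and keep the element if it
    # lands at or above `rate`, so each element is dropped with probability
    # `rate` independently of the others.
    random_tensor = tf.random.uniform(tf.shape(x), seed=seed, dtype=x.dtype)
    keep_mask = tf.cast(random_tensor >= rate, x.dtype)
    # Like tf.nn.dropout, scale the kept values by 1 / (1 - rate) so the
    # expected sum of activations stays the same as without dropout.
    return x * keep_mask / (1.0 - rate)

out = naive_dropout(tf.ones([1, 100000]))
tf.print("proportion zeroed:",
         tf.reduce_mean(tf.cast(tf.equal(out, 0.), tf.float32)))

Run it a few times and the zeroed proportion hovers around 0.2 without ever being pinned to it, which is exactly the behavior observed in the question.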