TensorFlow - regularization with L2 loss, how to apply to all weights, not just last one?
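I am playing with an ANN that is part of the Udacity Deep Learning course.
I have an assignment that involves introducing regularization, using L2 loss, to a network with one hidden ReLU layer. I wonder how to introduce it properly so that ALL weights are penalized, not only the weights of the output layer.
The code for the network without regularization is at the bottom of the post (the code that actually runs the training is outside the scope of the question).
The obvious way of introducing L2 is to replace the loss calculation with something like this (if beta is 0.01):
loss = tf.reduce_mean( tf.nn.softmax_cross_entropy_with_logits(logits=out_layer, labels=tf_train_labels) + 0.01*tf.nn.l2_loss(out_weights))
But in that case only the output layer's weights are taken into account. I am not sure how to properly penalize the weights that feed INTO the hidden ReLU layer. Is it needed at all, or will penalizing the output layer somehow keep the hidden weights in check as well?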
#some importing
from __future__ import print_function
import numpy as np
import tensorflow as tf
from six.moves import cPickle as pickle
from six.moves import range
#loading data
pickle_file = '/home/maxkhk/Documents/Udacity/DeepLearningCourse/SourceCode/tensorflow/examples/udacity/notMNIST.pickle'
with open(pickle_file, 'rb') as f:
    save = pickle.load(f)
    train_dataset = save['train_dataset']
    train_labels = save['train_labels']
    valid_dataset = save['valid_dataset']
    valid_labels = save['valid_labels']
    test_dataset = save['test_dataset']
    test_labels = save['test_labels']
    del save # hint to help gc free up memory
print('Training set', train_dataset.shape, train_labels.shape)
print('Validation set', valid_dataset.shape, valid_labels.shape)
print('Test set', test_dataset.shape, test_labels.shape)
#prepare data to have right format for tensorflow
#i.e. data is flat matrix, labels are onehot
image_size = 28
num_labels = 10
def reformat(dataset, labels):
    dataset = dataset.reshape((-1, image_size * image_size)).astype(np.float32)
    # Map 0 to [1.0, 0.0, 0.0 ...], 1 to [0.0, 1.0, 0.0 ...]
    labels = (np.arange(num_labels) == labels[:,None]).astype(np.float32)
    return dataset, labels
train_dataset, train_labels = reformat(train_dataset, train_labels)
valid_dataset, valid_labels = reformat(valid_dataset, valid_labels)
test_dataset, test_labels = reformat(test_dataset, test_labels)
print('Training set', train_dataset.shape, train_labels.shape)
print('Validation set', valid_dataset.shape, valid_labels.shape)
print('Test set', test_dataset.shape, test_labels.shape)
#now is the interesting part - we are building a network with
#one hidden ReLU layer and out usual output linear layer
#we are going to use SGD so here is our size of batch
batch_size = 128
#building tensorflow graph
graph = tf.Graph()
with graph.as_default():
    # Input data. For the training data, we use a placeholder that will be fed
    # at run time with a training minibatch.
    tf_train_dataset = tf.placeholder(tf.float32,
                                      shape=(batch_size, image_size * image_size))
    tf_train_labels = tf.placeholder(tf.float32, shape=(batch_size, num_labels))
    tf_valid_dataset = tf.constant(valid_dataset)
    tf_test_dataset = tf.constant(test_dataset)
    #now let's build our new hidden layer
    #that's how many hidden neurons we want
    num_hidden_neurons = 1024
    #its weights
    hidden_weights = tf.Variable(
        tf.truncated_normal([image_size * image_size, num_hidden_neurons]))
    hidden_biases = tf.Variable(tf.zeros([num_hidden_neurons]))
    #now the layer itself. It multiplies data by weights, adds biases
    #and takes ReLU over result
    hidden_layer = tf.nn.relu(tf.matmul(tf_train_dataset, hidden_weights) + hidden_biases)
    #time to go for output linear layer
    #out weights connect hidden neurons to output labels
    #biases are added to output labels
    out_weights = tf.Variable(
        tf.truncated_normal([num_hidden_neurons, num_labels]))
    out_biases = tf.Variable(tf.zeros([num_labels]))
    #compute output
    out_layer = tf.matmul(hidden_layer, out_weights) + out_biases
    #our real output is a softmax of prior result
    #and we also compute its cross-entropy to get our loss
    #(logits and labels are passed by keyword, as TensorFlow 1.x requires)
    loss = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(
        logits=out_layer, labels=tf_train_labels))
    #now we just minimize this loss to actually train the network
    optimizer = tf.train.GradientDescentOptimizer(0.5).minimize(loss)
    #nice, now let's calculate the predictions on each dataset for evaluating the
    #performance so far
    # Predictions for the training, validation, and test data.
    train_prediction = tf.nn.softmax(out_layer)
    valid_relu = tf.nn.relu(tf.matmul(tf_valid_dataset, hidden_weights) + hidden_biases)
    valid_prediction = tf.nn.softmax(tf.matmul(valid_relu, out_weights) + out_biases)
    test_relu = tf.nn.relu(tf.matmul(tf_test_dataset, hidden_weights) + hidden_biases)
    test_prediction = tf.nn.softmax(tf.matmul(test_relu, out_weights) + out_biases)
hidden_weights, hidden_biases, out_weights, and out_biases are all model parameters that you are creating. You can add L2 regularization to ALL of these parameters as follows:
loss = (tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(
logits=out_layer, labels=tf_train_labels)) +
0.01*tf.nn.l2_loss(hidden_weights) +
0.01*tf.nn.l2_loss(hidden_biases) +
0.01*tf.nn.l2_loss(out_weights) +
0.01*tf.nn.l2_loss(out_biases))
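To make the penalty concrete, here is a minimal plain-Python sketch of what each term contributes. It relies on the documented behaviour of tf.nn.l2_loss, which returns sum(t ** 2) / 2 (no square root). The tiny weight matrices and the cross-entropy value are made-up illustration values, not taken from the network above, and the l2_loss helper is only a stand-in for the TensorFlow op.

```python
# Plain-Python stand-in for tf.nn.l2_loss: half the sum of squared entries.
def l2_loss(matrix):
    return sum(x * x for row in matrix for x in row) / 2.0

# Hypothetical toy values; the real network has 784x1024 and 1024x10 weights.
hidden_weights = [[1.0, -2.0], [0.5, 0.0]]
out_weights = [[3.0], [-1.0]]
cross_entropy = 0.7  # made-up data loss, for illustration only
beta = 0.01

# Each regularized parameter adds beta * l2_loss(parameter) to the data loss.
penalty = beta * (l2_loss(hidden_weights) + l2_loss(out_weights))
total_loss = cross_entropy + penalty
print(penalty, total_loss)
```

A parameter that is left out of the sum contributes nothing to the penalty, so the optimizer gets no direct push to shrink it.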
Following the note by @Keight Johnson, to not regularize the biases:
loss = (tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(
logits=out_layer, labels=tf_train_labels)) +
0.01*tf.nn.l2_loss(hidden_weights) +
0.01*tf.nn.l2_loss(out_weights))
A shorter and more scalable way of doing this would be:
vars = tf.trainable_variables()
lossL2 = tf.add_n([ tf.nn.l2_loss(v) for v in vars ]) * 0.001
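This basically sums the l2_loss of all your trainable variables. You could also build a dictionary that lists only the variables you want to add to your cost and use the second line above on it. You can then add lossL2 to your softmax cross-entropy value to get your total loss.
Edit: As Piotr Dabkowski mentioned, the code above will also regularize the biases. This can be avoided by adding an if statement to the second line: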
lossL2 = tf.add_n([ tf.nn.l2_loss(v) for v in vars
if 'bias' not in v.name ]) * 0.001
The same filter can be used to exclude other variables.
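The name filter can be sketched in plain Python as follows. The (name, values) pairs are hypothetical and mimic what v.name would return if the variables had been created with explicit names; l2_loss and l2_penalty are illustrative helpers, not TensorFlow functions.

```python
# Plain-Python stand-in for tf.nn.l2_loss on a flattened tensor.
def l2_loss(values):
    return sum(x * x for x in values) / 2.0

# Hypothetical (name, flattened values) pairs, mimicking v.name for variables
# that were created with name='hidden_weights', name='hidden_biases', etc.
variables = [
    ("hidden_weights:0", [1.0, -2.0, 0.5]),
    ("hidden_biases:0", [10.0, 10.0]),
    ("out_weights:0", [3.0, -1.0]),
    ("out_biases:0", [10.0]),
]

def l2_penalty(variables, beta, skip=None):
    # Keep every variable whose name does not contain the `skip` substring.
    kept = [vals for name, vals in variables if skip is None or skip not in name]
    return beta * sum(l2_loss(vals) for vals in kept)

everything = l2_penalty(variables, beta=0.001)                 # weights and biases
weights_only = l2_penalty(variables, beta=0.001, skip="bias")  # biases excluded
print(everything, weights_only)
```

Note that the filter only works if the variable names actually contain 'bias'. In the question's graph the variables are created without a name argument, so TensorFlow 1.x gives them default names such as Variable:0 and Variable_1:0 and the filter would exclude nothing; passing something like name='hidden_biases' to tf.Variable is one way to make it effective.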
In fact, we usually do not regularize the bias terms (intercepts).
So, I go for:
loss = (tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(
logits=out_layer, labels=tf_train_labels)) +
0.01*tf.nn.l2_loss(hidden_weights) +
0.01*tf.nn.l2_loss(out_weights))
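The intercept is simply added to the y values, so penalizing it only changes the y values by adding a constant c to the intercepts. Having it or not will not change the results, but it does take some extra computation.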
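For reference, the weights-only objective above can be written out as follows. This is a sketch that assumes tf.nn.l2_loss(t) = sum(t ** 2) / 2, with β = 0.01, W_h and W_o standing for hidden_weights and out_weights, b_h and b_o for the biases, and CE-bar for the mean cross-entropy over the minibatch (the symbols are notation introduced here, not from the original post).

```latex
J = \overline{\mathrm{CE}}
    + \beta \left( \tfrac{1}{2}\lVert W_h \rVert_F^2 + \tfrac{1}{2}\lVert W_o \rVert_F^2 \right),
\qquad \beta = 0.01

\frac{\partial J}{\partial W_h} = \frac{\partial \overline{\mathrm{CE}}}{\partial W_h} + \beta W_h,
\qquad
\frac{\partial J}{\partial W_o} = \frac{\partial \overline{\mathrm{CE}}}{\partial W_o} + \beta W_o,
\qquad
\frac{\partial J}{\partial b_h} = \frac{\partial \overline{\mathrm{CE}}}{\partial b_h},
\qquad
\frac{\partial J}{\partial b_o} = \frac{\partial \overline{\mathrm{CE}}}{\partial b_o}
```

The β W term is what shrinks a parameter at every gradient step. If only out_weights appears in the penalty, the β W_h term is absent, so the hidden weights receive no direct decay; that is why each weight matrix is listed explicitly in the loss.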