Is there a simple way to extend an existing activation function? My custom softmax function returns: An operation has `None` for gradient
I want to try to make softmax faster by using only the top k values of the vector.
To do that, I tried to implement a custom function for TensorFlow to use in a model:
import tensorflow as tf

def softmax_top_k(logits, k=10):
    values, indices = tf.nn.top_k(logits, k, sorted=False)
    softmax = tf.nn.softmax(values)
    logits_shape = tf.shape(logits)
    return_value = tf.sparse_to_dense(indices, logits_shape, softmax)
    return_value = tf.convert_to_tensor(return_value, dtype=logits.dtype, name=logits.name)
    return return_value
I am using Fashion MNIST to test whether the attempt works:
from tensorflow import keras
from sklearn.model_selection import train_test_split

fashion_mnist = keras.datasets.fashion_mnist
(train_images, train_labels), (test_images, test_labels) = fashion_mnist.load_data()

# normalize the data
train_images = train_images / 255.0
test_images = test_images / 255.0

# split the training data into train and validate arrays (will be used later)
train_images, train_images_validate, train_labels, train_labels_validate = train_test_split(
    train_images, train_labels, test_size=0.2, random_state=133742,
)
model = keras.models.Sequential([
    keras.layers.Flatten(input_shape=(28, 28)),
    keras.layers.Dense(128, activation=tf.nn.relu),
    keras.layers.Dense(10, activation=softmax_top_k)
])

model.compile(
    loss='sparse_categorical_crossentropy',
    optimizer='adam',
    metrics=['accuracy']
)

model.fit(
    train_images, train_labels,
    epochs=10,
    validation_data=(train_images_validate, train_labels_validate),
)
But an error occurs during execution:

ValueError: An operation has `None` for gradient. Please make sure that all of your ops have a gradient defined (i.e. are differentiable).
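The missing gradient can also be reproduced without Keras (a minimal sketch, assuming TensorFlow 1.x graph mode and the softmax_top_k defined above):

logits = tf.constant([[1.0, 5.0, 3.0, 2.0, 4.0]])
out = softmax_top_k(logits, k=2)

# tf.sparse_to_dense is not differentiable, so no gradient reaches the input
print(tf.gradients(out, logits))  # -> [None]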
I found this, which explains how to implement a fully custom activation function for TensorFlow. But since this uses and extends softmax, I thought the gradient should still be the same.

This is my first week of coding with Python and TensorFlow, so I don't have a good overview of all the internal implementations yet.

Is there a simpler way to extend softmax into a new function, rather than implementing it from scratch?

Thanks in advance!
Instead of using a sparse tensor to make the tensor with "all zeros except softmaxed top-K values", use tf.scatter_nd, which (unlike tf.sparse_to_dense) has a gradient defined:
import tensorflow as tf

def softmax_top_k(logits, k=10):
    values, indices = tf.nn.top_k(logits, k, sorted=False)
    softmax = tf.nn.softmax(values)
    logits_shape = tf.shape(logits)
    # Assuming that logits is 2D: pair each row index with its top-k
    # column indices...
    rows = tf.tile(tf.expand_dims(tf.range(logits_shape[0]), 1), [1, k])
    scatter_idx = tf.stack([rows, indices], axis=-1)
    # ...and scatter the softmaxed values into a dense, all-zeros tensor
    return tf.scatter_nd(scatter_idx, softmax, logits_shape)
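A quick sanity check (a sketch, again assuming TensorFlow 1.x graph mode): the two largest logits get softmaxed probabilities, everything else stays zero, and the gradient is now defined:

logits = tf.constant([[1.0, 5.0, 3.0, 2.0, 4.0]])
out = softmax_top_k(logits, k=2)
grads = tf.gradients(out, logits)  # no longer [None]

with tf.Session() as sess:
    print(sess.run(out))    # approx. [[0., 0.731, 0., 0., 0.269]]
    print(sess.run(grads))  # a defined gradient w.r.t. the logits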
EDIT: Here is a slightly more complicated version that works with tensors of an arbitrary number of dimensions. The code still requires the number of dimensions to be known at graph construction time, though.
import tensorflow as tf

def softmax_top_k(logits, k=10):
    values, indices = tf.nn.top_k(logits, k, sorted=False)
    softmax = tf.nn.softmax(values)
    # Make nd indices: one range per leading (batch) dimension...
    logits_shape = tf.shape(logits)
    dims = [tf.range(logits_shape[i]) for i in range(logits_shape.shape.num_elements() - 1)]
    # ...meshed together, with the top-k indices substituted on the last axis
    grid = tf.meshgrid(*dims, tf.range(k), indexing='ij')
    scatter_idx = tf.stack(grid[:-1] + [indices], axis=-1)
    return tf.scatter_nd(scatter_idx, softmax, logits_shape)
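For example, with a rank-3 input (a sketch under the same TF 1.x assumption), the softmax is applied along the last axis of each inner vector:

logits3d = tf.constant([[[1.0, 3.0, 2.0],
                         [4.0, 0.0, 5.0]]])  # shape (1, 2, 3)
out3d = softmax_top_k(logits3d, k=2)

with tf.Session() as sess:
    print(sess.run(out3d))  # zeros except the softmaxed top-2 of each row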