TensorFlow Keras CuDNNGRU to GRU conversion
I have a trained model built in TensorFlow 1.14 using the (now deprecated) tf.keras.layers.CuDNNGRU layer (available in TensorFlow 2.0 in tf.compat.v1), and I am trying to port the old layer's weights into a new TensorFlow 2.0 model built using tf.keras.layers.GRU, to obtain an equivalent model.
One motivation for doing this is to be able to run inference on CPU (the tf.compat.v1.keras.layers.CuDNNGRU layer only runs on GPU). Another motivation is future-proofing the model.
Question
How can I convert a trained tf.compat.v1.keras.layers.CuDNNGRU layer into an equivalent tf.keras.layers.GRU layer?
The following private helper function, from standalone Keras and tensorflow.python.keras.saving.hdf5_format, appears to do the trick. It performs the more general task of converting weights between the CuDNNGRU/GRU and CuDNNLSTM/LSTM formats, so it is useful beyond my use case. The function appears to have originated in this pull request.
import numpy as np


def _convert_rnn_weights(layer, weights):
    """Converts weights for RNN layers between native and CuDNN format.

    Input kernels for each gate are transposed and converted between Fortran
    and C layout, recurrent kernels are transposed. For LSTM biases are summed/
    split in half, for GRU biases are reshaped.

    Weights can be converted in both directions between `LSTM` and `CuDNNLSTM`
    and between `CuDNNGRU` and `GRU(reset_after=True)`. Default `GRU` is not
    compatible with `CuDNNGRU`.

    For missing biases in `LSTM`/`GRU` (`use_bias=False`) no conversion is made.

    Arguments:
        layer: Target layer instance.
        weights: List of source weights values (input kernels, recurrent
            kernels, [biases]) (Numpy arrays).

    Returns:
        A list of converted weights values (Numpy arrays).

    Raises:
        ValueError: for incompatible GRU layer/weights or incompatible biases
    """

    def transform_kernels(kernels, func, n_gates):
        """Transforms kernel for each gate separately using given function.

        Arguments:
            kernels: Stacked array of kernels for individual gates.
            func: Function applied to kernel of each gate.
            n_gates: Number of gates (4 for LSTM, 3 for GRU).

        Returns:
            Stacked array of transformed kernels.
        """
        return np.hstack([func(k) for k in np.hsplit(kernels, n_gates)])

    def transpose_input(from_cudnn):
        """Makes a function that transforms input kernels from/to CuDNN format.

        It keeps the shape, but changes between the layout (Fortran/C). Eg.:

        ```
        Keras                 CuDNN
        [[0, 1, 2],  <--->  [[0, 2, 4],
         [3, 4, 5]]          [1, 3, 5]]
        ```

        It can be passed to `transform_kernels()`.

        Arguments:
            from_cudnn: `True` if source weights are in CuDNN format, `False`
                if they're in plain Keras format.

        Returns:
            Function that converts input kernel to the other format.
        """
        order = 'F' if from_cudnn else 'C'

        def transform(kernel):
            return kernel.T.reshape(kernel.shape, order=order)

        return transform

    target_class = layer.__class__.__name__

    # convert the weights between CuDNNLSTM and LSTM
    if target_class in ['LSTM', 'CuDNNLSTM'] and len(weights) == 3:
        # determine if we're loading a CuDNNLSTM layer
        # from the number of bias weights:
        # CuDNNLSTM has (units * 8) weights; while LSTM has (units * 4)
        # if there's no bias weight in the file, skip this conversion
        units = weights[1].shape[0]
        bias_shape = weights[2].shape
        n_gates = 4

        if bias_shape == (2 * units * n_gates,):
            source = 'CuDNNLSTM'
        elif bias_shape == (units * n_gates,):
            source = 'LSTM'
        else:
            raise ValueError('Invalid bias shape: ' + str(bias_shape))

        def convert_lstm_weights(weights, from_cudnn=True):
            """Converts the weights between CuDNNLSTM and LSTM.

            Arguments:
                weights: Original weights.
                from_cudnn: Indicates whether original weights are from
                    CuDNN layer.

            Returns:
                Updated weights compatible with LSTM.
            """
            # Transpose (and reshape) input and recurrent kernels
            kernels = transform_kernels(weights[0],
                                        transpose_input(from_cudnn),
                                        n_gates)
            recurrent_kernels = transform_kernels(weights[1],
                                                  lambda k: k.T,
                                                  n_gates)
            if from_cudnn:
                # merge input and recurrent biases into a single set
                biases = np.sum(np.split(weights[2], 2, axis=0), axis=0)
            else:
                # Split single set of biases evenly to two sets. The way of
                # splitting doesn't matter as long as the two sets sum is kept.
                biases = np.tile(0.5 * weights[2], 2)
            return [kernels, recurrent_kernels, biases]

        if source != target_class:
            weights = convert_lstm_weights(
                weights, from_cudnn=(source == 'CuDNNLSTM'))

    # convert the weights between CuDNNGRU and GRU(reset_after=True)
    if target_class in ['GRU', 'CuDNNGRU'] and len(weights) == 3:
        # We can determine the source of the weights from the shape of the bias.
        # If there is no bias we skip the conversion since
        # CuDNNGRU always has biases.
        units = weights[1].shape[0]
        bias_shape = weights[2].shape
        n_gates = 3

        def convert_gru_weights(weights, from_cudnn=True):
            """Converts the weights between CuDNNGRU and GRU.

            Arguments:
                weights: Original weights.
                from_cudnn: Indicates whether original weights are from
                    CuDNN layer.

            Returns:
                Updated weights compatible with GRU.
            """
            kernels = transform_kernels(weights[0],
                                        transpose_input(from_cudnn),
                                        n_gates)
            recurrent_kernels = transform_kernels(weights[1],
                                                  lambda k: k.T,
                                                  n_gates)
            biases = np.array(weights[2]).reshape((2, -1) if from_cudnn else -1)
            return [kernels, recurrent_kernels, biases]

        if bias_shape == (2 * units * n_gates,):
            source = 'CuDNNGRU'
        elif bias_shape == (2, units * n_gates):
            source = 'GRU(reset_after=True)'
        elif bias_shape == (units * n_gates,):
            source = 'GRU(reset_after=False)'
        else:
            raise ValueError('Invalid bias shape: ' + str(bias_shape))

        if target_class == 'CuDNNGRU':
            target = 'CuDNNGRU'
        elif layer.reset_after:
            target = 'GRU(reset_after=True)'
        else:
            target = 'GRU(reset_after=False)'

        # only convert between different types
        if source != target:
            types = (source, target)
            if 'GRU(reset_after=False)' in types:
                raise ValueError('%s is not compatible with %s' % types)
            if source == 'CuDNNGRU':
                weights = convert_gru_weights(weights, from_cudnn=True)
            elif source == 'GRU(reset_after=True)':
                weights = convert_gru_weights(weights, from_cudnn=False)

    return weights
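To make the two less obvious steps of the helper concrete, here is a small NumPy-only sketch of the per-gate layout change and the LSTM bias merge/split. The names make_layout_transform, to_keras and to_cudnn are mine, introduced only for illustration; the expressions themselves are copied from the function above.

```python
import numpy as np


def make_layout_transform(from_cudnn):
    """Same expression as transpose_input() in the helper above."""
    order = 'F' if from_cudnn else 'C'
    return lambda kernel: kernel.T.reshape(kernel.shape, order=order)


# Illustrative names (not from Keras): one transform per direction.
to_keras = make_layout_transform(from_cudnn=True)
to_cudnn = make_layout_transform(from_cudnn=False)

kernel = np.arange(6).reshape(2, 3)   # [[0, 1, 2], [3, 4, 5]]
converted = to_keras(kernel)          # [[0, 2, 4], [1, 3, 5]], same shape
restored = to_cudnn(converted)        # the two transforms undo each other

# LSTM biases: the CuDNN format stores two bias sets back to back,
# the plain format stores their sum.
units, n_gates = 2, 4
cudnn_bias = np.arange(2 * units * n_gates, dtype=np.float32)  # shape (16,)
keras_bias = np.sum(np.split(cudnn_bias, 2, axis=0), axis=0)   # shape (8,)

# Going back, the bias is split evenly; only the sum of the halves matters.
split_again = np.tile(0.5 * keras_bias, 2)                     # shape (16,)

print(converted)
print(keras_bias.shape, split_again.shape)
```

Note that the LSTM round trip does not reproduce the original CuDNN bias values, only their per-unit sum, which is all the plain LSTM uses.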
For my use case (putting CuDNNGRU weights into a GRU), the solution using this function is as follows:
# cudnn_gru and gru are built CuDNNGRU and GRU layers, respectively
kernel, recurrent_kernel, bias = _convert_rnn_weights(
    layer=gru,
    weights=[
        cudnn_gru.kernel.numpy(),
        cudnn_gru.recurrent_kernel.numpy(),
        cudnn_gru.bias.numpy(),
    ],
)
gru.cell.kernel.assign(kernel)
gru.cell.recurrent_kernel.assign(recurrent_kernel)
gru.cell.bias.assign(bias)
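If no GPU is at hand to build a CuDNNGRU layer, the CuDNNGRU-to-GRU branch can still be sanity-checked on random arrays. The sketch below condenses that branch into a standalone function; cudnn_gru_to_keras is a hypothetical name of mine, and the weight shapes (input_dim x 3*units, units x 3*units, 6*units) are assumed from the shape checks in the helper above.

```python
import numpy as np


def cudnn_gru_to_keras(kernel, recurrent_kernel, bias, n_gates=3):
    """Hypothetical condensed version of the CuDNNGRU -> GRU branch above."""
    def per_gate(stacked, func):
        # Apply func to each gate's block, then stack the blocks again.
        return np.hstack([func(k) for k in np.hsplit(stacked, n_gates)])

    new_kernel = per_gate(kernel, lambda k: k.T.reshape(k.shape, order='F'))
    new_recurrent = per_gate(recurrent_kernel, lambda k: k.T)
    # The flat CuDNN bias becomes the two rows used by GRU(reset_after=True).
    new_bias = np.asarray(bias).reshape((2, -1))
    return new_kernel, new_recurrent, new_bias


# Random stand-ins for trained CuDNNGRU weights (assumed shapes).
input_dim, units = 5, 4
rng = np.random.default_rng(0)
kernel = rng.normal(size=(input_dim, 3 * units))
recurrent_kernel = rng.normal(size=(units, 3 * units))
bias = rng.normal(size=(6 * units,))

k, r, b = cudnn_gru_to_keras(kernel, recurrent_kernel, bias)
print(k.shape, r.shape, b.shape)
```

The kernel shapes are unchanged by the conversion; only the bias changes shape, from a flat vector to two rows.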
Note that to use the cuDNN-compatible implementation of tf.keras.layers.GRU, one must use a specific combination of parameters (in particular use_bias=True).
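As a rough aid, the combination can be written down as a dictionary of keyword arguments. The values below reflect my reading of the TensorFlow 2 documentation for tf.keras.layers.GRU and should be checked against the version in use; the documentation also places a condition on input masking that a dictionary cannot capture. find_incompatible_settings is a hypothetical helper, not a TensorFlow function.

```python
# Keyword arguments that, as far as I understand the TensorFlow 2 docs,
# tf.keras.layers.GRU needs in order to use its cuDNN kernel.
CUDNN_COMPATIBLE_GRU_KWARGS = {
    'activation': 'tanh',
    'recurrent_activation': 'sigmoid',
    'recurrent_dropout': 0,
    'unroll': False,
    'use_bias': True,
    'reset_after': True,
}


def find_incompatible_settings(gru_kwargs):
    """Hypothetical helper (not part of TensorFlow).

    Returns {name: (given, required)} for every supplied keyword argument
    that differs from the cuDNN-compatible value. Arguments that are not
    supplied are not checked.
    """
    return {
        name: (gru_kwargs[name], required)
        for name, required in CUDNN_COMPATIBLE_GRU_KWARGS.items()
        if name in gru_kwargs and gru_kwargs[name] != required
    }


ok = find_incompatible_settings(
    {'reset_after': True, 'recurrent_activation': 'sigmoid'})
bad = find_incompatible_settings(
    {'reset_after': False, 'recurrent_dropout': 0.2})
print(ok)
print(bad)
```

The dictionary can also be passed straight to the layer constructor, for example GRU(n_units, **CUDNN_COMPATIBLE_GRU_KWARGS).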
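I know this thread is a bit old, but I can add how to convert CuDNNGRUs/CuDNNLSTMs to GRUs/LSTMs in Keras/TF 2.6 (the accepted answer did not work for me because the attributes of gru.cell seem to have changed).
Background: I wanted to train a CuDNNGRU on a GPU (which is faster than training a standard GRU on a GPU) and then convert it to a standard GRU for CPU inference.
This solution comes from a GitHub user named bzamecnik: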
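1. Create a model containing a CuDNNGRU gru_cudnn (or CuDNNLSTM), train it, and save its weights:

# Assumed import path for TF 2.x; adjust to your setup.
from tensorflow.compat.v1.keras.layers import CuDNNGRU

gru_cudnn = CuDNNGRU(n_units)
model = ...  # make model with gru_cudnn
model.fit(...)
model.save_weights('weights_cudnn.h5')

2. Create a model with the same architecture, using a standard GRU gru (or LSTM) in place of the CuDNNGRU (or CuDNNLSTM), and load the CuDNN weights saved in step 1:

from tensorflow.keras.layers import GRU

gru = GRU(n_units, reset_after=True, recurrent_activation='sigmoid')
model = ...  # make model with gru
model.load_weights('weights_cudnn.h5')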
I hope this helps anyone who stumbles upon this thread in the future.