How to modify the seq2seq cost function for padded vectors?
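TensorFlow supports dynamic-length sequences through the 'sequence_length' parameter when constructing an RNN layer: the model does not learn from time steps beyond 'sequence_length' and returns zero vectors for them.
However, how can the cost function at https://github.com/tensorflow/tensorflow/blob/master/tensorflow/python/ops/seq2seq.py#L890 be modified to handle masked sequences, so that cost and perplexity are computed only over the actual sequences rather than over the whole padded sequences?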
    def sequence_loss_by_example(logits, targets, weights,
                                 average_across_timesteps=True,
                                 softmax_loss_function=None, name=None):
      if len(targets) != len(logits) or len(weights) != len(logits):
        raise ValueError("Lengths of logits, weights, and targets must be the same "
                         "%d, %d, %d." % (len(logits), len(weights), len(targets)))
      with ops.op_scope(logits + targets + weights, name,
                        "sequence_loss_by_example"):
        log_perp_list = []
        for logit, target, weight in zip(logits, targets, weights):
          if softmax_loss_function is None:
            # TODO(irving,ebrevdo): This reshape is needed because
            # sequence_loss_by_example is called with scalars sometimes, which
            # violates our general scalar strictness policy.
            target = array_ops.reshape(target, [-1])
            crossent = nn_ops.sparse_softmax_cross_entropy_with_logits(
                logit, target)
          else:
            crossent = softmax_loss_function(logit, target)
          log_perp_list.append(crossent * weight)
        log_perps = math_ops.add_n(log_perp_list)
        if average_across_timesteps:
          total_size = math_ops.add_n(weights)
          total_size += 1e-12  # Just to avoid division by 0 for all-0 weights.
          log_perps /= total_size
      return log_perps
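This function already supports computing the cost for dynamic sequence lengths through its weights. As long as the weights of the "padding targets" are 0, the cross-entropy for those steps is forced to 0:
    log_perp_list.append(crossent * weight)
and the total size also reflects only the non-padding steps:
    total_size = math_ops.add_n(weights)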
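To make the effect of the weights concrete, here is a minimal pure-Python sketch for a single sequence. It is not the TensorFlow implementation: `softmax_cross_entropy` and `masked_sequence_loss` are hypothetical helper names, and the real function operates on lists of batched tensors rather than on Python lists. The logits and targets are made-up values chosen only for illustration.

```python
import math


def softmax_cross_entropy(logits, target):
    """Cross-entropy of one logit vector against an integer class id."""
    peak = max(logits)  # subtract the max for numerical stability
    log_sum_exp = peak + math.log(sum(math.exp(x - peak) for x in logits))
    return log_sum_exp - logits[target]


def masked_sequence_loss(logits, targets, weights, average_across_timesteps=True):
    """Weighted per-step cross-entropy for ONE sequence (hypothetical helper)."""
    if len(targets) != len(logits) or len(weights) != len(logits):
        raise ValueError("Lengths of logits, weights, and targets must be the same")
    # A weight of 0.0 removes that step's cross-entropy from the sum.
    log_perp = sum(
        softmax_cross_entropy(logit, target) * weight
        for logit, target, weight in zip(logits, targets, weights)
    )
    if average_across_timesteps:
        # Divide by the number of real steps, not by the padded length.
        log_perp /= sum(weights) + 1e-12
    return log_perp


# Two real steps followed by two padding steps (weight 0.0).
logits = [[2.0, 0.5, 0.1], [0.3, 1.5, 0.2], [0.0, 0.0, 0.0], [9.0, -3.0, 1.0]]
targets = [1, 2, 0, 0]
weights = [1.0, 1.0, 0.0, 0.0]

padded_loss = masked_sequence_loss(logits, targets, weights)
unpadded_loss = masked_sequence_loss(logits[:2], targets[:2], weights[:2])
perplexity = math.exp(padded_loss)  # perplexity from the averaged log-perplexity
```

In this sketch the padded and unpadded losses agree, because the padding steps contribute nothing to either the numerator or the divisor. Taking the exponential of the averaged value is the usual way to turn it into a perplexity.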
If you pad with zeros, one way to derive the weights is:
    weights = tf.sign(tf.abs(model.targets))
(Note that you may need to cast this to the same type as your targets.)
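The same sign-of-absolute-value trick can be illustrated without TensorFlow. The sketch below is only an analogue in plain Python: `weights_from_targets` is a hypothetical helper, the batch values are invented, and it assumes that id 0 is used exclusively for padding (a real token with id 0 would be masked as well). It emits float weights because they end up multiplied with floating-point cross-entropy values; which dtype the real graph needs depends on your tensors.

```python
def weights_from_targets(padded_targets):
    """Return 1.0 for non-zero target ids and 0.0 for zero padding."""

    def sign(x):
        # -1, 0 or 1, analogous to an element-wise sign operation.
        return (x > 0) - (x < 0)

    return [[float(sign(abs(t))) for t in row] for row in padded_targets]


# Hypothetical batch of two zero-padded target sequences.
batch = [[5, 3, 9, 0, 0], [7, 2, 0, 0, 0]]
weights = weights_from_targets(batch)

# Summing the weights of a row recovers the true sequence length.
lengths = [int(sum(row)) for row in weights]
```

Because each row's weights sum to the number of real tokens, the same mask can also supply the per-example lengths that an RNN's 'sequence_length' argument expects.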