How to make a mask for picking some special indices in TensorFlow?

Question

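Here is some example code. I want to use tf.scatter_nd to assign values to a new tensor, for example updated_values = tf.scatter_nd(tf.expand_dims(indices_tf, -1), values_tf, tf.shape(values_tf)). The indices tensor contains some repeated indices, and tf.scatter_nd adds up the values that land on the same index. I only want to assign one of them, chosen according to the info tensor, which always has the same shape as the indices tensor. The code describes the details.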

import numpy as np
import tensorflow as tf

info =    np.array([0, 3, 4, 5, 6, 2, 1])

indices = np.array([0, 1, 2, 1, 1, 1, 2])
values = np.array([7, 6, 4, 9, 2, 1, 10])

delta_tf = tf.convert_to_tensor(info, tf.int32)
indices_tf = tf.convert_to_tensor(indices, tf.int32)
values_tf = tf.convert_to_tensor(values, tf.float32)

updated_values = tf.scatter_nd(tf.expand_dims(indices_tf, -1), values_tf, tf.shape(values_tf))

sess = tf.Session()

updated_values_ = sess.run(updated_values)
print(updated_values_)

# updated_values_ is [7. 18. 14.  0.  0.  0.  0.].
# I would like tf.scatter_nd to assign only one value at each repeated index
# instead of adding the values together.
#
# So I want to build a mask from indices according to info.
# The rule: among the positions that share an index, the position with the
# largest value in info gets 1 and the others get 0.
#    info: [0, 3, 4, 5, 6, 2, 1]
# indices: [0, 1, 2, 1, 1, 1, 2]
#    mask: [1, 0, 1, 0, 1, 0, 0]
#
# In this example the mask is 1 at positions 0, 2 and 4 (info values 0, 4 and 6)
# and 0 elsewhere, so the mask is [1, 0, 1, 0, 1, 0, 0].

mask = np.array([1, 0, 1, 0, 1, 0, 0]) # the desired mask
mask_tf = tf.convert_to_tensor(mask, tf.float32)
updated_valuess = tf.scatter_nd(tf.expand_dims(indices_tf, -1), values_tf * mask_tf, tf.shape(values_tf))

updated_valuess_ = sess.run(updated_valuess)
print(updated_valuess_) # This output [7. 2. 4. 0. 0. 0. 0.] is what I want.

How can I generate this mask?
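For readers on TensorFlow 2.x, where tf.Session is not available by default, the check from the question can be sketched in eager mode as below. This assumes TF 2.x with eager execution enabled; the mask is still written by hand here, and the names `idx`, `summed` and `masked` are only illustrative.

```python
import tensorflow as tf

indices_tf = tf.constant([0, 1, 2, 1, 1, 1, 2], dtype=tf.int32)
values_tf = tf.constant([7, 6, 4, 9, 2, 1, 10], dtype=tf.float32)
mask_tf = tf.constant([1, 0, 1, 0, 1, 0, 0], dtype=tf.float32)  # hand-written desired mask

idx = tf.expand_dims(indices_tf, -1)

# Without the mask, values that share an index are summed.
summed = tf.scatter_nd(idx, values_tf, tf.shape(values_tf))

# With the mask, only one value per index is non-zero, so the sum keeps just that value.
masked = tf.scatter_nd(idx, values_tf * mask_tf, tf.shape(values_tf))

print(summed.numpy())
print(masked.numpy())
```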

Answer 1

Method 1

This should work:

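import tensorflow as tf

vals = tf.constant([0, 3, 4, 5, 6, 2, 1], dtype=tf.int64)
ind = tf.constant([0, 1, 2, 1, 1, 1, 2], dtype=tf.int64)

# For each unique index x, zero out the values outside group x, take the argmax,
# and one-hot encode that position. Summing the one-hot rows gives the mask.
# Note: zeroing by multiplication assumes each group has a positive maximum.
un, _  = tf.unique(ind)
res = tf.reduce_sum(
    tf.map_fn(lambda x: tf.one_hot(tf.argmax(vals * tf.cast(tf.equal(ind, x), tf.int64)), depth=ind.shape[0], dtype=tf.int64), un), 
    axis=0)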
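One caveat with the snippet above: multiplying by the 0/1 group mask turns every out-of-group entry into 0, so a group whose values are all negative (or all zero) can lose the argmax to an out-of-group position. A possible variant, sketched here under the assumption of TF 2.x (where tf.where broadcasts a scalar), replaces out-of-group entries with a value below the global minimum instead. The helper name `group_argmax_one_hot` and the negative test data are made up for illustration.

```python
import tensorflow as tf

# Illustrative data with only negative values, where masking by multiplication would fail.
vals = tf.constant([-5, -3, -4, -1, -6, -2, -7], dtype=tf.int64)
ind = tf.constant([0, 1, 2, 1, 1, 1, 2], dtype=tf.int64)

# A value strictly below every entry of vals (assumes this does not overflow int64).
fill = tf.reduce_min(vals) - 1

def group_argmax_one_hot(x):
    # Entries outside group x are replaced by `fill`, so they can never win the argmax.
    in_group = tf.where(tf.equal(ind, x), vals, fill)
    return tf.one_hot(tf.argmax(in_group), depth=ind.shape[0], dtype=tf.int64)

un, _ = tf.unique(ind)
mask = tf.reduce_sum(tf.map_fn(group_argmax_one_hot, un), axis=0)
print(mask.numpy())
```

For the data in the question (non-negative values with a positive maximum in every multi-element group) both versions should give the same mask.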

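Edit

Method 2

map_fn is a slow operation, so if you go with that option you should expect it to be slow. Unless you set the parallel_iterations argument and run it in non-eager mode, it executes the iterations sequentially. The solution below is therefore faster.

Here I use a RaggedTensor and find the maximum value among the elements that belong to each segment. However, to use RaggedTensors the segment IDs must be sorted, so sorting and unsorting are involved.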
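As a rough sketch of what running map_fn in non-eager mode with parallel iterations could look like, the Method 1 computation can be wrapped in a tf.function. This assumes TF 2.x; the function name `mask_with_map_fn` and the value 8 are arbitrary choices for illustration, and whether this is actually faster depends on the workload.

```python
import tensorflow as tf

vals = tf.constant([0, 3, 4, 5, 6, 2, 1], dtype=tf.int64)
ind = tf.constant([0, 1, 2, 1, 1, 1, 2], dtype=tf.int64)

@tf.function  # traces the Python function into a graph, i.e. non-eager execution
def mask_with_map_fn(vals, ind):
    un, _ = tf.unique(ind)
    one_hots = tf.map_fn(
        lambda x: tf.one_hot(
            tf.argmax(vals * tf.cast(tf.equal(ind, x), tf.int64)),
            depth=ind.shape[0],
            dtype=tf.int64),
        un,
        parallel_iterations=8,  # allow up to 8 iterations to run in parallel
    )
    return tf.reduce_sum(one_hots, axis=0)

mask = mask_with_map_fn(vals, ind)
print(mask.numpy())
```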

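import tensorflow as tf

vals = tf.constant([0, 3, 4, 5, 6, 2, 1], dtype=tf.int64)
ind = tf.constant([0, 1, 2, 1, 1, 1, 2], dtype=tf.int64)

# Sort by segment ID, because from_value_rowids expects sorted row IDs.
sort_args = tf.argsort(ind)
sort_vals = tf.gather(vals, sort_args)
sort_inds = tf.gather(ind, sort_args)

# One ragged row per segment ID.
ragged = tf.RaggedTensor.from_value_rowids(
    values=sort_vals,
    value_rowids=sort_inds)
# Per-row maximum, shaped [num_rows, 1] so that it broadcasts against each row.
max_vals = tf.reshape(tf.reduce_max(ragged, axis=1),[-1,1])

# 1 where an element equals its row maximum (ties all get 1), still in sorted order.
res = tf.cast(tf.equal(ragged, max_vals).values, tf.int32)
# Undo the sort: the argsort of a permutation is its inverse permutation.
res_unsort = tf.gather(res, tf.argsort(sort_args))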
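The unsorting step is easy to get subtly wrong, so here is a small NumPy sketch of the idea: restoring the original order needs the inverse of the sorting permutation, which can be obtained as the argsort of the sort order. The data below is made up for illustration. In TensorFlow the corresponding calls would be tf.argsort and tf.gather, and tf.math.invert_permutation is another way to get the inverse.

```python
import numpy as np

ind = np.array([2, 0, 1, 0, 2, 1])                    # illustrative segment IDs
per_element = np.array([10, 20, 30, 40, 50, 60])      # some per-element result, original order

sort_args = np.argsort(ind, kind="stable")            # order that sorts the segment IDs
sorted_result = per_element[sort_args]                # result as seen in sorted order

inverse = np.argsort(sort_args)                       # inverse permutation of sort_args
restored = sorted_result[inverse]                     # back in the original order

naive = sorted_result[sort_args]                      # applying the sort order a second time

print(restored)
print(naive)
```

Indexing with the sort order again only restores the original order for special permutations, so the inverse permutation is the safer choice in general.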

Results (on a Colab GPU)

Method 1 (without parallel execution): 10 loops, best of 3: 2.4 ms per loop
Method 2: 10 loops, best of 3: 490 µs per loop

You can see that Method 2 is faster.
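As a possible further alternative that is not part of the answer above and was not timed here, the per-segment maximum can also be computed with tf.math.unsorted_segment_max, which avoids both map_fn and the sort/unsort steps. This is a sketch assuming TF 2.x and non-negative segment IDs. Like Method 2, it marks every position that ties for the maximum within its segment.

```python
import tensorflow as tf

vals = tf.constant([0, 3, 4, 5, 6, 2, 1], dtype=tf.int64)
ind = tf.constant([0, 1, 2, 1, 1, 1, 2], dtype=tf.int64)

# Number of segments, assuming segment IDs start at 0.
num_segments = tf.reduce_max(ind) + 1

# Maximum of vals within each segment; segment IDs do not need to be sorted.
seg_max = tf.math.unsorted_segment_max(vals, ind, num_segments=num_segments)

# A position gets 1 when its value equals the maximum of its own segment.
mask = tf.cast(tf.equal(vals, tf.gather(seg_max, ind)), tf.int32)
print(mask.numpy())
```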
