TensorFlow: generate samples from a Multinomial distribution [Space efficient way?]

Question

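I have a quick question. How can I sample values in {0, 1} from a multinomial distribution in TensorFlow? Essentially, I want a function that does what numpy.random.multinomial does.

For example, suppose I have a vector of counts and a vector of probabilities like this: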

counts = [5, 4, 3] # D in my code
probs = [0.1, 0.2, 0.3, 0.1, 0.2, 0.1] # v in my code

Then I want to return a matrix of size (len(counts), len(probs)) = (3, 6) in which each row sums to the corresponding count.
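
As a point of reference, the desired behavior can be sketched in plain NumPy. This is only an illustration of the target output, not a TensorFlow solution; the helper name multinomial_rows and the rng parameter are made up for this example.

```python
import numpy as np

def multinomial_rows(counts, probs, rng=None):
    """Draw one multinomial sample per entry of counts; row i sums to counts[i]."""
    rng = np.random.default_rng() if rng is None else rng
    counts = np.asarray(counts, dtype=np.int64)
    probs = np.asarray(probs, dtype=np.float64)
    # One independent draw per count, stacked into a (len(counts), len(probs)) matrix.
    return np.stack([rng.multinomial(n, probs) for n in counts])

counts = [5, 4, 3]
probs = [0.1, 0.2, 0.3, 0.1, 0.2, 0.1]
sample = multinomial_rows(counts, probs, rng=np.random.default_rng(0))
print(sample.shape)        # (3, 6)
print(sample.sum(axis=1))  # [5 4 3]
```

The individual entries are random, but the shape and the row sums are fixed by the inputs.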

I looked through the TensorFlow code and found a way to do what I want. Here is my piece of code:

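import tensorflow as tf
import tensorflow.contrib.distributions as ds

def multinomial_sampling(D, v):
    # D: counts (total_count for each row), v: probabilities
    dist = ds.Multinomial(total_count=D, probs=v)
    # _sample_n(1) adds a leading sample dimension of size 1; sum it away
    return tf.reshape(tf.reduce_sum(dist._sample_n(1), 0, False), [-1, v.shape[1]])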

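Note: I could just use tf.expand_dims instead of tf.reshape.

The problem is that this is not space efficient: when my matrix is large enough, TensorFlow just yells at me that I don't have enough memory, because it is trying to create a tensor of size [1, 185929, 3390], where 3390 is the length of my probability vector.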
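
To put a number on that, here is a back-of-the-envelope estimate of one dense tensor of that shape. It assumes 4-byte float32 elements and ignores any intermediate buffers the sampler may allocate, so the real footprint is likely higher.

```python
# Rough size of a single dense tensor of shape [1, 185929, 3390].
# Assumption: 4-byte float32 elements; intermediate buffers are not counted.
shape = (1, 185929, 3390)
bytes_per_element = 4

num_elements = 1
for dim in shape:
    num_elements *= dim

total_bytes = num_elements * bytes_per_element
print(num_elements)                   # 630299310
print(round(total_bytes / 2**30, 2))  # 2.35 (GiB)
```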

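So I thought about implementing the multinomial sampling myself, but I don't know how to do it, and I think my idea is not efficient enough (in terms of time complexity). Here is my skeleton: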

import numpy as np
import tensorflow as tf

probsn = np.random.uniform(size=20)
probsn /= probsn.sum()

counts = tf.Variable([20, 12, 56, 3])
probs = tf.Variable(tf.convert_to_tensor(probsn))

cprobs = tf.cumsum(probs)

out = tf.zeros([tf.shape(counts)[0], tf.shape(probs)[0]])
for i in range(int(counts.shape[0])):
    count = tf.gather(counts, i) # get each count
    sample = tf.gather(out, i) # get each row of out

    for j in range(count): # problem here: count is a Tensor and not an int
        rdn_number = tf.random_uniform([1])
        for k, prob in enumerate(range(cprobs)): # problem: doesn't work in TF
            if tf.less(rdn_number, prob):
                tf.scatter_add(out, [i, k], 1)

init = tf.global_variables_initializer()

with tf.Session() as sess:
    sess.run(init)
    r = sess.run(out)
    print(r)

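This is a very naive algorithm. I think there might be a better way to reduce the time complexity (some kind of dictionary of ranges that maps a range of float values to a specific index in the row? I'm not sure such a thing is possible, but it would save me from iterating to find the index in my row...).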
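
The "dictionary of ranges" idea corresponds to inverse-CDF sampling: build the cumulative probabilities once, then binary-search each uniform draw into them. Below is a minimal NumPy sketch of that technique, not a TensorFlow implementation; the function name sample_counts_inverse_cdf and its parameters are hypothetical.

```python
import numpy as np

def sample_counts_inverse_cdf(count, probs, rng=None):
    """Draw `count` categorical samples via the inverse CDF and tally them per category."""
    rng = np.random.default_rng() if rng is None else rng
    probs = np.asarray(probs, dtype=np.float64)
    cdf = np.cumsum(probs)
    cdf /= cdf[-1]  # guard against rounding so the last edge is exactly 1.0
    u = rng.random(count)  # uniform draws in [0, 1)
    # Binary search: index of the first cdf edge strictly greater than each draw.
    idx = np.searchsorted(cdf, u, side="right")
    # Tally how often each category index was hit.
    return np.bincount(idx, minlength=probs.size)

row = sample_counts_inverse_cdf(20, [0.1, 0.2, 0.3, 0.1, 0.2, 0.1],
                                rng=np.random.default_rng(0))
print(row.sum())  # 20
```

Each lookup costs O(log K) for K categories instead of a linear scan, at the price of materializing `count` uniform draws for the row.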

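Also, as noted in the code comments, this implementation doesn't work, because the number I'm iterating over is actually a Tensor.

Has anyone come up with a neat implementation of multinomial sampling in TensorFlow?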

Answer 1

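OK, apparently my problem was not really a problem, because I shouldn't have had such a large number (185929) in the first place. So I edited some other parts of my code. For the sake of completeness, if you want to sample a very large count and you want to use sample(), you simply cannot do this: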

import tensorflow as tf
import tensorflow.contrib.distributions as ds

def multinomial_sampling(D, v):
    dist = ds.Multinomial(total_count=D, probs=v)
    # reshape so the number of columns is fixed to v.shape[1]
    return tf.reshape(dist.sample(), [-1, v.shape[1]])

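if your computer does not have enough memory.

Note: I reshape my tensor to that same shape so that TensorFlow doesn't yell at me when I use the output of the multinomial_sampling function inside a while loop. Without tf.reshape, inside tf.while_loop, TensorFlow crashes saying that I need to provide shape_invariants.

So you really need to proceed in batches. The idea is to sample a fixed batch (e.g. 1000) inside a while loop and decrease the counts at each iteration. Here is a piece of code I wrote: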

import numpy as np
import tensorflow as tf
import tensorflow.contrib.distributions as ds

probsn = np.random.uniform(size=30)
probsn /= probsn.sum() # probability vector of size 30 (sums to 1)

u = np.random.randint(2000, 3500, size=100) # 100 counts, each an int in [2000, 3500)
print(u) # should match the output of print(np.sum(res, 1)) in the tf.Session() below

counts = tf.Variable(u, dtype=tf.float32)
probs = tf.Variable(tf.convert_to_tensor(probsn.astype(np.float32)))

# one-shot version, kept for comparison; it is never run in the session below
dist = ds.Multinomial(total_count=counts, probs=probs)
out = dist.sample()

samples = tf.zeros((tf.shape(counts)[0], tf.shape(probs)[0]))

def batch_multinomial(counts, probs, samples):
    batch_count = tf.minimum(1000., counts) # draw at most 1000 per row in each iteration
    dist = ds.Multinomial(total_count=batch_count, probs=probs)
    samples += dist.sample()

    return counts - batch_count, probs, samples

def counts_remaining(counts, *args):
    # keep looping until every remaining count has dropped to (about) zero
    return tf.logical_not(tf.reduce_all(tf.less(counts, 0.1)))

_, _, samples = tf.while_loop(counts_remaining, batch_multinomial, [counts, probs, samples])

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    res = sess.run(samples)
    print(res)
    print(np.sum(res, 1))
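
The batching works because the sum of independent multinomial draws that share the same probabilities is itself a multinomial draw over the summed counts. For readers without a TF 1.x setup, the same control flow can be sketched in NumPy. This is an illustration only: batched_multinomial and its parameters are hypothetical names, and in NumPy the chunking brings no memory benefit of its own.

```python
import numpy as np

def batched_multinomial(counts, probs, batch=1000, rng=None):
    """Accumulate multinomial draws in chunks of at most `batch` trials per row."""
    rng = np.random.default_rng() if rng is None else rng
    remaining = np.asarray(counts, dtype=np.int64).copy()
    probs = np.asarray(probs, dtype=np.float64)
    samples = np.zeros((remaining.size, probs.size), dtype=np.int64)
    while (remaining > 0).any():
        step = np.minimum(batch, remaining)  # trials to draw for each row this round
        for i, n in enumerate(step):
            if n > 0:
                samples[i] += rng.multinomial(n, probs)
        remaining -= step  # rows that are done stay at zero
    return samples

rng = np.random.default_rng(0)
probs = rng.random(30)
probs /= probs.sum()
counts = rng.integers(2000, 3500, size=100)
res = batched_multinomial(counts, probs, batch=1000, rng=rng)
print(np.array_equal(res.sum(axis=1), counts))  # True
```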
