How to form an online distribution with a TensorFlow dataset

Question

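I am trying to build a histogram of my dataset online, while the samples "x" are being generated, so that I can use this histogram to steer the distribution of the samples "y". Here is a toy example that does not really work: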

import numpy as np
import tensorflow as tf
dataset = tf.data.Dataset.random(seed=4).take(10).map(lambda x: x%10)
hs = tf.convert_to_tensor(np.zeros(10), tf.float32) # the histogram
dataset = dataset.map(lambda x : proc(x,hs))

where the proc function is:

def proc(x,hs):
  y = tf.math.argmin(input = hs)  # index of the least-filled bin
  hs = tf.tensor_scatter_nd_add(hs, [[x]], [1])  # intended hist[x] += 1, but only rebinds the local name
  return x,y

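As you would expect, the variable "hs" does not change in the end (the function just assigns a new object to the local name hs). Is there any way I can make this work? (I have looked at rejection sampling in datasets, but I do not even want to create samples that may have to be discarded later; I would like to track the distribution online and generate accordingly.)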

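More info: the real generator of "x" does not produce a uniform distribution (unlike this example). The goal of the histogram is therefore to help me generate "y" samples that fill the lowest-frequency bins of x, so that the distribution of {x, y} eventually approximates a uniform distribution.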

Answer 1

If hs is supposed to be part of your dataset, then your code works fine:

import tensorflow as tf
import numpy as np

def proc(x, hs):  
  x = (x+1)%10
  hs = tf.tensor_scatter_nd_add(hs, [[x]], [1])  # hist[x] += 1
  return x, hs
dataset = tf.data.Dataset.random(seed=4).take(10).map(lambda x: x%10)
hs = tf.convert_to_tensor(np.zeros(10), tf.float32) 
dataset = dataset.map(lambda x : proc(x,hs))

for x, y in dataset:
  print(x, y)

tf.Tensor(6, shape=(), dtype=int64) tf.Tensor([0. 0. 0. 0. 0. 0. 1. 0. 0. 0.], shape=(10,), dtype=float32)
tf.Tensor(0, shape=(), dtype=int64) tf.Tensor([1. 0. 0. 0. 0. 0. 0. 0. 0. 0.], shape=(10,), dtype=float32)
tf.Tensor(9, shape=(), dtype=int64) tf.Tensor([0. 0. 0. 0. 0. 0. 0. 0. 0. 1.], shape=(10,), dtype=float32)
tf.Tensor(4, shape=(), dtype=int64) tf.Tensor([0. 0. 0. 0. 1. 0. 0. 0. 0. 0.], shape=(10,), dtype=float32)
tf.Tensor(6, shape=(), dtype=int64) tf.Tensor([0. 0. 0. 0. 0. 0. 1. 0. 0. 0.], shape=(10,), dtype=float32)
tf.Tensor(4, shape=(), dtype=int64) tf.Tensor([0. 0. 0. 0. 1. 0. 0. 0. 0. 0.], shape=(10,), dtype=float32)
tf.Tensor(4, shape=(), dtype=int64) tf.Tensor([0. 0. 0. 0. 1. 0. 0. 0. 0. 0.], shape=(10,), dtype=float32)
tf.Tensor(5, shape=(), dtype=int64) tf.Tensor([0. 0. 0. 0. 0. 1. 0. 0. 0. 0.], shape=(10,), dtype=float32)
tf.Tensor(6, shape=(), dtype=int64) tf.Tensor([0. 0. 0. 0. 0. 0. 1. 0. 0. 0.], shape=(10,), dtype=float32)
tf.Tensor(7, shape=(), dtype=int64) tf.Tensor([0. 0. 0. 0. 0. 0. 0. 1. 0. 0.], shape=(10,), dtype=float32)

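If you want hs as a separate tensor, you can additionally run the following. Note that each row is a one-hot update of the initial all-zero hs, not a running total: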

hs = tf.convert_to_tensor(list(dataset.map(lambda x, y: y)))
print(hs)

tf.Tensor(
[[0. 0. 0. 0. 0. 0. 1. 0. 0. 0.]
 [1. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 0. 0. 0. 1.]
 [0. 0. 0. 0. 1. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 1. 0. 0. 0.]
 [0. 0. 0. 0. 1. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 1. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 1. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 1. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 0. 1. 0. 0.]], shape=(10, 10), dtype=float32)

If you only want to update the tensor hs based on the dataset, try:

import tensorflow as tf
import numpy as np

def proc(ds, hs):
  for x in ds:
    x = (x+1)%10
    hs = tf.tensor_scatter_nd_add(hs, [[x]], [1])  # hist[x] += 1
  return hs
dataset = tf.data.Dataset.random(seed=4).take(10).map(lambda x: x%10)
hs = tf.convert_to_tensor(np.zeros(10), tf.float32) 
hs = proc(dataset, hs)
print(hs)

tf.Tensor([1. 0. 0. 0. 3. 1. 3. 1. 0. 1.], shape=(10,), dtype=float32)
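As a sanity check, the histogram above can be reproduced in plain Python from the ten x values printed by the first snippet. This is only a sketch: it assumes the seeded random dataset yields the same values in both runs, and the names printed_x and histogram are made up for illustration.

```python
from collections import Counter

# x values copied from the printed output of the first snippet
# (assumption: the seeded dataset yields the same sequence on every run)
printed_x = [6, 0, 9, 4, 6, 4, 4, 5, 6, 7]

# Count how often each bin 0..9 occurs
counts = Counter(printed_x)
histogram = [float(counts.get(b, 0)) for b in range(10)]

print(histogram)
# [1.0, 0.0, 0.0, 0.0, 3.0, 1.0, 3.0, 1.0, 0.0, 1.0]
```

The counts agree with the tensor printed above: bins 4 and 6 were hit three times each, and bins 0, 5, 7 and 9 once each.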

Update 1:

import tensorflow as tf

def proc(x,hs):
  y = tf.math.argmin(input = hs)  # least-filled bin, read before x is counted
  hs.assign(tf.tensor_scatter_nd_add(hs.value(), [[x]], [1]))  # hist[x] += 1
  return x, y

dataset = tf.data.Dataset.random(seed=4).take(10).map(lambda x: x%10)
hs = tf.Variable(tf.zeros(10, dtype=tf.int64)) # the histogram, held in a mutable variable
dataset = dataset.map(lambda x : proc(x, hs))
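A different way to keep the running state, sketched here as an alternative rather than as part of the answer above, is to hold the histogram in an ordinary Python generator and wrap it with tf.data.Dataset.from_generator. The function balanced_pairs, the list source, and the choice to count both x and y in the histogram are assumptions made for illustration; the source values stand in for the real, non-uniform generator of x.

```python
import numpy as np
import tensorflow as tf

NUM_BINS = 10

def balanced_pairs(xs, num_bins=NUM_BINS):
    """Yield (x, y) pairs, where y is the least-filled bin so far.

    Hypothetical helper: the histogram lives in plain NumPy state and
    counts both x and y (an assumption about the intended behaviour).
    """
    hist = np.zeros(num_bins, dtype=np.int64)
    for x in xs:
        x = int(x)
        hist[x] += 1                # count the incoming sample
        y = int(np.argmin(hist))    # pick the emptiest bin (lowest index on ties)
        hist[y] += 1                # count the generated sample as well
        yield x, y

# Stand-in for the real stream of x values (assumed for this sketch)
source = [6, 0, 9, 4, 6, 4, 4, 5, 6, 7]

dataset = tf.data.Dataset.from_generator(
    lambda: balanced_pairs(source),
    output_signature=(
        tf.TensorSpec(shape=(), dtype=tf.int64),
        tf.TensorSpec(shape=(), dtype=tf.int64),
    ),
)

pairs = [(int(x), int(y)) for x, y in dataset]
print(pairs)
```

Because the generator is re-created each time the dataset is iterated, the histogram starts from zero on every pass. The state also lives in Python rather than in the TensorFlow graph, so this approach may be slower than a pure tf.data pipeline.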
