Prepare data input for tensorflow from numpy and scipy.sparse

Question

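How do I prepare data for input into a TensorFlow model (for example, a Keras Sequential model)?

I know how to prepare x_train, y_train, x_test and y_test using numpy and scipy (eventually pandas, sklearn style), where train/test are the training and test data for a neural model, x is a 2D sparse matrix, and y is a 1D numpy array of integer labels with the same length as the number of rows in x.

I have been struggling with the Dataset documentation and have not gained much insight so far...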

So far, I have only managed to convert a scipy.sparse matrix into a tensorflow.SparseTensor with something like:

import numpy as np
import tensorflow as tf
from scipy import sparse as sp

x = sp.csr_matrix( ... )
x = tf.SparseTensor(indices=np.vstack([*x.nonzero()]).T, 
                    values=x.data, 
                    dense_shape=x.shape)
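
A possible variation on the conversion above, offered as a sketch rather than the one correct way: going through the COO format keeps row, column and value arrays aligned element by element, whereas x.nonzero() skips explicitly stored zeros that x.data still contains. The helper name scipy_to_sparse_tensor and the demo matrix are hypothetical and chosen only for illustration.

```python
import numpy as np
import tensorflow as tf
from scipy import sparse as sp


def scipy_to_sparse_tensor(matrix):
    """Convert a scipy.sparse matrix to a tf.SparseTensor (hypothetical helper)."""
    coo = matrix.tocoo()
    # In COO format, row, col and data are aligned element by element.
    indices = np.column_stack((coo.row, coo.col)).astype(np.int64)
    tensor = tf.SparseTensor(indices=indices, values=coo.data, dense_shape=coo.shape)
    # Put the indices into canonical row-major order.
    return tf.sparse.reorder(tensor)


# Illustrative 3x4 matrix with two non-zero entries (an assumption, not the asker's data).
demo = sp.csr_matrix(
    np.array([[1, 0, 0, 0], [0, 0, 2, 0], [0, 0, 0, 0]], dtype=np.float32)
)
x_sparse = scipy_to_sparse_tensor(demo)
```

Converting back with tf.sparse.to_dense(x_sparse) should reproduce demo.toarray(), which is a quick way to check the conversion.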

I can convert a numpy array into a tensorflow.Tensor with something like:

import numpy as np
import tensorflow as tf

y = np.array( ... ) # 1D array of len == x.shape[0]
y = tf.constant(y)

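How do I align x and y in a single dataset so that I can build batches, buffers... and benefit from the dataset utilities?
Should I use zip, from_tensor_slices, or some other method of the tensorflow.data.Dataset module?

An example of x and y is: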

x = tf.SparseTensor(indices=[[0, 0], [1, 2]], values=[1, 2], dense_shape=[3, 4])
y = tf.constant(np.array(range(3)))

Answer 1

You should be able to use tf.data.Dataset.from_tensor_slices, since you mention that "y is a 1D numpy array of integer labels with the same size as the number of rows in x":

import numpy as np
import tensorflow as tf

x = tf.SparseTensor(indices=[[0, 0], [1, 2]], values=[1, 2], dense_shape=[3, 4])
y = tf.constant(np.array(range(3)))

dataset = tf.data.Dataset.from_tensor_slices((x, y))

for x, y in dataset:
  print(x, y)

Output:

SparseTensor(indices=tf.Tensor([[0]], shape=(1, 1), dtype=int64), values=tf.Tensor([1], shape=(1,), dtype=int32), dense_shape=tf.Tensor([4], shape=(1,), dtype=int64)) tf.Tensor(0, shape=(), dtype=int64)
SparseTensor(indices=tf.Tensor([[2]], shape=(1, 1), dtype=int64), values=tf.Tensor([2], shape=(1,), dtype=int32), dense_shape=tf.Tensor([4], shape=(1,), dtype=int64)) tf.Tensor(1, shape=(), dtype=int64)
SparseTensor(indices=tf.Tensor([], shape=(0, 1), dtype=int64), values=tf.Tensor([], shape=(0,), dtype=int32), dense_shape=tf.Tensor([4], shape=(1,), dtype=int64)) tf.Tensor(2, shape=(), dtype=int64)
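
Since the question also asks about batches and buffers, here is a minimal sketch of how the same dataset might be shuffled, batched and densified. It assumes TensorFlow 2.4 or later (for tf.data.AUTOTUNE), uses float values instead of the integers above, and the to_dense helper and the batch and buffer sizes are illustrative choices, not requirements.

```python
import numpy as np
import tensorflow as tf

x = tf.SparseTensor(indices=[[0, 0], [1, 2]], values=[1.0, 2.0], dense_shape=[3, 4])
y = tf.constant(np.arange(3))

dataset = tf.data.Dataset.from_tensor_slices((x, y))


def to_dense(features, labels):
    # Densify a batch of sparse rows so that ordinary dense layers can consume it.
    return tf.sparse.to_dense(features), labels


batched = (
    dataset
    .shuffle(buffer_size=3, seed=0)  # a buffer at least as large as the data shuffles fully
    .batch(2)                        # each element is now a batch of up to 2 rows
    .map(to_dense)
    .prefetch(tf.data.AUTOTUNE)
)

# Shapes of the feature batches: 3 rows split into a batch of 2 and a batch of 1.
batch_shapes = [tuple(features.shape) for features, _ in batched]
```

A dataset built this way can be passed directly to model.fit. As for zip: tf.data.Dataset.zip pairs datasets that were built separately, so when x and y are both already in memory, from_tensor_slices on the (x, y) tuple is usually the shorter route.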

