TensorFlow map function to split the dataset structure

I have a question about the dataset structure in the TensorFlow map function. Here is what my data looks like:

Simplified:

train_examples = tf.data.Dataset.from_tensor_slices(train_data)
[[0, 1, 2, 3, 4, 5, ...],
 [32, 33, 34, 35, 36, ...]]

Actual:

print(train_data[0])
[[array([  2, 539, 400, 513, 398, 523, 485, 533, 568, 566, 402, 565, 491,
       570, 576, 539, 351, 538, 297, 539, 262, 564, 313, 581, 370, 589,
       421, 514, 314, 501, 370, 489, 420,   3]), array([  2, 534, 403, 507, 401, 519, 487, 531, 567, 562, 405, 544, 495,
       537, 588, 528, 354, 526, 300, 534, 259, 555, 315, 575, 370, 589,
       421, 499, 315, 489, 372, 483, 423,   3])]]

I converted it into a pipeline of tensors: <TensorSliceDataset shapes: (2, 34), types: tf.int64>

train_examples contains 2D tensors, [[source], [target]], with 17k rows.
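To make the setup above concrete, here is a minimal sketch (with placeholder values, since the real 17k rows are not shown) of building such a dataset and checking its per-element shape via `element_spec`:

```python
import tensorflow as tf
import numpy as np

# Placeholder data: one [source, target] pair of length-34 sequences.
train_data = [[np.arange(34, dtype=np.int64),
               np.arange(32, 66, dtype=np.int64)]]

train_examples = tf.data.Dataset.from_tensor_slices(train_data)

# element_spec confirms each element is a (2, 34) int64 tensor,
# matching the <TensorSliceDataset shapes: (2, 34), types: tf.int64> above.
print(train_examples.element_spec)
```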

def make_batches(ds):
    return (
        ds
        .cache()
        .shuffle(BUFFER_SIZE)
        .batch(BATCH_SIZE)
        .map(lambda x_int,y_int: x_int,y_int, num_parallel_calls=tf.data.experimental.AUTOTUNE)
        .prefetch(tf.data.experimental.AUTOTUNE))

train_batches = make_batches(train_examples)

For the map, I want the output data structure to give source and target separately. I tried map(prepare, num_parallel_calls=tf.data.experimental.AUTOTUNE) with:

def prepare(ds):
  srcs = tf.ragged.constant(ds.numpy()[0], tf.int64)
  trgs = tf.ragged.constant(ds.numpy()[1], tf.int64)

  srcs = srcs.to_tensor()
  trgs = trgs.to_tensor()
  return srcs,trgs

But TensorFlow does not allow eager execution inside the map function. If I am missing anything else about how the map function is used in TensorFlow, please let me know. Thank you.

TensorFlow version = 2.7

You can try splitting the samples like this:

import tensorflow as tf
import numpy as np


data = [[np.array([  2, 539, 400, 513, 398, 523, 485, 533, 568, 566, 402, 565, 491,
       570, 576, 539, 351, 538, 297, 539, 262, 564, 313, 581, 370, 589,
       421, 514, 314, 501, 370, 489, 420,   3]), np.array([  2, 534, 403, 507, 401, 519, 487, 531, 567, 562, 405, 544, 495,
       537, 588, 528, 354, 526, 300, 534, 259, 555, 315, 575, 370, 589,
       421, 499, 315, 489, 372, 483, 423,   3])]]

samples = 50
data = data * samples
ds = tf.data.Dataset.from_tensor_slices(data)

def prepare(x):
  srcs, trgs = tf.split(x, num_or_size_splits = 2, axis=1)
  return srcs,trgs

def make_batches(ds):
    return (
        ds
        .cache()
        .shuffle(50)
        .batch(10)
        .map(prepare, num_parallel_calls=tf.data.experimental.AUTOTUNE)
        .prefetch(tf.data.experimental.AUTOTUNE))

train_batches = make_batches(ds)
for x, y in train_batches.take(1):
  print(x.shape, y.shape)
# (10, 1, 34) (10, 1, 34)
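Note the middle dimension of size 1 in the output above; it is left over from tf.split along axis 1. If plain (batch, 34) tensors are preferred, tf.squeeze can drop that singleton axis inside prepare — a sketch using the same placeholder data:

```python
import tensorflow as tf
import numpy as np

data = [[np.arange(34, dtype=np.int64),
         np.arange(32, 66, dtype=np.int64)]] * 50
ds = tf.data.Dataset.from_tensor_slices(data)

def prepare(x):
  # Split the batched (batch, 2, 34) tensor into two (batch, 1, 34) halves,
  # then squeeze away the singleton axis to get (batch, 34).
  srcs, trgs = tf.split(x, num_or_size_splits=2, axis=1)
  return tf.squeeze(srcs, axis=1), tf.squeeze(trgs, axis=1)

batches = ds.batch(10).map(prepare)
for x, y in batches.take(1):
  print(x.shape, y.shape)
# (10, 34) (10, 34)
```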
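As for why the original prepare failed: inside map, elements are symbolic graph tensors, so .numpy() is not available. If eager NumPy code is genuinely needed in the pipeline, tf.py_function can wrap it, at the cost of performance and lost static shape information — a hedged sketch, again with placeholder data:

```python
import tensorflow as tf
import numpy as np

data = [[np.arange(34, dtype=np.int64),
         np.arange(32, 66, dtype=np.int64)]] * 50
ds = tf.data.Dataset.from_tensor_slices(data)

def prepare_eager(x):
  # Runs eagerly inside tf.py_function, so .numpy() works here.
  arr = x.numpy()
  return arr[0], arr[1]

def prepare(x):
  # tf.py_function bridges graph mode and eager Python code.
  srcs, trgs = tf.py_function(prepare_eager, inp=[x],
                              Tout=[tf.int64, tf.int64])
  # py_function loses static shapes; restore them explicitly.
  srcs.set_shape([34])
  trgs.set_shape([34])
  return srcs, trgs

batches = ds.map(prepare).batch(10)
for x, y in batches.take(1):
  print(x.shape, y.shape)
# (10, 34) (10, 34)
```

The graph-native tf.split approach above is still preferable when it fits, since py_function runs single-threaded Python and cannot be serialized into a SavedModel graph.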