How to randomly select data from multiple aligned datasets in TensorFlow?

Question

Suppose we have 6 text files, split into 2 groups of 3 files each, for example:

Group 1: 1.a, 1.b, 1.c
Group 2: 2.a, 2.b, 2.c

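Given a fixed threshold rand and random() from the random module, what I want to get is 3 tensors:

Group x: x_a, x_b, x_c

Every file has the same number of lines and the files are aligned line by line. The nth row of x_a would be: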

Step 1: '<nth line from 1.a>' if rand < random() else '<nth line from 2.a>'

The nth rows of x_b and x_c would then be:

Step 2: '<nth line from 1.b>' if <nth row of x_a came from 1.a> else '<nth line from 2.b>'
Step 3: '<nth line from 1.c>' if <nth row of x_a came from 1.a> else '<nth line from 2.c>' (same as Step 2, but for x_c)

This way x_a, x_b and x_c stay aligned.

I am using tf.data.TextLineDataset. How can I select randomly and keep track of which group was selected? Thanks!
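
To pin down the intended behaviour, here is a minimal plain-Python sketch of Steps 1 to 3 on in-memory lists of lines, without TensorFlow. The function name select_rows and its rng parameter are hypothetical and only serve this illustration; the key point is that there is one random draw per row, and the same draw decides all three outputs.

```python
import random


def select_rows(group1, group2, rand, rng=random):
    """Pick each row from group1 or group2 with a single draw per row.

    group1 and group2 are tuples of equal-length lists of lines,
    e.g. (lines_1a, lines_1b, lines_1c). Hypothetical helper.
    """
    picked = tuple([] for _ in group1)
    # zip(*group) iterates row by row: (nth line of a, nth of b, nth of c).
    for rows1, rows2 in zip(zip(*group1), zip(*group2)):
        # Step 1: one draw decides the source group for this row.
        source = rows1 if rand < rng.random() else rows2
        # Steps 2 and 3: reuse that decision for every file in the row.
        for out, line in zip(picked, source):
            out.append(line)
    return picked
```

Calling x_a, x_b, x_c = select_rows((a1, b1, c1), (a2, b2, c2), rand) returns three lists whose nth entries all come from the same group.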

Answer 1

======================== My solution ========================

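I provide a track file that records, for each line, which group to take, and use it to drive the selection for all 3 files. Other solutions are still welcome!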
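
The post does not show how the track file is produced. One possible way, sketched here under the assumption that the file holds a single integer per line (1 for group 1, 0 for group 2, matching the i > 0 test used in the code), is to draw once per line with the random module. The name write_track_file and its parameters are hypothetical.

```python
import random


def write_track_file(path, num_lines, rand, seed=None):
    """Write one 0/1 flag per line; 1 means "take this row from group 1".

    Hypothetical helper: the one-integer-per-line format is an assumption
    chosen to match the `i > 0` test in the selection code.
    """
    rng = random.Random(seed)
    with open(path, "w") as f:
        for _ in range(num_lines):
            # Same rule as Step 1 of the question: rand < random().
            f.write("1\n" if rand < rng.random() else "0\n")
```

Because the flags are stored on disk, the same selection can be replayed later or inspected to see which group each row came from.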

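import tensorflow as tf

# One dataset per file; map(...) stands for the per-line preprocessing.
a1 = tf.data.TextLineDataset(afile1).map(...)
b1 = tf.data.TextLineDataset(bfile1).map(...)
c1 = tf.data.TextLineDataset(cfile1).map(...)
...
# Track file: one integer per line; > 0 selects group 1, otherwise group 2.
# tf.strings.to_number is the current name of tf.string_to_number.
index = tf.data.TextLineDataset(track_file).map(lambda line: tf.strings.to_number(line, tf.int32))
As = tf.data.Dataset.zip((index, a1, a2))
Bs = tf.data.Dataset.zip((index, b1, b2))
...
# Each pair uses the same index, so ax, bx and cx stay aligned.
# (Cs is assumed to be built like As and Bs in the elided part.)
ax = As.map(lambda i, l, r: tf.where(i > 0, l, r))
bx = Bs.map(lambda i, l, r: tf.where(i > 0, l, r))
cx = Cs.map(lambda i, l, r: tf.where(i > 0, l, r))
...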
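
As a possible alternative that avoids the track file, all six datasets can be zipped together so that a single random draw per row decides every output. This is only a sketch, assuming TensorFlow 2.x and datasets of scalar strings; select_aligned is a hypothetical name, and unlike the track file this variant does not record which group was chosen.

```python
import tensorflow as tf


def select_aligned(group1, group2, threshold):
    """Zip two groups of aligned datasets and pick whole rows at random.

    group1 and group2 are tuples of aligned tf.data.Dataset objects,
    e.g. (a1, b1, c1) and (a2, b2, c2). Hypothetical helper.
    """
    zipped = tf.data.Dataset.zip((tuple(group1), tuple(group2)))

    def pick(rows1, rows2):
        # One draw per row, shared by every file in that row.
        take_first = tf.random.uniform([]) > threshold
        return tuple(tf.where(take_first, l, r) for l, r in zip(rows1, rows2))

    return zipped.map(pick)
```

Each element of the returned dataset is a tuple (x_a, x_b, x_c) whose members all come from the same group; to also keep the decision itself, pick could return take_first as an extra component.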

python

tensorflow-datasets