Is a TensorFlow Dataset operation equivalent to timeseries_dataset_from_array possible?
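I would like more control over how my TensorFlow dataset is generated. For that reason I want to mirror the behaviour of timeseries_dataset_from_array, but with the ability to use either consecutive or non-overlapping windows (sequence_stride=0 cannot be set with timeseries_dataset_from_array).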
# df_with_inputs = (x, 19) df_with_labels = (x,1)
ds = tf.data.Dataset.from_tensor_slices((df_with_inputs.values, df_with_labels.values)).window(20, shift=1, stride=1, drop_remainder=True).batch(32)
Is the above equivalent to the following?
ds = tf.keras.preprocessing.timeseries_dataset_from_array(df_with_inputs[df_with_inputs.columns], df_with_labels[df_with_labels.columns], sequence_length=window_size,sequence_stride=1,shuffle=False,batch_size=batch_size)
Both create a BatchDataset with the same number of samples, but the type spec of the manually built dataset is somewhat different. The first one gives me:
<BatchDataset shapes: (DatasetSpec(TensorSpec(shape=(19,), dtype=tf.float32, name=None), TensorShape([None])), DatasetSpec(TensorSpec(shape=(1,), dtype=tf.float32, name=None), TensorShape([None]))), types: (DatasetSpec(TensorSpec(shape=(19,), dtype=tf.float32, name=None), TensorShape([None])), DatasetSpec(TensorSpec(shape=(1,), dtype=tf.float32, name=None), TensorShape([None])))>
whereas the latter gives me:
<BatchDataset shapes: ((None, None, 19), (None, 1)), types: (tf.float64, tf.int32)>
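Both contain the same number of elements, 3063 in my case. Note that stride and sequence_stride behave differently in the two methods (to get the same behaviour you need shift=1). Furthermore, when I try to feed the first one to my NN, I get the following error (whereas the ds from timeseries_dataset_from_array works just fine):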
TypeError: Inputs to a layer should be tensors.
Any idea what I am missing here?
My model:
input_shape = (window_size, num_features) #(20,19)
model = tf.keras.Sequential([
    tf.keras.layers.Conv1D(filters=64, kernel_size=3, activation='relu', padding="same",
                           input_shape=input_shape), [....]])
The equivalent would be:
import tensorflow as tf
tf.random.set_seed(345)
samples = 30
df_with_inputs = tf.random.normal((samples, 2), dtype=tf.float32)
df_with_labels = tf.random.uniform((samples, 1), maxval=2, dtype=tf.int32)
batch_size = 2
window_size = 20
ds1 = tf.keras.preprocessing.timeseries_dataset_from_array(df_with_inputs, df_with_labels, sequence_length=window_size,sequence_stride=1,shuffle=False, batch_size=batch_size)
for x, y in ds1.take(1):
    print(x, y)
tf.Tensor(
[[[-0.01898661 1.2348452 ]
[-0.33379436 -0.13637085]
[-2.239644 1.5407541 ]
[-0.14988706 0.50577176]
[-1.6328571 -0.9512018 ]
[-3.0481005 0.8019097 ]
[-0.683125 -0.12166552]
[-0.5408724 -0.97584397]
[ 0.47595206 1.0512688 ]
[ 0.15297593 0.7393363 ]
[-0.17052855 -0.12541457]
[ 1.1617764 -2.491248 ]
[-2.5665069 0.9241422 ]
[ 0.40681016 -1.031384 ]
[-0.23945935 1.5275828 ]
[-1.3431666 0.2940185 ]
[ 1.7351524 0.34276873]
[ 0.8059861 2.0647929 ]
[-0.3017126 0.729208 ]
[-0.8672192 -0.79938954]]
[[-0.33379436 -0.13637085]
[-2.239644 1.5407541 ]
[-0.14988706 0.50577176]
[-1.6328571 -0.9512018 ]
[-3.0481005 0.8019097 ]
[-0.683125 -0.12166552]
[-0.5408724 -0.97584397]
[ 0.47595206 1.0512688 ]
[ 0.15297593 0.7393363 ]
[-0.17052855 -0.12541457]
[ 1.1617764 -2.491248 ]
[-2.5665069 0.9241422 ]
[ 0.40681016 -1.031384 ]
[-0.23945935 1.5275828 ]
[-1.3431666 0.2940185 ]
[ 1.7351524 0.34276873]
[ 0.8059861 2.0647929 ]
[-0.3017126 0.729208 ]
[-0.8672192 -0.79938954]
[-0.14423785 0.95039433]]], shape=(2, 20, 2), dtype=float32) tf.Tensor(
[[1]
[1]], shape=(2, 1), dtype=int32)
Using tf.data.Dataset.from_tensor_slices, it would look like this:
ds2 = tf.data.Dataset.from_tensor_slices((df_with_inputs, df_with_labels)).batch(batch_size)
inputs_only_ds = ds2.map(lambda x, y: x)
inputs_only_ds = inputs_only_ds.flat_map(tf.data.Dataset.from_tensor_slices).window(window_size, shift=1, stride=1, drop_remainder=True).flat_map(lambda x: x.batch(window_size)).batch(batch_size)
ds2 = tf.data.Dataset.zip((inputs_only_ds, ds2.map(lambda x, y: y)))
for x, y in ds2.take(1):
    print(x, y)
tf.Tensor(
[[[-0.01898661 1.2348452 ]
[-0.33379436 -0.13637085]
[-2.239644 1.5407541 ]
[-0.14988706 0.50577176]
[-1.6328571 -0.9512018 ]
[-3.0481005 0.8019097 ]
[-0.683125 -0.12166552]
[-0.5408724 -0.97584397]
[ 0.47595206 1.0512688 ]
[ 0.15297593 0.7393363 ]
[-0.17052855 -0.12541457]
[ 1.1617764 -2.491248 ]
[-2.5665069 0.9241422 ]
[ 0.40681016 -1.031384 ]
[-0.23945935 1.5275828 ]
[-1.3431666 0.2940185 ]
[ 1.7351524 0.34276873]
[ 0.8059861 2.0647929 ]
[-0.3017126 0.729208 ]
[-0.8672192 -0.79938954]]
[[-0.33379436 -0.13637085]
[-2.239644 1.5407541 ]
[-0.14988706 0.50577176]
[-1.6328571 -0.9512018 ]
[-3.0481005 0.8019097 ]
[-0.683125 -0.12166552]
[-0.5408724 -0.97584397]
[ 0.47595206 1.0512688 ]
[ 0.15297593 0.7393363 ]
[-0.17052855 -0.12541457]
[ 1.1617764 -2.491248 ]
[-2.5665069 0.9241422 ]
[ 0.40681016 -1.031384 ]
[-0.23945935 1.5275828 ]
[-1.3431666 0.2940185 ]
[ 1.7351524 0.34276873]
[ 0.8059861 2.0647929 ]
[-0.3017126 0.729208 ]
[-0.8672192 -0.79938954]
[-0.14423785 0.95039433]]], shape=(2, 20, 2), dtype=float32) tf.Tensor(
[[1]
[1]], shape=(2, 1), dtype=int32)
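Note that flat_map is needed to flatten the tensors so that the sliding windows can be applied more easily. The call flat_map(lambda x: x.batch(window_size)) simply turns each window back into a single tensor after the sliding windows have been applied.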
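To make the flattening step more concrete, here is a minimal sketch on a toy range dataset (the six-element dataset and the window size of 3 are my own choices for illustration, not taken from the question). It shows that window() yields nested datasets rather than tensors, which appears to be why the DatasetSpec shows up in the question's type spec and why Keras complains that layer inputs should be tensors.

```python
import tensorflow as tf

# Toy data for illustration only: six scalar elements 0..5.
ds = tf.data.Dataset.range(6)

# window() yields one nested Dataset per window, not a tensor.
windows = ds.window(3, shift=1, stride=1, drop_remainder=True)
print(windows.element_spec)  # a DatasetSpec wrapping a scalar TensorSpec

# Batching each nested dataset by the window size turns it into one tensor.
flat = windows.flat_map(lambda w: w.batch(3, drop_remainder=True))
result = [t.numpy().tolist() for t in flat]
print(result)  # [[0, 1, 2], [1, 2, 3], [2, 3, 4], [3, 4, 5]]
```

Calling .batch(32) directly on the output of window(), as in the question, only groups the nested datasets together, so the flat_map step has to come first.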
With the line inputs_only_ds = ds2.map(lambda x, y: x) I extract only the data (x), without the labels (y), to run the sliding windows on. Afterwards, in tf.data.Dataset.zip((inputs_only_ds, ds2.map(lambda x, y: y))), I zip the windowed dataset with the labels (y) to get the final result ds2.
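Since the question also asks about non-overlapping windows, here is a hedged sketch of one way to get them: window the inputs and labels together and make shift a parameter. The helper name make_windows and the choice to keep the label of the first row of each window (which matches the ds1 output above) are assumptions of mine, not something the Keras utility exposes. As far as I can tell, windowing both together also keeps the last partial batch consistent, whereas zipping with pre-batched labels can pair one remaining window with two labels.

```python
import tensorflow as tf

def make_windows(inputs, labels, window_size, shift, batch_size):
    """Hypothetical helper: slide one window over inputs and labels together."""
    ds = tf.data.Dataset.from_tensor_slices((inputs, labels))
    # Each element becomes a pair of nested datasets (x_window, y_window).
    ds = ds.window(window_size, shift=shift, stride=1, drop_remainder=True)
    # Turn every nested pair back into a pair of tensors.
    ds = ds.flat_map(
        lambda x, y: tf.data.Dataset.zip(
            (x.batch(window_size, drop_remainder=True),
             y.batch(window_size, drop_remainder=True))
        )
    )
    # Assumption: keep the label of the first row in each window.
    ds = ds.map(lambda x, y: (x, y[0]))
    return ds.batch(batch_size)

inputs = tf.random.normal((30, 2), dtype=tf.float32)
labels = tf.random.uniform((30, 1), maxval=2, dtype=tf.int32)

# shift=1 gives overlapping windows; shift=window_size gives non-overlapping ones.
overlapping = make_windows(inputs, labels, window_size=20, shift=1, batch_size=2)
non_overlapping = make_windows(inputs, labels, window_size=10, shift=10, batch_size=2)

overlapping_shapes = [
    (tuple(x.shape.as_list()), tuple(y.shape.as_list())) for x, y in overlapping
]
non_overlapping_shapes = [
    (tuple(x.shape.as_list()), tuple(y.shape.as_list())) for x, y in non_overlapping
]
print(overlapping_shapes)
print(non_overlapping_shapes)
```

If a different label is wanted per window (for example the last row, or the row after the window), only the map step would need to change.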