How can I combine two tensors so they are in one dataset?

Question

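I am using the Titanic dataset from the TensorFlow API.

I don't know how to make the feature tensors model-friendly.

This is the best I've got, but it only handles one tensor at a time. What should I do so that all the tensors in the features item are handled?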

import tensorflow as tf
import tensorflow_datasets as tfds
from tensorflow.keras.optimizers import Adam
    
data = tfds.load("titanic",split='train', as_supervised=True).map(lambda x,y: (x,y)).prefetch(1)
    
for i in data.batch(1309):
    xx1 = i[0]['age']
    xx2 = i[0]['fare']
    yyy = tf.convert_to_tensor(tf.one_hot(i[1],2))

model = tf.keras.models.Sequential([tf.keras.layers.Dense(1),
tf.keras.layers.Dense(13, activation='relu'),
tf.keras.layers.Dense(2, activation='softmax')])

model.compile(
  optimizer=Adam(learning_rate=0.01), 
  loss='categorical_crossentropy', 
  metrics=['accuracy']
)

model.fit(xx1,yyy,epochs=30)

How can I join the age and fare tensors so that they are in one dataset?

I tried concat and stack, and neither worked.

Answer 1

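This should be possible using tf.stack. Since the input already uses the Dataset API, I refactored some of the code to take advantage of the dataset functionality and map the input format to the goal format you described. For convenience, here is a Colab notebook with the examples: https://colab.research.google.com/drive/1dHNe9rYaJSgqbj_QtQ1aJL_7WgKnLKsU?usp=sharing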

# Nothing novel here
import tensorflow as tf
import tensorflow_datasets as tfds
from tensorflow.keras.optimizers import Adam

data = tfds.load("titanic",split='train', as_supervised=True).map(lambda x,y: (x,y)).prefetch(1)

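Basic demonstration of the expected data restructuring

Using tf.stack

Take 1 item from the dataset and convert it into a tensor containing the two target data points: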

for item in data.take(1):
  age = item[0]['age']
  fare = item[0]['fare']
  output = tf.stack([age, fare], axis=0)
  print(output)

Output: tf.Tensor([30. 13.], shape=(2,), dtype=float32)

In the output we can see a tensor containing two values, as expected.
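The question says that concat and stack both failed, but the failing calls are not shown, so the following is only a guess at the cause. As far as I know, tf.concat joins tensors along an axis that already exists and rejects scalar inputs, while tf.stack creates a new axis. A minimal sketch with made-up values (the numbers are illustrative, not read from the dataset) shows the shapes each call produces:

```python
import tensorflow as tf

# Hypothetical scalar features, standing in for one unbatched dataset element.
age = tf.constant(30.0)
fare = tf.constant(13.0)

# tf.stack adds a new axis, so two scalars become a vector of shape (2,).
pair = tf.stack([age, fare], axis=0)

# Hypothetical batched features, as data.batch(n) would yield: shape (3,) each.
ages = tf.constant([30.0, 37.0, 28.0])
fares = tf.constant([13.0, 7.925, 13.0])

# Stacking along axis=1 gives one row per passenger: shape (3, 2).
features = tf.stack([ages, fares], axis=1)

# tf.concat joins along an existing axis, so the result stays rank 1: shape (6,).
flat = tf.concat([ages, fares], axis=0)

print(pair.shape, features.shape, flat.shape)
```

So for already-batched tensors like xx1 and xx2 in the question, stacking along axis=1 is probably what was wanted, since a Dense layer expects input of shape (batch, features).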

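Using as a TensorFlow dataset

A TensorFlow dataset can be passed directly to training, and we can easily create a function that maps the input data format to the goal format described in the question. The function below does this using the example code above.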

# Input data and associated label
def transform_data(item, label):

  # Extract values
  age = item['age']
  fare = item['fare']

  # Create output tensor
  output = tf.stack([age, fare], axis=0)
  return output, label

# Create a training dataset from the base dataset: map each element to the goal format with the mapping function, then batch
train_dataset = data.map(transform_data).batch(1200)

# Model - I made some minor changes to get it to run cleaner
model = tf.keras.models.Sequential([
  tf.keras.layers.Dense(2),
  tf.keras.layers.Dense(13, activation='relu'),
  # As we have only two labels, this is really a binary problem, so I've created a single output neuron activated by sigmoid
  tf.keras.layers.Dense(1,activation='sigmoid')
])


# Compiled with binary_crossentropy to complement the binary classification
model.compile(optimizer=Adam(learning_rate=0.01),loss='binary_crossentropy', metrics=['accuracy'])
model.fit(train_dataset,epochs=30)

Output:

Epoch 1/30
2/2 [==============================] - 0s 16ms/step - loss: 11.7881 - accuracy: 0.4385
Epoch 2/30
2/2 [==============================] - 0s 7ms/step - loss: 10.2350 - accuracy: 0.4270
...
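To check the shapes the model receives without downloading Titanic, the same mapping can be tried on a small in-memory dataset. This is only a sketch: the feature values and labels are invented, and toy_data / toy_train are hypothetical names. Only the (feature dict, label) layout mirrors what as_supervised=True yields.

```python
import tensorflow as tf

# Invented stand-in data; only the (feature dict, label) layout mirrors the real dataset.
features = {
    "age": tf.constant([30.0, 37.0, 28.0, 18.0]),
    "fare": tf.constant([13.0, 7.925, 13.0, 73.5]),
}
labels = tf.constant([0, 0, 1, 1], dtype=tf.int64)
toy_data = tf.data.Dataset.from_tensor_slices((features, labels))


def transform_data(item, label):
    # Stack the two scalar features into one vector of shape (2,).
    return tf.stack([item["age"], item["fare"]], axis=0), label


# Map each element first, then batch, in the same order as the answer's pipeline.
toy_train = toy_data.map(transform_data).batch(2)

# Inspect the first batch: features are (batch, 2), labels are (batch,).
for x, y in toy_train.take(1):
    print(x.shape, y.shape)
```

Each batch carries features of shape (batch, 2), which is the layout the first Dense layer consumes.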


tensorflow

tensorflow-datasets

iris-dataset

tensorflow2.0
