Using different data types in EagerTensor

Question

Using TensorFlow 2.0 alpha, I get the error ValueError: Can't convert Python sequence with mixed types to Tensor when I try to create a tf.data.Dataset from the following data:

Inspect the complete dataset on Kaggle

Clearly, the data types are mixed: Sex is a string, Age is a float/double, SibSp and Parch are integers, and so on.

My (Python 3) code to transform this Pandas Dataframe into a tf.data.Dataset is based on TensorFlow's tutorial How to classify structured data and looks like this:

import tensorflow as tf

def df_to_dataset(dataframe, shuffle=True, batch_size=32):
  dataframe = dataframe.copy()

  # the 'Survived' column is the label (not shown in the image of the Dataframe but exists in the Dataframe)
  label = dataframe.pop('Survived')

  # create the dataset from the dataframe
  ds = tf.data.Dataset.from_tensor_slices((dict(dataframe), label))

  # if shuffle == true, randomize the entries
  if shuffle:
    ds = ds.shuffle(buffer_size=len(dataframe))
  ds = ds.batch(batch_size)

  return ds

As mentioned above, this function throws the error ValueError: Can't convert Python sequence with mixed types to Tensor when it is executed, for example:

train_ds = df_to_dataset(df_train, batch_size=32)

(where df_train is the pandas dataframe you can see in the image)

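Now I wonder whether I am missing something, because TensorFlow's tutorial (mentioned above) also uses a dataframe with mixed types, yet when I ran that exact example with the same df_to_dataset function, it ran without errors.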

Answer 1

This error is caused by NaN values in specific columns. Detect them with dataframe['Name'].isnull().sum() and replace them.
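
To make that advice concrete, here is a minimal pandas-only sketch of detecting and replacing NaN values before the dataframe is handed to df_to_dataset. The helper names report_missing and fill_missing, the "unknown" placeholder, the choice of the median for numeric columns, and the small sample frame are all assumptions made for illustration; they are not part of the question or of any TensorFlow API. The likely reason NaN matters here is that a text column holding both strings and NaN (a float) looks like a sequence of mixed Python types.

```python
import numpy as np
import pandas as pd


def report_missing(dataframe):
    """Return the NaN count per column, keeping only columns that have any."""
    counts = dataframe.isnull().sum()
    return counts[counts > 0]


def fill_missing(dataframe, text_fill="unknown"):
    """Return a copy with NaN replaced.

    Numeric columns get their median; every other column gets text_fill.
    Both choices are illustrative assumptions, not a recommendation.
    """
    cleaned = dataframe.copy()
    for column in cleaned.columns:
        if not cleaned[column].isnull().any():
            continue  # nothing to replace in this column
        if pd.api.types.is_numeric_dtype(cleaned[column]):
            cleaned[column] = cleaned[column].fillna(cleaned[column].median())
        else:
            cleaned[column] = cleaned[column].fillna(text_fill)
    return cleaned


# Hypothetical stand-in for a few rows of df_train.
sample = pd.DataFrame({
    "Sex": ["male", "female", "male"],
    "Age": [22.0, np.nan, 30.0],
    "Cabin": ["C85", np.nan, np.nan],
    "Survived": [0, 1, 1],
})

missing = report_missing(sample)   # columns that still contain NaN
cleaned = fill_missing(sample)     # copy with the NaN values replaced
print(missing)
```

With the real data, the same two calls would be applied to df_train, and the cleaned copy passed to df_to_dataset instead of the original frame. Whether a placeholder string or a median is an appropriate replacement depends on the column, so treat the defaults above as a starting point.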

python

tensorflow

tensorflow-datasets

tensorflow2.0