Can't load dataframe columns with tf.data.Dataset.from_tensor_slices()
I have a dataframe with the columns id, Text, and Media_location (a relative path to the image folder).
I'm trying to load the Text and Media_location columns like this:
features = df[['Text', 'Media_location']]
dataset = tf.data.Dataset.from_tensor_slices((features))
Then I get this error:
Exception has occurred: ValueError
Failed to convert a NumPy array to a Tensor (Unsupported object type float).
During handling of the above exception, another exception occurred:
File "D:\Final\MultiCNN_test.py", line 114, in process_text_image
dataset = tf.data.Dataset.from_tensor_slices((features))
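I think the error occurs because the dataframe columns can't be converted to a tensor, but I'm not sure how to convert them to get rid of it.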
Your code will work if the columns Text and Media_location have the same data type:
import tensorflow as tf
import pandas as pd
df = pd.DataFrame(data={'Text': ['some text', 'some more text'],
                        'Media_location': ['/path/to/file1', '/path/to/file2']})
features = df[['Text', 'Media_location']]
# Both columns hold strings, so the frame converts to a single string tensor
dataset = tf.data.Dataset.from_tensor_slices((features))
for x in dataset:
    print(x)
tf.Tensor([b'some text' b'/path/to/file1'], shape=(2,), dtype=string)
tf.Tensor([b'some more text' b'/path/to/file2'], shape=(2,), dtype=string)
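To see which case applies to your data, it can help to count the Python types actually stored in each column. The snippet below is only a sketch: the sample frame and the name type_counts are made up for illustration, and the NaN in Text is an assumption about what your real data might contain (a missing value in a text column is stored as a Python float).

```python
import pandas as pd

# Hypothetical sample data: one missing value in the Text column
df = pd.DataFrame({
    "Text": ["some text", float("nan")],
    "Media_location": ["/path/to/file1", "/path/to/file2"],
})

# For each column, count how many entries there are of each Python type
type_counts = {
    col: df[col].apply(type).value_counts().to_dict()
    for col in ["Text", "Media_location"]
}

for col, counts in type_counts.items():
    print(col, counts)
```

A column that reports both str and float holds mixed types, which would be consistent with the "Unsupported object type float" message.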
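However, if the two columns have different data types, you will get your error or a similar one, because a single tensor can't hold mixed data types. In that case, pass each column separately so that each one becomes its own tensor: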
df = pd.DataFrame(data={'Text': [0.29, 0.58],
                        'Media_location': ['/path/to/file1', '/path/to/file2']})
# A tuple of columns yields one tensor per column, each with its own dtype
dataset = tf.data.Dataset.from_tensor_slices((df['Text'], df['Media_location']))
for x in dataset:
    print(x)
(<tf.Tensor: shape=(), dtype=float64, numpy=0.29>, <tf.Tensor: shape=(), dtype=string, numpy=b'/path/to/file1'>)
(<tf.Tensor: shape=(), dtype=float64, numpy=0.58>, <tf.Tensor: shape=(), dtype=string, numpy=b'/path/to/file2'>)
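If Text is supposed to be text but some rows are missing, one possible fix is to fill the gaps and cast both columns to str before building the dataset. This is a minimal sketch under that assumption, not a diagnosis of your actual data: the sample frame and the names clean, columns, and rows are hypothetical, and replacing missing text with an empty string is just one choice.

```python
import pandas as pd
import tensorflow as tf

# Hypothetical sample data: NaN stands in for a missing text entry
df = pd.DataFrame({
    "Text": ["some text", float("nan")],
    "Media_location": ["/path/to/file1", "/path/to/file2"],
})

# Replace missing values with an empty string and make every entry a str
clean = df[["Text", "Media_location"]].fillna("").astype(str)

# One array per column; the dataset then yields a dict of scalar string tensors
columns = {name: clean[name].to_numpy() for name in clean.columns}
dataset = tf.data.Dataset.from_tensor_slices(columns)

# Collect the elements as plain Python values for inspection
rows = [{key: value.numpy() for key, value in row.items()} for row in dataset]
for row in rows:
    print(row)
```

Passing a dict instead of a tuple keeps the column names attached to each element, which can make a later dataset.map step easier to read.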