TensorFlow feature_column expecting a different shape than input data
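I'm trying to implement a TensorFlow Estimator and am getting a shape mismatch error that I don't know how to debug. I think I may be misunderstanding how to specify the shape of a tf.feature_column. My intent is to create a model with 6010 inputs. Any advice would be appreciated.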
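    import numpy as np
    import tensorflow as tf

    def train_input_fn():
        # Load the feature matrix and labels from the .npz archive.
        with np.load(TRAIN_NN_FEATURES) as train:
            train_features = train['features']
            train_labels = train['labels']
        # Slice along axis 0: one (features, label) pair per example.
        train_dataset = tf.data.Dataset.from_tensor_slices(
            ({'all_features': train_features}, train_labels))
        train_iterator = train_dataset.make_one_shot_iterator()
        return train_iterator.get_next()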
    all_features = tf.feature_column.numeric_column(
        'all_features',
        shape=(6010,),
        dtype=tf.float64
    )
    estimator = tf.estimator.DNNClassifier(
        feature_columns=[all_features],
        hidden_units=[1024, 512, 256]
    )
    estimator.train(input_fn=train_input_fn)
When I run this, I get the following error:
InvalidArgumentError (see above for traceback): Input to reshape
is a tensor with 6010 values, but the requested shape has 36120100
[[Node: dnn/input_from_feature_columns/input_layer/all_features/Reshape =
Reshape[T=DT_FLOAT, Tshape=DT_INT32, _device="/job:localhost/replica:0/task:0/device:CPU:0"]
(dnn/input_from_feature_columns/input_layer/all_features/ToFloat,
dnn/input_from_feature_columns/input_layer/all_features/Reshape/shape)]]
The shape of the data is what I expect, but the feature_column seems to be expecting the square of it.
>>> train_features.shape
(10737, 6010)
>>> train_labels.shape
(10737, 1)
>>> 36120100./6010
6010.0
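My understanding is that Dataset.from_tensor_slices slices along axis 0 of the given tensors, which matches the error message "Input to reshape is a tensor with 6010 values." But why is a shape with 36120100 values being requested?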
I'd still like to know why the above wasn't working, or how to debug though.
The problem is the size of the tensors produced by train_iterator.get_next(). When no batch size is specified, the iterator returns:
({'all_features': <tf.Tensor 'IteratorGetNext:0' shape=(6010,) dtype=float64>},
<tf.Tensor 'IteratorGetNext:1' shape=(1,) dtype=float64>)
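... tuple. As you can see, the feature tensor has shape (6010,), which DNNClassifier interprets as batch_size=6010 (by convention, the first dimension is the batch size), while it still expects 6010 features per example. Hence the error: it cannot reshape (6010,) to (6010, 6010).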
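The arithmetic behind the error message can be reproduced without TensorFlow. The sketch below is a simplified model of the reshape, not the library's actual code: it assumes the input layer takes the first dimension of the incoming tensor as the batch size and requests a shape of (batch_size, number of elements in the column). The helper name reshape_request is hypothetical.

```python
from math import prod


def reshape_request(tensor_shape, column_shape):
    """Model the reshape the input layer asks for (an assumed simplification).

    Returns (values_available, values_requested).
    """
    values_available = prod(tensor_shape)
    # Assumption: the first dimension is always treated as the batch size.
    batch_size = tensor_shape[0]
    values_requested = batch_size * prod(column_shape)
    return values_available, values_requested


# Unbatched element, as returned by get_next() in the question.
print(reshape_request((6010,), (6010,)))     # (6010, 36120100)
# One batch of 16 examples.
print(reshape_request((16, 6010), (6010,)))  # (96160, 96160)
```

In the unbatched case the two numbers are exactly the 6010 and 36120100 from the traceback. Once a batch dimension is present, the available and requested counts agree.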
To make it work, you have to either reshape this tensor manually or simply set a batch size by calling:
    train_dataset = train_dataset.batch(16)
Even a batch size of 1 will do, because it forces the get_next tensors to be:
({'all_features': <tf.Tensor 'IteratorGetNext:0' shape=(?, 6010) dtype=float64>},
<tf.Tensor 'IteratorGetNext:1' shape=(?, 1) dtype=float64>)
... but you obviously want to set it larger for efficiency.
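On the "how to debug" part of the question, one quick check is to print what the input function returns and compare each feature's static shape with the shape declared on the column. A minimal sketch of that comparison follows. The function name has_batch_dimension is hypothetical, and an unknown dimension (shown as ? in the output above) is represented as None.

```python
def has_batch_dimension(tensor_shape, column_shape):
    """Return True if tensor_shape looks like (batch, *column_shape).

    tensor_shape may contain None for an unknown dimension, which is how
    a static shape such as (?, 6010) is represented here.
    """
    tensor_shape = tuple(tensor_shape)
    column_shape = tuple(column_shape)
    # A batched tensor has exactly one more dimension than the column.
    if len(tensor_shape) != len(column_shape) + 1:
        return False
    # Every dimension after the first must match the column's shape.
    return tensor_shape[1:] == column_shape


# Unbatched output of get_next(), as in the question.
print(has_batch_dimension((6010,), (6010,)))       # False
# After train_dataset.batch(...).
print(has_batch_dimension((None, 6010), (6010,)))  # True
```

A False result for any feature suggests the dataset is missing a batch call before the iterator is created.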