Tensorflow feature_column expecting a different shape than input data

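I'm trying to implement a TensorFlow Estimator and am getting a shape-mismatch error that I don't know how to debug. I think I may have misunderstood how to specify the shape of a tf.feature_column. My intention is to create a model with 6010 inputs. Any suggestions would be greatly appreciated.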

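import numpy as np
import tensorflow as tf  # TensorFlow 1.x API (Dataset.make_one_shot_iterator)

def train_input_fn():
    # TRAIN_NN_FEATURES: path to an .npz file holding 'features' and 'labels'
    with np.load(TRAIN_NN_FEATURES) as train:
        train_features = train['features']
        train_labels = train['labels']
    train_dataset = tf.data.Dataset.from_tensor_slices(
            ({'all_features': train_features}, train_labels))
    train_iterator = train_dataset.make_one_shot_iterator()
    return train_iterator.get_next()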

all_features = tf.feature_column.numeric_column(
    'all_features', 
    shape=(6010,), 
    dtype=tf.float64
) 

estimator = tf.estimator.DNNClassifier( 
    feature_columns=[all_features],
    hidden_units=[1024, 512, 256]
)

estimator.train(input_fn=train_input_fn)

When I run this, I get the following error:

InvalidArgumentError (see above for traceback): Input to reshape 
is a tensor with 6010 values, but the requested shape has 36120100

[[Node: dnn/input_from_feature_columns/input_layer/all_features/Reshape = 
Reshape[T=DT_FLOAT, Tshape=DT_INT32, _device="/job:localhost/replica:0/task:0/device:CPU:0"]
(dnn/input_from_feature_columns/input_layer/all_features/ToFloat, 
dnn/input_from_feature_columns/input_layer/all_features/Reshape/shape)]]

The data has the shape I expect, but the feature_column seems to expect its square:

>>> train_features.shape
(10737, 6010)
>>> train_labels.shape
(10737, 1)
>>> 36120100./6010
6010.0

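My understanding is that Dataset.from_tensor_slices takes slices along axis 0 of the given tensors, which matches the error message "Input to reshape is a tensor with 6010 values." But why is a shape with 36120100 values requested?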
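A shape-only sketch of the slicing described here, in plain Python with no TensorFlow involved. `element_shape` is a hypothetical helper, not a TensorFlow function; it assumes the documented behaviour of `Dataset.from_tensor_slices`, which drops the leading dimension of every component and yields one element per row. The shapes are the ones printed in the REPL session above.

```python
TRAIN_ROWS = 10737  # train_features.shape[0] from the question

# Shapes of the arrays passed to Dataset.from_tensor_slices in the question.
dataset_input = ({"all_features": (TRAIN_ROWS, 6010)}, (TRAIN_ROWS, 1))

def element_shape(shape):
    """Hypothetical helper: shape of one slice along axis 0 (leading dim dropped)."""
    return shape[1:]

features, labels = dataset_input
element = (
    {name: element_shape(shape) for name, shape in features.items()},
    element_shape(labels),
)
num_elements = labels[0]  # one dataset element per row

print(element)       # ({'all_features': (6010,)}, (1,))
print(num_elements)  # 10737
```

Each element therefore carries a single example with no batch dimension, which is what the iterator hands to the Estimator when no batching is applied.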

I'd still like to know why the above wasn't working, or how to debug it, though.

The problem is the shape of the tensors produced by train_iterator.get_next(). If no batch size is specified, the iterator returns the following:

({'all_features': <tf.Tensor 'IteratorGetNext:0' shape=(6010,) dtype=float64>}, 
 <tf.Tensor 'IteratorGetNext:1' shape=(1,) dtype=float64>)

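...tuple. As you can see, the feature tensor has shape (6010,), which DNNClassifier interprets as batch_size=6010 (by convention, the first dimension is the batch size) while still expecting 6010 features per example. Hence the error: it cannot reshape a tensor of shape (6010,), which holds 6010 values, into (6010, 6010), which requires 6010 × 6010 = 36120100 values.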
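The arithmetic behind the error message can be reproduced without TensorFlow. The sketch below is a simplified model, not TensorFlow's source: `reshape_sizes` is a hypothetical helper that assumes, as explained above, that the feature column takes the batch size from the first dimension of the incoming tensor and expects `prod(shape)` values per example, where `shape` is what was passed to `numeric_column`.

```python
from math import prod

COLUMN_SHAPE = (6010,)  # shape= passed to tf.feature_column.numeric_column

def reshape_sizes(input_shape, column_shape=COLUMN_SHAPE):
    """Hypothetical helper returning (values present, values the reshape asks for).

    Simplified model: the first dimension of the input is treated as the batch
    size, and every example is expected to hold prod(column_shape) values.
    """
    batch_size = input_shape[0]
    available = prod(input_shape)
    requested = batch_size * prod(column_shape)
    return available, requested

print(reshape_sizes((6010,)))     # (6010, 36120100): unbatched, sizes differ
print(reshape_sizes((16, 6010)))  # (96160, 96160): batched, sizes agree
```

The unbatched case yields exactly the two numbers in the error message, while adding a leading batch dimension makes the sizes agree.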

To make it work, you have to either reshape this tensor manually or, more simply, set a batch size by calling:

train_dataset = train_dataset.batch(16)

Even a batch size of 1 works, because it forces the get_next() tensors to be:

({'all_features': <tf.Tensor 'IteratorGetNext:0' shape=(?, 6010) dtype=float64>}, 
 <tf.Tensor 'IteratorGetNext:1' shape=(?, 1) dtype=float64>)

...but you will obviously want to make it larger for efficiency.
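As a rough illustration of why the batched shapes show `?` as their leading dimension, the plain-Python sketch below counts how many examples land in each batch when the 10737 training rows are grouped in batches of 16. `batch_sizes` is a hypothetical helper, not a TensorFlow function, and it assumes the partial final batch is kept (in TensorFlow versions whose `Dataset.batch` accepts `drop_remainder`, that corresponds to the default `drop_remainder=False`).

```python
def batch_sizes(num_examples, batch_size):
    """Hypothetical helper: examples per batch, keeping the partial last batch."""
    full, remainder = divmod(num_examples, batch_size)
    sizes = [batch_size] * full
    if remainder:
        sizes.append(remainder)
    return sizes

sizes = batch_sizes(10737, 16)
print(len(sizes), sizes[0], sizes[-1])  # 672 16 1
# Every batch has shape (size, 6010); `size` is not constant, hence (?, 6010).

print(len(batch_sizes(10737, 1)))  # 10737 batches, each of shape (1, 6010)
```

Because the last batch holds a single example, the leading dimension cannot be fixed statically, yet every batch still has the 6010 features per row that the feature column expects.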