Feeding tf.contrib.learn inputs into DNNClassifier
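I am new to TensorFlow and to Stack Overflow, so apologies in advance for any silly mistakes. I have had good success feeding the lower-level interfaces, so I decided to try the higher-level tf.contrib.learn API because it looked simple. I am working in Google Cloud Datalab (a Jupyter notebook), but I have hit a roadblock and am looking for help.
Main question: how do I instantiate a DNNClassifier so that I can feed it a single feature that is itself a list of tf.float32 numbers?
Here are the details. I am reading a TFRecords-based input file with the following code: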
import tensorflow as tf

def read_and_decode(filename_queue):
    # get a tensorflow reader and read in an example
    reader = tf.TFRecordReader()
    _, serialized_example = reader.read(filename_queue)
    # parse a single example
    features = tf.parse_single_example(serialized_example, features={
        'label': tf.FixedLenFeature([], tf.int64),
        'features': tf.FixedLenFeature([], tf.string)})
    # convert to tensors and return
    bag_of_words = tf.decode_raw(features['features'], tf.float32)
    bag_of_words.set_shape([LEN_OF_LEXICON])
    label = tf.cast(features['label'], tf.int32)
    return bag_of_words, label
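For readers unfamiliar with the decode step, here is a minimal pure-Python sketch of what reinterpreting a byte string as float32 values amounts to. It is a stand-in for `tf.decode_raw(..., tf.float32)`, not TensorFlow code. The 4-element vector and the assumption that the writer stored the features as raw little-endian float32 bytes are illustrative only; the question does not show how the file was written.

```python
import struct

# Hypothetical 4-word bag of words; the question's vectors have 2633 entries.
bag_of_words = [0.0, 1.0, 3.0, 0.5]

# Writer side (assumed): store the vector as raw little-endian float32 bytes.
raw_bytes = struct.pack("<{}f".format(len(bag_of_words)), *bag_of_words)

# Reader side: reinterpret the byte string as float32 values, 4 bytes each.
count = len(raw_bytes) // 4
decoded = list(struct.unpack("<{}f".format(count), raw_bytes))
```

Because the number of values is only known once the bytes are read, the decoded tensor has no static length, which is presumably why the function above calls `set_shape([LEN_OF_LEXICON])` afterwards.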
My unit test looks like this:
# unit test
filename = VALIDATION_FILE
my_filename_queue = tf.train.string_input_producer([filename], num_epochs=1)
x, y = read_and_decode(my_filename_queue)
print('x[0] -> ', x[0])
print('x[1] -> ', x[1])
print('y -> ', y, 'type -> ', type(y))
print('x -> ', x, 'type -> ', type(x))
which gives the following output:
x[0] -> Tensor("strided_slice_6:0", shape=(), dtype=float32)
x[1] -> Tensor("strided_slice_7:0", shape=(), dtype=float32)
y -> Tensor("Cast_6:0", shape=(), dtype=int32) type -> <class 'tensorflow.python.framework.ops.Tensor'>
x -> Tensor("DecodeRaw_3:0", shape=(2633,), dtype=float32) type -> <class 'tensorflow.python.framework.ops.Tensor'>
The read_and_decode function is called by input_pipeline, which has the following definition and unit test:
def input_pipeline(filenames, batch_size, num_epochs=None):
    filename_queue = tf.train.string_input_producer(
        filenames, num_epochs=num_epochs, shuffle=True)
    example, label = read_and_decode(filename_queue)
    min_after_dequeue = 10000
    capacity = min_after_dequeue + 3 * batch_size
    example_batch, label_batch = tf.train.shuffle_batch(
        [example, label], batch_size=batch_size, capacity=capacity,
        min_after_dequeue=min_after_dequeue)
    return example_batch, label_batch

# unit test
x, y = input_pipeline([VALIDATION_FILE], BATCH_SIZE, num_epochs=1)
print('y -> ', y, 'type -> ', type(y))
print('x -> ', x, 'type -> ', type(x))
and has the following output:
y -> Tensor("shuffle_batch_4:1", shape=(100,), dtype=int32) type -> <class 'tensorflow.python.framework.ops.Tensor'>
x -> Tensor("shuffle_batch_4:0", shape=(100, 2633), dtype=float32) type -> <class 'tensorflow.python.framework.ops.Tensor'>
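To make the role of `capacity` and `batch_size` in the pipeline above more concrete, here is a simplified, single-threaded sketch of batching from a bounded shuffle buffer. It is not how `tf.train.shuffle_batch` is implemented (the real op uses a queue filled by background threads); `shuffle_batches` and its `seed` parameter are hypothetical names introduced for illustration. Only the capacity formula is taken from the question.

```python
import random

def shuffle_batches(examples, batch_size, min_after_dequeue, seed=0):
    """Yield shuffled batches drawn from a bounded buffer (illustrative only)."""
    rng = random.Random(seed)
    # Same formula as in input_pipeline: room for mixing plus a few batches.
    capacity = min_after_dequeue + 3 * batch_size
    source = iter(examples)
    buffer, batch = [], []
    exhausted = False
    while True:
        # Refill the buffer until it is full or the input runs out.
        while not exhausted and len(buffer) < capacity:
            try:
                buffer.append(next(source))
            except StopIteration:
                exhausted = True
        if not buffer:
            break
        # Dequeue one element chosen uniformly at random from the buffer.
        idx = rng.randrange(len(buffer))
        buffer[idx], buffer[-1] = buffer[-1], buffer[idx]
        batch.append(buffer.pop())
        if len(batch) == batch_size:
            yield batch
            batch = []
    # A trailing partial batch is dropped in this sketch.

batches = list(shuffle_batches(range(25), batch_size=5, min_after_dequeue=10))
```

With the question's values (a batch size of 100, as the output shapes suggest, and min_after_dequeue of 10000) the formula gives a capacity of 10300 examples.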
The trainer that takes these feeds looks like this:
def run_training():
    #feature_columns = ????????????
    feature_columns = tf.contrib.layers.real_valued_column(
        "", dimension=LEN_OF_LEXICON, dtype=tf.float32)
    estimator = tf.contrib.learn.DNNClassifier(
        feature_columns=feature_columns,
        n_classes=5,
        hidden_units=[1024, 512, 256],
        optimizer=tf.train.ProximalAdagradOptimizer(
            learning_rate=0.1,
            l1_regularization_strength=0.001))
    estimator.fit(input_fn=lambda: input_pipeline(
        [VALIDATION_FILE], BATCH_SIZE, num_epochs=1))

# unit test
run_training()
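Instantiating the DNNClassifier goes through without complaint, but the call to estimator.fit() throws an exception (traceback snippet below). My input_pipeline is supplying the feeds as shown in the TensorFlow documentation, but somehow the form of the data inside the tensor does not seem to be correct. Does anyone have any ideas?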
---------------- Traceback Snippet -----------------
> `/usr/local/lib/python2.7/dist-packages/tensorflow/contrib/learn/python/learn/estimators/dnn.pyc in _dnn_model_fn(features, labels, mode, params, config)
126 feature_columns=feature_columns,
127 weight_collections=[parent_scope],
--> 128 scope=scope)
129
130 hidden_layer_partitioner = (
/usr/local/lib/python2.7/dist-packages/tensorflow/contrib/layers/python/layers/feature_column_ops.pyc in input_from_feature_columns(columns_to_tensors, feature_columns, weight_collections, trainable, scope)
247 scope,
248 output_rank=2,
--> 249 default_name='input_from_feature_columns')
250
251
/usr/local/lib/python2.7/dist-packages/tensorflow/contrib/layers/python/layers/feature_column_ops.pyc in _input_from_feature_columns(columns_to_tensors, feature_columns, weight_collections, trainable, scope, output_rank, default_name)
145 default_name):
146 """Implementation of `input_from(_sequence)_feature_columns`."""
--> 147 check_feature_columns(feature_columns)
148 with variable_scope.variable_scope(scope,
149 default_name=default_name,
/usr/local/lib/python2.7/dist-packages/tensorflow/contrib/layers/python/layers/feature_column_ops.pyc in check_feature_columns(feature_columns)
806 seen_keys = set()
807 for f in feature_columns:
--> 808 key = f.key
809 if key in seen_keys:
810 raise ValueError('Duplicate feature column key found for column: {}. '
AttributeError: 'str' object has no attribute 'key'
`
The solution is to use the function:
feature_columns = tf.contrib.learn.infer_real_valued_columns_from_input_fn(lambda: input_pipeline([INPUT_FILE], BATCH_SIZE, num_epochs=1))
It infers the columns from the output signature of your input_fn. Easy peasy!
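A likely explanation for the AttributeError, offered as a sketch rather than a confirmed diagnosis: the traceback shows `check_feature_columns` looping over `feature_columns` and reading `f.key` on each item. If the object returned by `real_valued_column` is tuple-like (my understanding is that it is built on a namedtuple, but treat that as an assumption), passing it bare makes the loop iterate over its fields, and the first field is the name string `""`. The stand-in class `FakeColumn` and the simplified `check_feature_columns` below are hypothetical and only mimic that behaviour in plain Python.

```python
import collections

# Hypothetical stand-in for a feature column; field names are illustrative.
class FakeColumn(collections.namedtuple("FakeColumn", ["column_name", "dimension"])):
    @property
    def key(self):
        return "{}:{}".format(self.column_name, self.dimension)

def check_feature_columns(feature_columns):
    """Mimics the loop in the traceback: every item must expose `.key`."""
    seen_keys = set()
    for f in feature_columns:
        key = f.key
        if key in seen_keys:
            raise ValueError("Duplicate feature column key: {}".format(key))
        seen_keys.add(key)
    return seen_keys

column = FakeColumn(column_name="", dimension=2633)

# Bare column: the loop walks its fields, and "" (a str) has no `.key`.
try:
    check_feature_columns(column)
    bare_error = None
except AttributeError as exc:
    bare_error = str(exc)

# List containing the column: the loop sees the column object itself.
keys_from_list = check_feature_columns([column])
```

If that reading is right, the inference function helps because it returns a list of columns, and wrapping the original column in a list, `[tf.contrib.layers.real_valued_column("", dimension=LEN_OF_LEXICON)]`, would presumably work as well.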