在 tensorflow 中使用 DNNLinearCombinedEstimator 进行多标签分类
Using DNNLinearCombinedEstimator in tensorflow for multilabel classification
我有一个多标签数据集,我想使用 wide-n-deep 神经网络对样本进行分类。
这是一个非常小的例子,只是为了测试:
import numpy as np
import pandas as pd
import tensorflow as tf
tf.enable_eager_execution()
training_df: pd.DataFrame = pd.DataFrame(
data={
'feature1': np.random.rand(10),
'feature2': np.random.rand(10),
'feature3': np.random.rand(10),
'feature4': np.random.randint(0, 3, 10),
'feature5': np.random.randint(0, 3, 10),
'feature6': np.random.randint(0, 3, 10),
'target1': np.random.randint(0, 2, 10),
'target2': np.random.randint(0, 2, 10),
'target3': np.random.randint(0, 2, 10)
}
)
features = ['feature1', 'feature2', 'feature3','feature4', 'feature5', 'feature6']
targets = ['target1', 'target2', 'target3']
Categorical_Cols = ['feature4', 'feature5', 'feature6']
Numerical_Cols = ['feature1', 'feature2', 'feature3']
wide_columns = [tf.feature_column.categorical_column_with_vocabulary_list(key=x, vocabulary_list=[0, 1, -1])
for x in Categorical_Cols]
deep_columns = [tf.feature_column.numeric_column(x) for x in Numerical_Cols]
def wrap_dataset(df, features, labels):
dataset = (
tf.data.Dataset.from_tensor_slices(
(
tf.cast(df[features].values, tf.float32),
tf.cast(df[labels].values, tf.int32),
)
)
)
return(dataset)
input_fn_train = wrap_dataset(training_df, features, targets)
m = tf.contrib.estimator.DNNLinearCombinedEstimator(
head=tf.contrib.estimator.multi_label_head(n_classes=2),
# wide settings
linear_feature_columns=wide_columns,
# linear_optimizer=tf.train.FtrlOptimizer(...),
# deep settings
dnn_feature_columns=deep_columns,
# dnn_optimizer=tf.train.ProximalAdagradOptimizer(...),
dnn_hidden_units=[10, 30, 10])
m.train(input_fn=input_fn_train)
在这个例子中,我们有 6 个特征,包括:
- 3 个数值特征:feature1、feature2 和 feature3
- 3 个分类特征:feature4、feature5 和 feature6
其中每个样本都有三个标签,每个标签都有一个二进制值:0 或 1。
错误与输入函数有关,我不知道如何以正确的方式定义输入函数。
感谢任何帮助更正代码的帮助。
更新:错误是:
TypeError: <TensorSliceDataset shapes: ((6,), (3,)), types: (tf.float32, tf.int32)> is not a callable object
因为它说它不是可调用对象,您只需添加 lambda 就可以了
input_fn_train = lambda: wrap_dataset(training_df, features, targets)
此外,我认为您需要弄清楚如何将数据传递给 Estimator。由于您使用的是特征列,因此可能需要字典。现在你传递的是张量而不是张量字典。看看这个 useful post。
最后,我想出了如何使代码正常工作。我 post 在这里是为了帮助那些想使用 tensorflow 包 1.13 版的内置函数 DNNLinearCombinedEstimator
进行多标签分类的人。
import numpy as np
import pandas as pd
import tensorflow as tf
# from tensorflow import contrib
tf.enable_eager_execution()
training_df: pd.DataFrame = pd.DataFrame(
data={
'feature1': np.random.rand(10),
'feature2': np.random.rand(10),
'feature3': np.random.rand(10),
'feature4': np.random.randint(0, 3, 10),
'feature5': np.random.randint(0, 3, 10),
'feature6': np.random.randint(0, 3, 10),
'target1': np.random.randint(0, 2, 10),
'target2': np.random.randint(0, 2, 10),
'target3': np.random.randint(0, 2, 10)
}
)
features = ['feature1', 'feature2', 'feature3','feature4', 'feature5', 'feature6']
targets = ['target1', 'target2', 'target3']
Categorical_Cols = ['feature4', 'feature5', 'feature6']
Numerical_Cols = ['feature1', 'feature2', 'feature3']
wide_columns = [tf.feature_column.categorical_column_with_vocabulary_list(key=x, vocabulary_list=[0, 1, -1])
for x in Categorical_Cols]
deep_columns = [tf.feature_column.numeric_column(x) for x in Numerical_Cols]
def input_fn(df):
# Creates a dictionary mapping from each continuous feature column name (k) to
# the values of that column stored in a constant Tensor.
continuous_cols = {k: tf.constant(df[k].values)
for k in Numerical_Cols}
# Creates a dictionary mapping from each categorical feature column name (k)
# to the values of that column stored in a tf.SparseTensor.
categorical_cols = {k: tf.SparseTensor(
indices=[[i, 0] for i in range(df[k].size)],
values=df[k].values,
dense_shape=[df[k].size, 1])
for k in Categorical_Cols}
# Merges the two dictionaries into one.
feature_cols = continuous_cols.copy()
feature_cols.update(categorical_cols)
labels =tf.convert_to_tensor(training_df.as_matrix(training_df[targets].columns.tolist()), dtype=tf.int32)
return feature_cols, labels
def train_input_fn():
return input_fn(training_df)
def eval_input_fn():
return input_fn(training_df)
m = tf.contrib.learn.DNNLinearCombinedEstimator(
head=tf.contrib.learn.multi_label_head(n_classes=3),
# wide settings
linear_feature_columns=wide_columns,
# linear_optimizer=tf.train.FtrlOptimizer(...),
# deep settings
dnn_feature_columns=deep_columns,
# dnn_optimizer=tf.train.ProximalAdagradOptimizer(...),
dnn_hidden_units=[10, 10])
m.train(input_fn=train_input_fn, steps=20)
results = m.evaluate(input_fn=eval_input_fn, steps=1)
print("#########################################################")
for key in sorted(results):
print("%s: %s" % (key, results[key]))
我有一个多标签数据集,我想使用 wide-n-deep 神经网络对样本进行分类。
这是一个非常小的例子,只是为了测试:
import numpy as np
import pandas as pd
import tensorflow as tf
tf.enable_eager_execution()
training_df: pd.DataFrame = pd.DataFrame(
data={
'feature1': np.random.rand(10),
'feature2': np.random.rand(10),
'feature3': np.random.rand(10),
'feature4': np.random.randint(0, 3, 10),
'feature5': np.random.randint(0, 3, 10),
'feature6': np.random.randint(0, 3, 10),
'target1': np.random.randint(0, 2, 10),
'target2': np.random.randint(0, 2, 10),
'target3': np.random.randint(0, 2, 10)
}
)
features = ['feature1', 'feature2', 'feature3','feature4', 'feature5', 'feature6']
targets = ['target1', 'target2', 'target3']
Categorical_Cols = ['feature4', 'feature5', 'feature6']
Numerical_Cols = ['feature1', 'feature2', 'feature3']
wide_columns = [tf.feature_column.categorical_column_with_vocabulary_list(key=x, vocabulary_list=[0, 1, -1])
for x in Categorical_Cols]
deep_columns = [tf.feature_column.numeric_column(x) for x in Numerical_Cols]
def wrap_dataset(df, features, labels):
dataset = (
tf.data.Dataset.from_tensor_slices(
(
tf.cast(df[features].values, tf.float32),
tf.cast(df[labels].values, tf.int32),
)
)
)
return(dataset)
input_fn_train = wrap_dataset(training_df, features, targets)
m = tf.contrib.estimator.DNNLinearCombinedEstimator(
head=tf.contrib.estimator.multi_label_head(n_classes=2),
# wide settings
linear_feature_columns=wide_columns,
# linear_optimizer=tf.train.FtrlOptimizer(...),
# deep settings
dnn_feature_columns=deep_columns,
# dnn_optimizer=tf.train.ProximalAdagradOptimizer(...),
dnn_hidden_units=[10, 30, 10])
m.train(input_fn=input_fn_train)
在这个例子中,我们有 6 个特征,包括:
- 3 个数值特征:feature1、feature2 和 feature3
- 3 个分类特征:feature4、feature5 和 feature6
其中每个样本都有三个标签,每个标签都有一个二进制值:0 或 1。
错误与输入函数有关,我不知道如何以正确的方式定义输入函数。 感谢任何帮助更正代码的帮助。
更新:错误是:
TypeError: <TensorSliceDataset shapes: ((6,), (3,)), types: (tf.float32, tf.int32)> is not a callable object
因为它说它不是可调用对象,您只需添加 lambda 就可以了
input_fn_train = lambda: wrap_dataset(training_df, features, targets)
此外,我认为您需要弄清楚如何将数据传递给 Estimator。由于您使用的是特征列,因此可能需要字典。现在你传递的是张量而不是张量字典。看看这个 useful post。
最后,我想出了如何使代码正常工作。我 post 在这里是为了帮助那些想使用 tensorflow 包 1.13 版的内置函数 DNNLinearCombinedEstimator
进行多标签分类的人。
import numpy as np
import pandas as pd
import tensorflow as tf
# from tensorflow import contrib
tf.enable_eager_execution()
training_df: pd.DataFrame = pd.DataFrame(
data={
'feature1': np.random.rand(10),
'feature2': np.random.rand(10),
'feature3': np.random.rand(10),
'feature4': np.random.randint(0, 3, 10),
'feature5': np.random.randint(0, 3, 10),
'feature6': np.random.randint(0, 3, 10),
'target1': np.random.randint(0, 2, 10),
'target2': np.random.randint(0, 2, 10),
'target3': np.random.randint(0, 2, 10)
}
)
features = ['feature1', 'feature2', 'feature3','feature4', 'feature5', 'feature6']
targets = ['target1', 'target2', 'target3']
Categorical_Cols = ['feature4', 'feature5', 'feature6']
Numerical_Cols = ['feature1', 'feature2', 'feature3']
wide_columns = [tf.feature_column.categorical_column_with_vocabulary_list(key=x, vocabulary_list=[0, 1, -1])
for x in Categorical_Cols]
deep_columns = [tf.feature_column.numeric_column(x) for x in Numerical_Cols]
def input_fn(df):
# Creates a dictionary mapping from each continuous feature column name (k) to
# the values of that column stored in a constant Tensor.
continuous_cols = {k: tf.constant(df[k].values)
for k in Numerical_Cols}
# Creates a dictionary mapping from each categorical feature column name (k)
# to the values of that column stored in a tf.SparseTensor.
categorical_cols = {k: tf.SparseTensor(
indices=[[i, 0] for i in range(df[k].size)],
values=df[k].values,
dense_shape=[df[k].size, 1])
for k in Categorical_Cols}
# Merges the two dictionaries into one.
feature_cols = continuous_cols.copy()
feature_cols.update(categorical_cols)
labels =tf.convert_to_tensor(training_df.as_matrix(training_df[targets].columns.tolist()), dtype=tf.int32)
return feature_cols, labels
def train_input_fn():
return input_fn(training_df)
def eval_input_fn():
return input_fn(training_df)
m = tf.contrib.learn.DNNLinearCombinedEstimator(
head=tf.contrib.learn.multi_label_head(n_classes=3),
# wide settings
linear_feature_columns=wide_columns,
# linear_optimizer=tf.train.FtrlOptimizer(...),
# deep settings
dnn_feature_columns=deep_columns,
# dnn_optimizer=tf.train.ProximalAdagradOptimizer(...),
dnn_hidden_units=[10, 10])
m.train(input_fn=train_input_fn, steps=20)
results = m.evaluate(input_fn=eval_input_fn, steps=1)
print("#########################################################")
for key in sorted(results):
print("%s: %s" % (key, results[key]))