Tensorflow DNN with tf-idf sparse matrix
I am trying to implement a TensorFlow DNN for text classification.
The independent variable (IV) is a sparse tf-idf matrix:
X_train_sam:
<31819x3122 sparse matrix of type '<class 'numpy.float64'>'with 610128 stored elements in Compressed Sparse Row format>
The labels are the dependent variable (DV):
y_train_sam.values:array(['mexican', 'mexican', 'italian', ..., 'chinese', 'italian','italian'], dtype=object)
I convert the sparse matrix to a tensor with the following snippet:
    import numpy as np
    import tensorflow as tf

    def convert_sparse_matrix_to_sparse_tensor(X):
        coo = X.tocoo()  # CSR -> COO gives explicit row/col arrays
        indices = np.mat([coo.row, coo.col]).transpose()  # (nnz, 2) index pairs
        return tf.SparseTensorValue(indices, coo.data, coo.shape)

    X_train_sam = convert_sparse_matrix_to_sparse_tensor(X_train_sam)
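For reference, the conversion above boils down to producing an (indices, values, dense_shape) triple, which is the argument layout of tf.sparse.SparseTensor. The sketch below is a minimal NumPy/SciPy version of that step only, assuming a SciPy sparse input; it swaps np.mat for np.column_stack (a plain ndarray rather than a matrix subclass) and sorts the entries into row-major order, which TensorFlow's sparse ops generally expect. The helper name csr_to_sparse_triple is hypothetical.

```python
import numpy as np
import scipy.sparse as sp


def csr_to_sparse_triple(X):
    """Return (indices, values, dense_shape) for a SciPy sparse matrix.

    indices is an int64 array of shape (nnz, 2), sorted row-major, matching
    the layout tf.sparse.SparseTensor(indices, values, dense_shape) takes.
    """
    coo = X.tocoo()
    # lexsort uses the last key as primary: sort by row, then by column
    order = np.lexsort((coo.col, coo.row))
    indices = np.column_stack((coo.row[order], coo.col[order])).astype(np.int64)
    values = coo.data[order]
    dense_shape = np.array(coo.shape, dtype=np.int64)
    return indices, values, dense_shape


# Tiny stand-in for the 31819x3122 tf-idf matrix
X = sp.csr_matrix(np.array([[0.0, 0.5, 0.0],
                            [0.2, 0.0, 0.7]]))
indices, values, dense_shape = csr_to_sparse_triple(X)
print(indices.tolist())      # [[0, 1], [1, 0], [1, 2]]
print(values.tolist())       # [0.5, 0.2, 0.7]
print(dense_shape.tolist())  # [2, 3]
```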
Preparing the data for modeling:
    def train_input_fn(features, labels, batch_size):
        dataset = tf.data.Dataset.from_tensors((features, labels))
        dataset = dataset.shuffle(1000).repeat().batch(batch_size)
        return dataset.make_one_shot_iterator().get_next()

    inp = train_input_fn(X_train_sam, y_train_sam.values, batch_size=1000)
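Two details of this input function are worth spelling out. tf.data.Dataset.from_tensors wraps its whole argument as a single dataset element, so shuffle and batch act on copies of the entire matrix rather than on individual rows; from_tensor_slices slices along the first dimension instead. And an Estimator expects each element to be a (features, labels) pair whose features part is a dict. The generator below is a minimal NumPy/SciPy sketch of the per-batch structure a row-sliced, shuffled, repeated pipeline yields; it is not the tf.data implementation (a full permutation per epoch stands in for shuffle's buffer), and the feature key "tfidf" is a hypothetical name.

```python
import numpy as np
import scipy.sparse as sp


def batch_generator(X, y, batch_size, feature_key="tfidf", seed=0):
    """Yield (features, labels) tuples, one minibatch per step.

    features is a dict {feature_key: dense float32 array}; "tfidf" is a
    hypothetical key that only has to match the feature column's name.
    """
    rng = np.random.default_rng(seed)
    n_rows = X.shape[0]
    while True:  # roughly like Dataset.repeat(): cycle over the data forever
        order = rng.permutation(n_rows)  # new row order each epoch
        for start in range(0, n_rows, batch_size):
            rows = order[start:start + batch_size]
            # Slice rows (what from_tensor_slices does) and densify only the batch
            features = {feature_key: X[rows].toarray().astype(np.float32)}
            yield features, y[rows]


X = sp.random(10, 4, density=0.5, format="csr", random_state=0)
y = np.arange(10)  # label i belongs to row i, which makes batches easy to check
gen = batch_generator(X, y, batch_size=4)
features, labels = next(gen)
print(features["tfidf"].shape, labels.shape)  # (4, 4) (4,)
```

Densifying per batch keeps memory bounded by batch_size x 3122 floats instead of the full 31819-row matrix.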
Applying the DNN classifier:
    classifier = tf.estimator.DNNClassifier(
        feature_columns=[float] * X_train_sam.dense_shape[1],
        hidden_units=[10, 10],
        n_classes=len(y_train_sam.unique()))

    classifier.train(input_fn=lambda: inp)
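Two arguments in this call probably need attention as well. feature_columns expects feature-column objects, for example something like tf.feature_column.numeric_column("tfidf", shape=[3122]) with a key matching the features dict (the key name is hypothetical), rather than a list of Python types. And because y_train_sam holds strings, DNNClassifier needs either label_vocabulary set to the class names or labels already encoded as integers in {0, ..., n_classes-1}. A minimal NumPy sketch of the second option, with a hypothetical helper name:

```python
import numpy as np


def encode_labels(labels):
    """Map string class names to integer ids in [0, n_classes).

    Returns the ids and the sorted vocabulary; len(vocabulary) is n_classes.
    """
    vocabulary, ids = np.unique(np.asarray(labels), return_inverse=True)
    return ids.astype(np.int64), vocabulary.tolist()


ids, vocab = encode_labels(['mexican', 'mexican', 'italian', 'chinese', 'italian'])
print(vocab)         # ['chinese', 'italian', 'mexican']
print(ids.tolist())  # [2, 2, 1, 0, 1]
```

The returned vocab list could alternatively be passed as label_vocabulary, leaving the labels as strings.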
I get the following error:
ValueError: features should be a dictionary of `Tensor`s. Given type: <class 'tensorflow.python.framework.sparse_tensor.SparseTensorValue'>
Any pointers would be appreciated; I'm new to ML and TensorFlow.
In this line of your code

    classifier.train(input_fn=lambda:inp)

is lambda:inp meant to be a dictionary, or did you mean an anonymous function?
From the documentation at
https://www.tensorflow.org/api_docs/python/tf/estimator/DNNClassifier
input_fn: Input function returning a tuple of:
    features - Tensor or dictionary of string feature name to Tensor.
    labels - Tensor or dictionary of Tensor with labels.
So you need a function that returns a tuple, not a single value...
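To make that contract concrete: train(input_fn=...) wants a zero-argument callable, and calling it should return (features, labels) with features as a dict keyed by feature name, since the error in the question rejects anything else. In TF 1.x the Estimator builds its own graph when it calls input_fn, so constructing the dataset inside the function, rather than capturing a pre-built inp, is the usual pattern. The sketch below shows only the shape of that contract with NumPy arrays standing in for tensors; make_input_fn, check_input_fn_output and the "tfidf" key are hypothetical, and the checker merely mirrors the error message, not TensorFlow's own validation.

```python
import numpy as np


def make_input_fn(features, labels, feature_key="tfidf"):
    """Return a zero-argument input_fn for Estimator.train(input_fn=...).

    Each call returns a (features, labels) tuple whose first element is a
    dict {feature_key: array}; "tfidf" is a hypothetical key name.
    """
    def input_fn():
        return {feature_key: features}, labels
    return input_fn


def check_input_fn_output(output):
    """Hypothetical check mirroring the ValueError in the question."""
    if not (isinstance(output, tuple) and len(output) == 2):
        raise ValueError("input_fn must return a (features, labels) tuple")
    features, _ = output
    if not isinstance(features, dict):
        raise ValueError(
            "features should be a dictionary. Given type: %s" % type(features))
    return True


input_fn = make_input_fn(np.zeros((2, 3), dtype=np.float32), np.array([0, 1]))
print(check_input_fn_output(input_fn()))  # True
```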