How to transform new data using sklearn.pipeline
I created a pipeline with a TfidfVectorizer transformer and a OneVsRestClassifier estimator, and trained it on the training data as follows:
from sklearn.model_selection import train_test_split

# Split data using train_test_split
print("Split data into train and test sets")
x_train, x_test, y_train, y_test = train_test_split(
    data_x, data_y, test_size=0.33)
# Convert the matrix of plots into lists of strings to pass to a TfidfVectorizer
train_x = [x[0].strip() for x in x_train.tolist()]
test_x = [x[0].strip() for x in x_test.tolist()]
# Fit the pipeline (text_clf is the pipeline described above)
print("Learn the model using train data")
model = text_clf.fit(train_x, y_train)
# Predict the test data
print("Predict the recipients on test data")
predictions = model.predict(test_x)
Now I want to use the trained model to predict the class of new, unlabeled data.
I tried the following, but got an error:
# Read text from input
text = input()
print("Input : ", text)
new_data = text_clf.transform([text])  # this call raises the AttributeError
predict = model.predict(new_data)
This is the error. What am I doing wrong?
AttributeError: 'OneVsRestClassifier' object has no attribute 'transform'
If text_clf and model are pipelines as you describe, there is no need to call transform and then predict. Just call:
predictions = model.predict([text])
The pipeline will automatically convert the data into a usable format internally (by calling transform() on the intermediate transformers).
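A minimal end-to-end sketch of this, since the question does not show how text_clf was built. The step names ("tfidf", "clf"), the LogisticRegression base classifier, and the toy texts and labels are all assumptions made for illustration, not the asker's actual setup:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier
from sklearn.pipeline import Pipeline

# Toy training data (hypothetical, stands in for train_x / y_train).
train_x = [
    "meeting agenda for monday",
    "project status meeting notes",
    "quarterly budget report",
    "budget forecast and expenses",
    "friday party invitation",
    "lunch and party on friday",
]
y_train = ["work", "work", "finance", "finance", "social", "social"]

# Assumed pipeline layout: one vectorizer followed by the classifier.
text_clf = Pipeline([
    ("tfidf", TfidfVectorizer()),
    ("clf", OneVsRestClassifier(LogisticRegression())),
])

# fit() returns the pipeline itself, so model and text_clf are the same object.
model = text_clf.fit(train_x, y_train)

# Pass raw strings straight to predict(); the pipeline vectorizes them itself.
text = "budget meeting on monday"
predictions = model.predict([text])
print("Input : ", text)
print("Predicted class : ", predictions[0])
```

Note that predict() expects an iterable of documents, which is why the single string is wrapped in a list.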
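When you explicitly call model.transform(), the pipeline assumes that every estimator inside it has a transform() method, which is not the case here: the final OneVsRestClassifier step does not have one.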
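If you really do need the transformed features (for example, to inspect the TF-IDF matrix), one option is to call transform() on the vectorizer step alone rather than on the whole pipeline. The sketch below assumes the vectorizer was registered under the hypothetical step name "tfidf" and the classifier under "clf"; the base classifier and the toy data are likewise assumptions:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier
from sklearn.pipeline import Pipeline

# Hypothetical toy data and pipeline layout, for illustration only.
train_x = [
    "meeting agenda for monday",
    "project status meeting notes",
    "quarterly budget report",
    "budget forecast and expenses",
    "friday party invitation",
    "lunch and party on friday",
]
y_train = ["work", "work", "finance", "finance", "social", "social"]

model = Pipeline([
    ("tfidf", TfidfVectorizer()),
    ("clf", OneVsRestClassifier(LogisticRegression())),
]).fit(train_x, y_train)

new_texts = ["budget meeting on monday"]

# Calling transform() on the whole pipeline fails, because the last step
# (the classifier) has no transform() method.
try:
    model.transform(new_texts)
    transform_failed = False
except AttributeError:
    transform_failed = True

# Calling transform() on the fitted vectorizer step works.
features = model.named_steps["tfidf"].transform(new_texts)
print(features.shape)  # (number of documents, vocabulary size)

# Feeding those features to the classifier step by hand gives the same
# result as calling predict() on the pipeline with the raw text.
manual = model.named_steps["clf"].predict(features)
direct = model.predict(new_texts)
```

The exact wording of the AttributeError depends on the scikit-learn version, so the sketch only checks the exception type.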