How do I apply an ML model after it has been trained?
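Apologies for the naive question. I have trained a model (Naive Bayes) in Python and it works well (95% accuracy). It takes an input string (e.g. 'Apple Inc.' or 'John Doe') and determines whether it is a company name or a customer name.
How do I actually apply it to another dataset? If I load another pandas DataFrame, how do I apply what the model learned from the training data to the new DataFrame?
The new DataFrame has a completely new population and set of strings, and the model needs to predict whether each one is a company name or a customer name.
Ideally, I would like to insert a column containing the model's predictions into the new DataFrame.
Any code snippets are appreciated.
Sample code for the current model: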
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import train_test_split
# Hold out 20% of the labelled names for testing
X_train, X_test, y_train, y_test = train_test_split(
    df["CUST_NM_CLEAN"], df["LABEL"], test_size=0.20, random_state=1)
# Instantiate the CountVectorizer
count_vector = CountVectorizer()
# Fit the training data and then return the matrix
training_data = count_vector.fit_transform(X_train)
# Transform testing data and return the matrix.
testing_data = count_vector.transform(X_test)
# In this case we try the multinomial variant; other Naive Bayes variants exist (e.g. BernoulliNB, GaussianNB)
from sklearn.naive_bayes import MultinomialNB
naive_bayes = MultinomialNB()
naive_bayes.fit(training_data,y_train)
#MultinomialNB(alpha=1.0, class_prior=None, fit_prior=True)
predictions = naive_bayes.predict(testing_data)
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score
print('Accuracy score: {}'.format(accuracy_score(y_test, predictions)))
print('Precision score: {}'.format(precision_score(y_test, predictions, pos_label='Org')))
print('Recall score: {}'.format(recall_score(y_test, predictions, pos_label='Org')))
print('F1 score: {}'.format(f1_score(y_test, predictions, pos_label='Org')))
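Figured it out. Reuse the already fitted count_vector (calling transform, not fit_transform) and the trained naive_bayes on the new column: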
# Convert a collection of text documents to a vector of term/token counts.
cnt_vect_for_new_data = count_vector.transform(df['new_data'])
# Run the prediction and store it as a new column
df['NEW_DATA_PREDICTION'] = naive_bayes.predict(cnt_vect_for_new_data)
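For anyone who wants something they can run end to end, here is a minimal sketch of the same idea. It wraps the vectorizer and the classifier in a scikit-learn pipeline so the fitted vocabulary and the model always travel together. The toy training rows, the 'Person' label, and the names train_df, new_df, NAME and PREDICTION are made up for illustration; only CUST_NM_CLEAN, LABEL and 'Org' come from the question.

```python
import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Hypothetical toy data standing in for df["CUST_NM_CLEAN"] / df["LABEL"]
train_df = pd.DataFrame({
    "CUST_NM_CLEAN": ["Apple Inc", "Acme Corp", "Globex LLC", "Initech Inc",
                      "Umbrella Corp", "John Doe", "Jane Smith",
                      "Mary Johnson", "John Smith", "Jane Doe"],
    "LABEL": ["Org"] * 5 + ["Person"] * 5,
})

# The pipeline fits the vocabulary and the classifier in one step
model = make_pipeline(CountVectorizer(), MultinomialNB())
model.fit(train_df["CUST_NM_CLEAN"], train_df["LABEL"])

# A new DataFrame with strings the model has not seen as whole names
new_df = pd.DataFrame({"NAME": ["Stark Inc", "Mary Doe"]})

# predict() vectorizes with the training vocabulary, then classifies
new_df["PREDICTION"] = model.predict(new_df["NAME"])
print(new_df)
```

The important part is that the new data is only ever transformed with the vocabulary learned during training. Fitting a fresh CountVectorizer on the new DataFrame would produce columns that no longer line up with what the model was trained on.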