I get an isnan error when I merge two CountVectorizers
I am working on dialect text classification, and I have this code:
from sklearn.naive_bayes import MultinomialNB
from sklearn.feature_extraction.text import CountVectorizer
vectorizerN = CountVectorizer(analyzer='char',ngram_range=(3,4))
XN = vectorizerN.fit_transform(X_train)
vectorizerMX = CountVectorizer(vocabulary=a['vocabs'])
MX = vectorizerMX.fit_transform(X_train)
from sklearn.pipeline import FeatureUnion
combined_features = FeatureUnion([('CountVectorizer', MX),('CountVect', XN)])
combined_features.transform(test_data)
When I run this code, I get this error:
TypeError: ufunc 'isnan' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe''
I am following the code from this post: Merging CountVectorizer in Scikit-Learn feature extraction
Also, how do I train and predict afterwards?
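You should combine vectorizerN and vectorizerMX, not MX and XN. FeatureUnion expects a list of (name, transformer) pairs, whereas MX and XN are the matrices returned by fit_transform, not transformers.
Change the line to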
combined_features = FeatureUnion([('CountVectorizer', vectorizerMX), ('CountVect', vectorizerN)])
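As for training and predicting afterwards, one possible approach is to put the FeatureUnion and MultinomialNB into a Pipeline, so both vectorizers are fitted on the training text and reused on the test text. This is only a minimal sketch: the toy sentences, the labels in y_train, and the vocabs list are made-up stand-ins for your X_train, your labels, test_data, and a['vocabs'].

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import FeatureUnion, Pipeline

# Hypothetical stand-ins for X_train, the training labels, test_data and a['vocabs'].
X_train = [
    "wah gwaan mi bredda",
    "how are you my friend",
    "mi deh yah a chill",
    "i am here just relaxing",
]
y_train = ["dialect_a", "dialect_b", "dialect_a", "dialect_b"]
test_data = ["mi bredda deh yah", "my friend is here"]
vocabs = ["mi", "bredda", "deh", "friend", "here"]

# Unfitted vectorizers: FeatureUnion takes the transformer objects themselves.
vectorizerN = CountVectorizer(analyzer='char', ngram_range=(3, 4))
vectorizerMX = CountVectorizer(vocabulary=vocabs)

combined_features = FeatureUnion([
    ('CountVectorizer', vectorizerMX),
    ('CountVect', vectorizerN),
])

# The pipeline fits both vectorizers, stacks their outputs side by side,
# and trains the classifier on the combined count matrix.
model = Pipeline([
    ('features', combined_features),
    ('clf', MultinomialNB()),
])
model.fit(X_train, y_train)

# predict() applies the already-fitted vectorizers to the raw test strings.
predictions = model.predict(test_data)
print(predictions)
```

If you prefer to keep the steps separate, calling combined_features.fit_transform(X_train) and then combined_features.transform(test_data) should give matrices that can be passed to MultinomialNB directly.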