How to build a parameter grid with FeatureUnion?
I'm trying to run this combined model with text and numeric features, but I get the error `ValueError: Invalid parameter tfidf for estimator`. Is something wrong with the syntax of `parameters`?
Links that might be useful:
FeatureUnion usage
FeatureUnion documentation
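```python
import numpy as np
from nltk import tokenize
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.linear_model import SGDClassifier
from sklearn.pipeline import FeatureUnion, Pipeline
from sklearn.preprocessing import StandardScaler

tknzr = tokenize.word_tokenize
# The built-in English stop list is selected with the string 'english', not the set {'english'}
vect = CountVectorizer(tokenizer=tknzr, stop_words='english', max_df=0.9, min_df=2)
scl = StandardScaler(with_mean=False)  # with_mean=False keeps sparse input sparse
tfidf = TfidfTransformer(norm=None)

parameters = {
    'vect__ngram_range': [(1, 1), (1, 2), (1, 3), (2, 2), (2, 3), (3, 3)],
    'tfidf__use_idf': (True, False),
    'clf__alpha': tuple(10 ** (np.arange(-4, 4, dtype='float'))),
    # 'log' was renamed to 'log_loss' in scikit-learn 1.1
    'clf__loss': ('hinge', 'squared_hinge', 'log_loss', 'modified_huber', 'perceptron'),
    'clf__penalty': ('l1', 'l2'),
    'clf__tol': (1e-7, 1e-6, 1e-5, 1e-4, 1e-3),
}

# transformer_numeric, transformer_text and X_train are defined elsewhere (not shown)
combined_clf = Pipeline([
    ('features', FeatureUnion([
        ('numeric_features', Pipeline([
            ('selector', transformer_numeric)
        ])),
        ('text_features', Pipeline([
            ('selector', transformer_text),
            ('vect', vect),
            ('tfidf', tfidf),
            ('scaler', scl),
        ]))
    ])),
    ('clf', SGDClassifier(random_state=42,
                          max_iter=int(10 ** 6 / len(X_train)), shuffle=True))
])
```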
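As described here, nested parameters must be accessed with the `__` (double underscore) syntax, applied recursively once for every level of nesting between the outer estimator and the parameter you want to reach. The parameter `use_idf` sits under:

features > text_features > tfidf > use_idf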
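One way to confirm this nesting is to ask the pipeline for its own parameter names: `get_params()` returns every key that `set_params` (and therefore a `GridSearchCV` grid) will accept. The sketch below uses a cut-down version of the question's pipeline (numeric branch and NLTK tokenizer dropped, default settings) so that it runs on its own. It lists the full `use_idf` key and shows that the short key `tfidf__use_idf` is rejected, because the outer `Pipeline` has no step named `tfidf`.

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.linear_model import SGDClassifier
from sklearn.pipeline import FeatureUnion, Pipeline

# Simplified copy of the question's layout (numeric branch omitted for brevity).
pipe = Pipeline([
    ("features", FeatureUnion([
        ("text_features", Pipeline([
            ("vect", CountVectorizer()),
            ("tfidf", TfidfTransformer()),
        ])),
    ])),
    ("clf", SGDClassifier()),
])

# get_params() (deep=True by default) lists every valid grid key.
keys = [k for k in pipe.get_params() if k.endswith("use_idf")]
print(keys)  # ['features__text_features__tfidf__use_idf']

# The outer Pipeline only has the steps "features" and "clf",
# so the short key from the question raises ValueError.
try:
    pipe.set_params(tfidf__use_idf=False)
    error_message = None
except ValueError as err:
    error_message = str(err)
print(error_message)
```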
So the corresponding key in your grid needs to be:

```python
'features__text_features__tfidf__use_idf': [True, False]
```

Likewise, the key for `ngram_range` should be:

```python
'features__text_features__vect__ngram_range': [(1, 1), (1, 2), (1, 3), (2, 2), (2, 3), (3, 3)]
```
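With the prefixed keys in place, the grid can be passed straight to `GridSearchCV`. This is a minimal sketch, not the asker's exact setup: `select_text` and `select_numeric` are hypothetical stand-ins for the asker's text and numeric selectors (wrapped in `FunctionTransformer`), the eight-row dataset is made up purely so the example runs, and the grid is trimmed to keep it fast.

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.linear_model import SGDClassifier
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import FeatureUnion, Pipeline
from sklearn.preprocessing import FunctionTransformer, StandardScaler

# Hypothetical selectors: column 0 holds text, column 1 holds a number.
def select_text(X):
    return X[:, 0]

def select_numeric(X):
    return X[:, 1:].astype(float)

combined_clf = Pipeline([
    ("features", FeatureUnion([
        ("numeric_features", Pipeline([
            ("selector", FunctionTransformer(select_numeric)),
        ])),
        ("text_features", Pipeline([
            ("selector", FunctionTransformer(select_text)),
            ("vect", CountVectorizer()),
            ("tfidf", TfidfTransformer(norm=None)),
            ("scaler", StandardScaler(with_mean=False)),
        ])),
    ])),
    ("clf", SGDClassifier(random_state=42)),
])

# Steps inside the FeatureUnion need the full path; "clf" does not.
param_grid = {
    "features__text_features__vect__ngram_range": [(1, 1), (1, 2)],
    "features__text_features__tfidf__use_idf": [True, False],
    "clf__alpha": [1e-4, 1e-2],
}

# Made-up toy data, only so the search runs end to end.
X = np.array([
    ["good movie", 10], ["great film", 10], ["loved the plot", 14], ["nice acting", 11],
    ["bad movie", 9], ["awful film", 10], ["hated the plot", 14], ["poor acting", 11],
], dtype=object)
y = np.array([1, 1, 1, 1, 0, 0, 0, 0])

search = GridSearchCV(combined_clf, param_grid, cv=2)
search.fit(X, y)
print(search.best_params_)
```

The `clf__*` keys keep their short form because `clf` is a direct step of the outer `Pipeline`; only the steps nested inside the `FeatureUnion` need the `features__text_features__` prefix.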