Multioutput target data is not supported with label binarization

I am fitting my MultinomialNB model on K-fold splits.

I have already tried balancing the data with SMOTE (from the imblearn.over_sampling library).

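from sklearn.pipeline import Pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.multiclass import OneVsRestClassifier
from sklearn.naive_bayes import MultinomialNB

NB_pipeline = Pipeline([
                ('tfidf', TfidfVectorizer(stop_words=stop_words)),
                ('clf', OneVsRestClassifier(MultinomialNB(
                    fit_prior=True, class_prior=None))),
            ])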

for train_indices, test_indices in k_fold.split(train_data):

    train_sequencies = train_data.iloc[train_indices]['NAME']
    label_train = train_data.iloc[train_indices][['SEARCH','OPTIONS_VOLUME', 'OPTIONS_QUANTITY', 'OPTIONS_PORTION', 
                                                      'OPTIONS_WEIGHT', 'OPTIONS_SIZE', 'OPTIONS_CONCENTRATION', 
                                                      'OPTIONS_CONTENT', 'OPTIONS_MANUFACTURER']]

    test_sequencies = train_data.iloc[test_indices]['NAME']
    label_test = train_data.iloc[test_indices][['SEARCH','OPTIONS_VOLUME', 'OPTIONS_QUANTITY', 'OPTIONS_PORTION',
                                                'OPTIONS_WEIGHT', 'OPTIONS_SIZE', 'OPTIONS_CONCENTRATION', 
                                                'OPTIONS_CONTENT', 'OPTIONS_MANUFACTURER']]


    NB_pipeline.fit(train_sequencies, label_train)
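    predictions = NB_pipeline.predict(test_sequencies)

    # Metrics compare the true labels with the predictions, not with the input text.
    # Note: confusion_matrix accepts only a single target column.
    confusion += confusion_matrix(label_test, predictions)
    score = f1_score(label_test, predictions)

    scores.append(score)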

I expected cross-validation of the multilabel classification to work.

OneVsRestClassifier fits one classifier per class, not one per target.
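To make that distinction concrete, here is a minimal sketch that should reproduce the error from the title. The feature matrix and the two target columns are made-up toy values, not the asker's data: with two target columns of three classes each, the label binarizer inside OneVsRestClassifier rejects the targets, while a single column fits and produces one binary classifier per class.

```python
import numpy as np
from sklearn.multiclass import OneVsRestClassifier
from sklearn.naive_bayes import MultinomialNB

# Hypothetical count features: 6 samples, 3 features.
X = np.array([
    [2, 0, 1],
    [1, 0, 0],
    [0, 3, 1],
    [0, 1, 2],
    [1, 1, 0],
    [0, 2, 2],
])
# Two target columns, each with three classes ("multiclass-multioutput").
y = np.array([
    [0, 1],
    [0, 2],
    [1, 0],
    [1, 1],
    [2, 2],
    [2, 0],
])

clf = OneVsRestClassifier(MultinomialNB())

# Fitting on both columns at once is rejected by the internal label binarizer.
try:
    clf.fit(X, y)
    error_message = None
except ValueError as exc:
    error_message = str(exc)
print(error_message)

# A single target column is an ordinary multiclass problem and fits fine.
clf.fit(X, y[:, 0])
n_classifiers = len(clf.estimators_)  # one binary classifier per class
print(n_classifiers)
```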

MultinomialNB doesn't support multioutput target data. You can fit one MultinomialNB per target by using MultiOutputClassifier, a simple strategy for extending classifiers that do not natively support multi-target classification.

from sklearn.multioutput import MultiOutputClassifier

NB_pipeline = Pipeline([
    ('tfidf', TfidfVectorizer(stop_words=stop_words)),
    ('clf', MultiOutputClassifier(MultinomialNB(fit_prior=True, class_prior=None))),
])
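For completeness, here is a rough, self-contained sketch of how that pipeline could sit inside the K-fold loop from the question. The toy `train_data` frame, the reduced `target_columns` list, and the choice of per-column macro F1 are all assumptions made for illustration; `stop_words` is left out because its definition is not shown in the question. Since f1_score does not accept several multiclass columns at once, each target column is scored separately.

```python
import numpy as np
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import f1_score
from sklearn.model_selection import KFold
from sklearn.multioutput import MultiOutputClassifier
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline

# Hypothetical toy data standing in for train_data; only two of the
# target columns from the question are reproduced here.
train_data = pd.DataFrame({
    "NAME": [
        "milk 1 l", "milk 2 l", "juice 1 l", "juice 2 l",
        "rice 1 kg", "rice 2 kg", "flour 1 kg", "flour 2 kg",
    ],
    "SEARCH": ["milk", "milk", "juice", "juice",
               "rice", "rice", "flour", "flour"],
    "OPTIONS_VOLUME": ["1", "2", "1", "2",
                       "none", "none", "none", "none"],
})
target_columns = ["SEARCH", "OPTIONS_VOLUME"]

NB_pipeline = Pipeline([
    ("tfidf", TfidfVectorizer()),
    ("clf", MultiOutputClassifier(MultinomialNB(fit_prior=True, class_prior=None))),
])

k_fold = KFold(n_splits=4, shuffle=True, random_state=0)
scores = []
for train_indices, test_indices in k_fold.split(train_data):
    train_text = train_data.iloc[train_indices]["NAME"]
    label_train = train_data.iloc[train_indices][target_columns]
    test_text = train_data.iloc[test_indices]["NAME"]
    label_test = train_data.iloc[test_indices][target_columns]

    NB_pipeline.fit(train_text, label_train)
    # predictions has shape (n_test_samples, n_targets)
    predictions = NB_pipeline.predict(test_text)

    # Score each target column on its own.
    fold_scores = [
        f1_score(label_test[col], predictions[:, i],
                 average="macro", zero_division=0)
        for i, col in enumerate(target_columns)
    ]
    scores.append(fold_scores)

scores = np.array(scores)  # shape (n_folds, n_targets)
print(scores.mean(axis=0))  # mean macro F1 per target column
```

A confusion matrix could be accumulated the same way, one matrix per target column, as long as a fixed `labels` list is passed so the shapes match across folds.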

If you want to fit one classifier per class (each class fitted against all the other classes) and per target:

NB_pipeline = Pipeline([
    ('tfidf', TfidfVectorizer(stop_words=stop_words)),
    ('clf', MultiOutputClassifier(OneVsRestClassifier(
        MultinomialNB(fit_prior=True, class_prior=None)))),
])
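As a quick sanity check of what the nested version builds, the sketch below fits it on made-up data (the texts and the two three-class targets are hypothetical) and counts the fitted estimators: one OneVsRestClassifier per target, each holding one MultinomialNB per class.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.multiclass import OneVsRestClassifier
from sklearn.multioutput import MultiOutputClassifier
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline

# Hypothetical product names and two target columns with three classes each.
texts = ["milk 1 l", "milk 2 l", "juice 1 l", "juice 2 l", "rice 1 kg", "rice 2 kg"]
y = np.array([
    ["milk", "1"],
    ["milk", "2"],
    ["juice", "1"],
    ["juice", "2"],
    ["rice", "none"],
    ["rice", "none"],
])

NB_pipeline = Pipeline([
    ("tfidf", TfidfVectorizer()),
    ("clf", MultiOutputClassifier(OneVsRestClassifier(
        MultinomialNB(fit_prior=True, class_prior=None)))),
])
NB_pipeline.fit(texts, y)

outer = NB_pipeline.named_steps["clf"]
n_targets = len(outer.estimators_)                        # one OvR model per target
inner_counts = [len(e.estimators_) for e in outer.estimators_]  # one NB per class
predictions = NB_pipeline.predict(texts)                  # shape (n_samples, n_targets)
print(n_targets, inner_counts, predictions.shape)
```

Note that for a target with only two classes, OneVsRestClassifier fits a single binary classifier rather than two.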