How to handle repeated text data with different labels or classes?

nlp
machine-learning
text-classification
data-science

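I am working on multi-class text classification. However, some texts are repeated in my dataset. These are not true duplicates, because each copy belongs to a different class. The data is valid: the two classes are close to each other, and the repeated training texts share the same URL but carry different classes. What can I do so that my text classifier assigns a higher probability to future inputs, without splitting that probability with the competing class? Are there any other tips? Note: only 10% of the training data is repeated with different classes.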

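The problem you are trying to solve is not multi-class classification but multi-label classification: the same text can legitimately belong to more than one class. There are different methods to solve multi-label classification. A starting point can be here: https://scikit-learn.org/stable/modules/multiclass.html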
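As a minimal sketch of that idea (not the only way to do it), the repeated texts can be merged into one row each with a set of labels, and a one-vs-rest model can then score each label independently, so two close classes no longer compete for a shared probability mass. The toy rows, the label names, and the choice of TF-IDF with logistic regression are assumptions made for illustration only.

```python
from collections import defaultdict

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MultiLabelBinarizer

# Hypothetical (text, label) rows; the first text appears twice with different labels.
rows = [
    ("cheap flights to paris", "travel"),
    ("cheap flights to paris", "deals"),
    ("hotel booking in rome", "travel"),
    ("discount codes for shoes", "deals"),
    ("train tickets to berlin", "travel"),
    ("half price laptop sale", "deals"),
]

# Merge rows that share the same text into a single example with a set of labels.
labels_by_text = defaultdict(set)
for text, label in rows:
    labels_by_text[text].add(label)

texts = list(labels_by_text)
label_sets = [sorted(labels_by_text[t]) for t in texts]

# Turn the label sets into a binary indicator matrix (one column per label).
mlb = MultiLabelBinarizer()
Y = mlb.fit_transform(label_sets)

# One binary classifier per label; the per-label probabilities are independent
# and are not forced to sum to 1 across labels.
model = make_pipeline(TfidfVectorizer(), OneVsRestClassifier(LogisticRegression()))
model.fit(texts, Y)

proba = model.predict_proba(["cheap flights to paris"])
for label, p in zip(mlb.classes_, proba[0]):
    print(f"{label}: {p:.2f}")
```

With real data, a per-label threshold on these probabilities (chosen on a validation set) decides which labels are returned for a new input.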
