Best Loss Function for Multi-Class Multi-Target Classification Problem
I have a classification problem and I'm not sure how to categorize it. As far as I understand,
A multiclass classification problem is one where you have multiple mutually exclusive classes and each data point in the dataset can be labelled with only one class. For example, in an image classification task for fruits, a fruit data point labelled as an apple cannot be an orange, an orange cannot be a banana, and so on. In this case, each data point can be only one of the fruit classes and is labelled accordingly.
Whereas...
A multilabel classification is a problem where you have multiple sets of mutually exclusive classes, and a data point can be labelled from several of those sets simultaneously. For example, in an image classification task for cars, a car data point labelled as a sedan cannot be a hatchback, a hatchback cannot be an SUV, and so on for the type of car. At the same time, the same car data point can be labelled as one of VW, Ford, Mercedes, etc. for the car manufacturer. So in this case, the car data point is labelled from two different sets of mutually exclusive classes.
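Please correct me if my understanding is wrong.
Now to my problem: I have multiple classes, say A, B, C, D, and E. Each data point can have one or more classes from this set, as shown in the y column below: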
| X   | y    | One-Hot-Y       |
|-----|------|-----------------|
| DP1 | A, B | [1, 1, 0, 0, 0] |
| DP2 | C    | [0, 0, 1, 0, 0] |
| DP3 | B, E | [0, 1, 0, 0, 1] |
| DP4 | A, C | [1, 0, 1, 0, 0] |
| DP5 | D    | [0, 0, 0, 1, 0] |
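The encoding in the table (often called multi-hot rather than one-hot, since a row may contain several 1s) can be reproduced with a few lines of plain Python. This is only a sketch; `CLASSES` and `to_multi_hot` are hypothetical names chosen for illustration, and the fixed class order A to E is assumed to match the table.

```python
# Hypothetical helper: build multi-hot target vectors from lists of labels.
CLASSES = ["A", "B", "C", "D", "E"]  # assumed fixed class order

def to_multi_hot(labels, classes=CLASSES):
    """Return a list with 1 at the position of every label present, 0 elsewhere."""
    index = {c: i for i, c in enumerate(classes)}
    vec = [0] * len(classes)
    for label in labels:
        vec[index[label]] = 1
    return vec

# Label lists taken from the table above.
data = {
    "DP1": ["A", "B"],
    "DP2": ["C"],
    "DP3": ["B", "E"],
    "DP4": ["A", "C"],
    "DP5": ["D"],
}

encoded = {dp: to_multi_hot(labels) for dp, labels in data.items()}
for dp, vec in encoded.items():
    print(dp, vec)
```

Each printed row should match the One-Hot-Y column; rows with two labels simply get two 1s.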
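I one-hot encoded the training labels as shown in the One-Hot-Y column above. My questions are:
- What loss function (preferably in PyTorch) can I use to train the model to optimize for the one-hot encoded output?
- What do we call such a classification problem? Multi-label or multi-class?
Thanks for your answers!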
What Loss function (preferably in PyTorch) can I use for training the model to optimize for the One-Hot encoded output
You can use torch.nn.BCEWithLogitsLoss (or MultiLabelSoftMarginLoss, as they are equivalent) and see how it works out. This is the standard approach; another possibility is MultiLabelMarginLoss.
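A minimal sketch of the three losses in PyTorch, assuming PyTorch is installed and that the model outputs one raw score (logit) per class with no final sigmoid. Random numbers stand in for real model output. It checks that BCEWithLogitsLoss and MultiLabelSoftMarginLoss give the same value with their default mean reduction, and shows that MultiLabelMarginLoss takes a different target format: class indices padded with -1 instead of a 0/1 vector. `multi_hot_to_indices` is a hypothetical helper written for this example.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Multi-hot targets from the question's table (DP1..DP5); BCE-style losses need floats.
targets = torch.tensor([
    [1, 1, 0, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 1, 0, 0, 1],
    [1, 0, 1, 0, 0],
    [0, 0, 0, 1, 0],
], dtype=torch.float32)

# Stand-in for model output: raw logits, one per class (do NOT apply sigmoid first).
logits = torch.randn(targets.shape[0], targets.shape[1])

bce = nn.BCEWithLogitsLoss()(logits, targets)
soft_margin = nn.MultiLabelSoftMarginLoss()(logits, targets)
print(bce.item(), soft_margin.item())  # equal up to floating-point rounding

# MultiLabelMarginLoss expects class *indices* per row, padded with -1
# after the last valid index, instead of a multi-hot vector.
def multi_hot_to_indices(multi_hot):
    out = torch.full(multi_hot.shape, -1, dtype=torch.long)
    for row, vec in enumerate(multi_hot):
        idx = torch.nonzero(vec, as_tuple=True)[0]  # positions of the 1s
        out[row, : len(idx)] = idx
    return out

index_targets = multi_hot_to_indices(targets)
margin = nn.MultiLabelMarginLoss()(logits, index_targets)
print(index_targets[0].tolist(), margin.item())
```

BCEWithLogitsLoss treats each class as an independent binary decision and applies the sigmoid internally, which is why the model should emit raw logits.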
What do we call such a classification problem? Multi-label or Multi-class?
It is multilabel (since multiple labels can be present at the same time). In one-hot encoding:
[1, 1, 0, 0, 0], [0, 1, 0, 0, 1] - multilabel
[0, 0, 1, 0, 0] - multiclass
[1], [0] - binary (special case of multiclass)
Multiclass cannot have more than one 1, because all the labels are mutually exclusive.
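The distinction also shows up when turning model output into predictions. A short sketch, assuming PyTorch and made-up logits for a single data point: multilabel treats every class independently (sigmoid plus a threshold; 0.5 is an assumed default that is often tuned on validation data), while multiclass picks exactly one class (softmax plus argmax) and is usually trained with CrossEntropyLoss on class indices.

```python
import torch
import torch.nn as nn

# Made-up logits for one data point over classes A..E.
logits = torch.tensor([[2.0, 1.5, -1.0, -3.0, 0.2]])

# Multilabel: independent yes/no per class, so several 1s are possible.
multilabel_pred = (torch.sigmoid(logits) > 0.5).int()
print(multilabel_pred.tolist())  # [[1, 1, 0, 0, 1]]

# Multiclass: classes are mutually exclusive, so exactly one index is chosen.
multiclass_pred = torch.softmax(logits, dim=1).argmax(dim=1)
print(multiclass_pred.tolist())  # [0]

# Multiclass training target is a class index, not a multi-hot vector.
ce = nn.CrossEntropyLoss()(logits, torch.tensor([0]))
print(ce.item())
```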