Trouble implementing "concurrent" softmax function from paper (PyTorch)

I am trying to implement the so-called 'concurrent' softmax function given in the paper "Large-Scale Object Detection in the Wild from Imbalanced Multi-Labels". Here is the definition of the concurrent softmax:
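Written out in the notation of the code further down (z_i for the logits, y_j for the labels), the definition as I understand it is sketched below. The symbol C for the number of classes and the star on sigma are my own notation, and r_ij is the extra term from the paper mentioned in the note that follows.

```latex
\sigma_i^{*} = \frac{e^{z_i}}{\sum_{j=1}^{C} (1 - y_j)(1 - r_{ij})\, e^{z_j} + e^{z_i}}
```

Dropping the (1 - r_ij) factor gives the simplified form that the code in this question computes.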

NOTE: I have left the (1-rij) term out for the time being because I don't think it applies to my problem given that my training dataset has a different type of labeling compared to the paper.

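To keep things simple for myself, I first implemented it in a very inefficient but easy-to-understand way using for loops. However, the output I get seems wrong to me. Here is the code I used: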

import torch

# here is a multi-hot encoded vector for the multi-label classification
# the image thus has 2 correct labels out of a possible 3 classes
y = [0, 1, 1]

# these are some made up logits that might come from the network.
vec = torch.tensor([0.2, 0.9, 0.7])

def concurrent_softmax(vec, y):
    for i in range(len(vec)):
        zi = torch.exp(vec[i])
        sum_over_j = 0
        for j in range(len(y)):
            sum_over_j += (1-y[j])*torch.exp(vec[j])

        out = zi / (sum_over_j + zi)
        yield out

for result in concurrent_softmax(vec, y):
    print(result)

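From this implementation I realized that no matter what value I give the first logit in 'vec', I always get an output of 0.5 (because it essentially always computes zi / (zi + zi)). This seems like a major problem, since I would expect the values of the logits to have some influence on the resulting concurrent softmax values. So is there a problem with my implementation, or is this behavior of the function correct and there is something in the theory that I am not understanding?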

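This is the expected behavior, given that y[i] = 1 for every other i. With y = [0, 1, 1], only class 0 has a nonzero (1 - y[j]) factor, so the sum over j reduces to exp(vec[0]) and the first output is exp(vec[0]) / (exp(vec[0]) + exp(vec[0])) = 0.5 whatever the logit is. The outputs for the other two classes still depend on the logits.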
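To make that concrete, here is a minimal sketch in plain Python (no PyTorch, and still without the (1 - r_ij) term) that varies the first logit. The function name and the second label vector are illustrative choices, not something from the paper.

```python
import math

def concurrent_softmax(logits, labels):
    """Concurrent softmax without the (1 - r_ij) term, in plain Python."""
    # Only classes labelled 0 contribute to the shared sum.
    negatives = sum((1 - y) * math.exp(z) for z, y in zip(logits, labels))
    return [math.exp(z) / (negatives + math.exp(z)) for z in logits]

# Class 0 is the only negative: its output stays at 0.5 for any logit,
# while the outputs for classes 1 and 2 move with it.
for first_logit in (-3.0, 0.2, 5.0):
    out = concurrent_softmax([first_logit, 0.9, 0.7], [0, 1, 1])
    print(first_logit, [round(p, 4) for p in out])

# With two negatives the first output is no longer pinned at 0.5.
print(concurrent_softmax([0.2, 0.9, 0.7], [0, 0, 1]))
```

So the constant 0.5 comes from this particular label vector having a single negative class. It is not a bug in the loop.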

Note that you can simplify the summation using a dot product:

y = torch.tensor(y)

def concurrent_softmax(z, y):
    sum_over_j = torch.dot((torch.ones(len(y)) - y), torch.exp(z))

    for zi in z:
        numerator = torch.exp(zi)
        denominator = sum_over_j + numerator
        yield numerator / denominator
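If you want to drop the remaining loop as well, a fully vectorized variant could look like the sketch below. It assumes a 1-D logit tensor and a label tensor of the same length, and it still omits the (1 - r_ij) term. The name concurrent_softmax_vectorized is only illustrative.

```python
import torch

def concurrent_softmax_vectorized(z, y):
    """Concurrent softmax without the (1 - r_ij) term, for a 1-D logit tensor."""
    exp_z = torch.exp(z)
    # Sum of exp(z_j) over the negative classes only (y_j == 0).
    negatives = torch.sum((1 - y.to(z.dtype)) * exp_z)
    # Broadcasting adds the shared sum to every class's own exp(z_i).
    return exp_z / (negatives + exp_z)

z = torch.tensor([0.2, 0.9, 0.7])
y = torch.tensor([0, 1, 1])
print(concurrent_softmax_vectorized(z, y))
```

This returns a tensor instead of a generator, so it can be used directly in a loss. For very large logits torch.exp may overflow. Subtracting the maximum logit from z first leaves the ratio unchanged and should avoid that.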