Tensorflow implementation of NT_Xent contrastive loss function?

As the title suggests, I'm trying to train a model based on the SimCLR framework (see this paper: https://arxiv.org/pdf/2002.05709.pdf - the NT_Xent loss is stated in equation (1) and Algorithm 1).
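For reference, equation (1) of the paper defines the loss for a positive pair (i, j) as

\ell_{i,j} = -\log \frac{\exp(\mathrm{sim}(z_i, z_j)/\tau)}{\sum_{k=1}^{2N} \mathbb{1}_{[k \neq i]} \exp(\mathrm{sim}(z_i, z_k)/\tau)}

where sim(u, v) is cosine similarity, tau is the temperature, and the total loss averages \ell_{i,j} and \ell_{j,i} over all N positive pairs in the batch.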

I've managed to create a numpy version of the loss function, but this isn't suitable for training the model, as numpy arrays cannot store the information required for backpropagation. I'm having difficulty converting my numpy code to Tensorflow. Here is my numpy version:

import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

# Define the contrastive loss function, NT_Xent
def NT_Xent(zi, zj, tau=1):
    """ Calculates the contrastive loss of the input data using NT_Xent. The
    equation can be found in the paper: https://arxiv.org/pdf/2002.05709.pdf
    
    Args:
        zi: One half of the input data, shape = (batch_size, feature_1, feature_2, ..., feature_N)
        zj: Other half of the input data, must have the same shape as zi
        tau: Temperature parameter (a constant), default = 1.

    Returns:
        loss: The complete NT_Xent contrastive loss
    """
    z = np.concatenate((zi, zj), 0)

    loss = 0
    for k in range(zi.shape[0]):
        # Numerator (compare i,j & j,i)
        i = k
        j = k + zi.shape[0]
        sim_ij = np.squeeze(cosine_similarity(z[i].reshape(1, -1), z[j].reshape(1, -1)))
        sim_ji = np.squeeze(cosine_similarity(z[j].reshape(1, -1), z[i].reshape(1, -1)))
        numerator_ij = np.exp(sim_ij / tau)
        numerator_ji = np.exp(sim_ji / tau)

        # Denominator (compare i & j to all samples apart from themselves)
        sim_ik = np.squeeze(cosine_similarity(z[i].reshape(1, -1), z[np.arange(z.shape[0]) != i]))
        sim_jk = np.squeeze(cosine_similarity(z[j].reshape(1, -1), z[np.arange(z.shape[0]) != j]))
        denominator_ik = np.sum(np.exp(sim_ik / tau))
        denominator_jk = np.sum(np.exp(sim_jk / tau))

        # Calculate individual and combined losses
        loss_ij = - np.log(numerator_ij / denominator_ik)
        loss_ji = - np.log(numerator_ji / denominator_jk)
        loss += loss_ij + loss_ji
    
    # Divide by the total number of samples
    loss /= z.shape[0]

    return loss
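For anyone wanting to sanity-check it: with high-dimensional random inputs all pairwise similarities are close to zero, so (with tau=1) each term should reduce to roughly log(2N - 1). A quick hypothetical test:

import numpy as np

batch_size, dim = 8, 128
zi = np.random.randn(batch_size, dim)
zj = np.random.randn(batch_size, dim)

print(NT_Xent(zi, zj))             # should be close to...
print(np.log(2 * batch_size - 1))  # ...log(15) ≈ 2.708 for this batch size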

I'm fairly confident that this function produces the correct results (albeit slowly, as the other implementations I've seen online are vectorised versions - e.g. this Pytorch one: https://github.com/Spijkervet/SimCLR/blob/master/modules/nt_xent.py, which produces the same results as my code for the same inputs). However, I can't see how their version is mathematically equivalent to the formula in the paper, hence why I tried to build my own.

As a first attempt I converted the numpy functions into their TF equivalents (tf.concat, tf.reshape, tf.math.exp, tf.range, etc.), but I believe my only/main problem is that sklearn's cosine_similarity function returns a numpy array, and I don't know how to build that function myself in Tensorflow. Any ideas?

I figured it out myself! I didn't realise that there is a Tensorflow implementation of the cosine similarity function, tf.keras.losses.CosineSimilarity.
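One thing worth noting up front: tf.keras.losses.CosineSimilarity returns the negative of the cosine similarity (so that minimising the loss maximises similarity), which is why the minus signs appear in my code below. A quick illustration:

import tensorflow as tf

cosine_sim = tf.keras.losses.CosineSimilarity(axis=-1, reduction=tf.keras.losses.Reduction.NONE)
a = tf.constant([[1.0, 0.0]])
print(cosine_sim(a, a).numpy())  # [-1.] for identical vectors, hence the sign flip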

Here is my code:

import tensorflow as tf

# Define the contrastive loss function, NT_Xent (Tensorflow version)
def NT_Xent_tf(zi, zj, tau=1):
    """ Calculates the contrastive loss of the input data using NT_Xent. The
    equation can be found in the paper: https://arxiv.org/pdf/2002.05709.pdf
    (This is the Tensorflow implementation of the standard numpy version found
    in the NT_Xent function).
    
    Args:
        zi: One half of the input data, shape = (batch_size, feature_1, feature_2, ..., feature_N)
        zj: Other half of the input data, must have the same shape as zi
        tau: Temperature parameter (a constant), default = 1.

    Returns:
        loss: The complete NT_Xent contrastive loss
    """
    z = tf.cast(tf.concat((zi, zj), 0), dtype=tf.float32)
    # Instantiate the cosine similarity loss function once, outside the loop
    cosine_sim = tf.keras.losses.CosineSimilarity(axis=-1, reduction=tf.keras.losses.Reduction.NONE)
    loss = 0
    for k in range(zi.shape[0]):
        # Numerator (compare i,j & j,i)
        i = k
        j = k + zi.shape[0]
        # The Keras loss returns the negative cosine similarity, hence the sign flip
        sim = tf.squeeze(- cosine_sim(tf.reshape(z[i], (1, -1)), tf.reshape(z[j], (1, -1))))
        numerator = tf.math.exp(sim / tau)

        # Denominator (compare i & j to all samples apart from themselves)
        sim_ik = - cosine_sim(tf.reshape(z[i], (1, -1)), z[tf.range(z.shape[0]) != i])
        sim_jk = - cosine_sim(tf.reshape(z[j], (1, -1)), z[tf.range(z.shape[0]) != j])
        denominator_ik = tf.reduce_sum(tf.math.exp(sim_ik / tau))
        denominator_jk = tf.reduce_sum(tf.math.exp(sim_jk / tau))

        # Calculate individual and combined losses (the numerator is the same
        # for i,j and j,i, since cosine similarity is symmetric)
        loss_ij = - tf.math.log(numerator / denominator_ik)
        loss_ji = - tf.math.log(numerator / denominator_jk)
        loss += loss_ij + loss_ji
    
    # Divide by the total number of samples
    loss /= z.shape[0]

    return loss
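As a quick hypothetical cross-check, running both versions on the same random inputs should give closely matching results:

import numpy as np

zi = np.random.randn(8, 128).astype(np.float32)
zj = np.random.randn(8, 128).astype(np.float32)

print(NT_Xent(zi, zj))            # numpy version
print(float(NT_Xent_tf(zi, zj)))  # Tensorflow version - the two should agree closely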

As you can see, I essentially just swapped the numpy functions for their TF equivalents. One main point to note is that I had to use reduction=tf.keras.losses.Reduction.NONE within the cosine_sim function; this was needed to keep the individual similarities in sim_ik and sim_jk, because otherwise the resulting loss did not match my original numpy implementation.
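To illustrate that point (a hypothetical demonstration, not part of the loss itself): with the default reduction, the Keras loss averages the per-row similarities down to a single scalar, which would break the sum over k in the denominator.

import tensorflow as tf

a = tf.random.normal((1, 4))
b = tf.random.normal((3, 4))

sim_none = tf.keras.losses.CosineSimilarity(axis=-1, reduction=tf.keras.losses.Reduction.NONE)
sim_avg = tf.keras.losses.CosineSimilarity(axis=-1)  # default reduction averages over the batch

print(sim_none(a, b).shape)  # (3,) - one (negative) similarity per row of b
print(sim_avg(a, b).shape)   # ()  - a single averaged scalar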

I also noticed that calculating the numerator separately for i,j and j,i is redundant since the answers are the same, so I removed one instance of that calculation.

Of course, if anyone has a faster implementation, I would be more than happy to hear about it!

Here is a more efficient and numerically more stable implementation. Note that it assumes zi and zj are interleaved!

class NT_Xent(tf.keras.layers.Layer):
    """ Normalized temperature-scaled CrossEntropy loss [1]
        [1] T. Chen, S. Kornblith, M. Norouzi, and G. Hinton, “A simple framework for contrastive learning of visual representations,” arXiv. 2020, Accessed: Jan. 15, 2021. [Online]. Available: https://github.com/google-research/simclr.
    """
    def __init__(self, tau=1, **kwargs):
        super().__init__(**kwargs)
        self.tau = tau
        self.similarity = tf.keras.losses.CosineSimilarity(axis=-1, reduction=tf.keras.losses.Reduction.NONE)
        self.criterion = tf.keras.losses.CategoricalCrossentropy(from_logits=True)
    def get_config(self):
        return {"tau": self.tau}
    def call(self, zizj):
        """ zizj is [B,N] tensor with order z_i1 z_j1 z_i2 z_j2 z_i3 z_j3 ... 
            batch_size is twice the original batch_size
        """
        batch_size = tf.shape(zizj)[0]
        mask = tf.repeat(tf.repeat(~tf.eye(batch_size // 2, dtype=tf.bool), 2, axis=0), 2, axis=1)

        sim = -1*self.similarity(tf.expand_dims(zizj, 1), tf.expand_dims(zizj, 0))/self.tau
        sim_i_j = -1*self.similarity(zizj[0::2], zizj[1::2])/self.tau

        pos = tf.reshape(tf.repeat(sim_i_j, repeats=2), (batch_size, -1))
        neg = tf.reshape(sim[mask], (batch_size, -1))

        logits = tf.concat((pos, neg), axis=-1)
        labels = tf.one_hot(tf.zeros((batch_size,), dtype=tf.int32), depth=batch_size-1)

        return self.criterion(labels, logits)

Source: https://github.com/gabriel-vanzandycke/tf_layers
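For anyone wanting to try it, a minimal usage sketch (zi and zj here are hypothetical projection-head outputs; the tf.stack/tf.reshape step produces the interleaved ordering the layer expects):

import tensorflow as tf

batch_size, dim = 4, 128
zi = tf.random.normal((batch_size, dim))  # projections of the first augmented views
zj = tf.random.normal((batch_size, dim))  # projections of the second augmented views

# Interleave the two halves so rows are ordered z_i1, z_j1, z_i2, z_j2, ...
zizj = tf.reshape(tf.stack([zi, zj], axis=1), (2 * batch_size, dim))

loss = NT_Xent(tau=0.5)(zizj)
print(float(loss))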