Tensorflow implementation of NT_Xent contrastive loss function?
As the title suggests, I'm trying to train a model based on the SimCLR framework (see this paper: https://arxiv.org/pdf/2002.05709.pdf - the NT_Xent loss is stated in equation (1) and Algorithm 1).
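For reference, the loss for a positive pair (i, j) in equation (1) of the paper is

    \ell_{i,j} = -\log \frac{\exp(\mathrm{sim}(z_i, z_j)/\tau)}{\sum_{k=1}^{2N} \mathbb{1}_{[k \neq i]} \exp(\mathrm{sim}(z_i, z_k)/\tau)}

where sim(u, v) is the cosine similarity, \tau is the temperature, and the final loss averages \ell_{i,j} over all 2N (i, j) and (j, i) pairs in the augmented batch.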
I've managed to create a numpy version of the loss function, but this is not suitable for training the model, since numpy arrays cannot store the information required for backpropagation. I'm having difficulty converting my numpy code to Tensorflow. Here is my numpy version:
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

# Define the contrastive loss function, NT_Xent
def NT_Xent(zi, zj, tau=1):
    """ Calculates the contrastive loss of the input data using NT_Xent. The
    equation can be found in the paper: https://arxiv.org/pdf/2002.05709.pdf

    Args:
        zi: One half of the input data, shape = (batch_size, feature_1, feature_2, ..., feature_N)
        zj: Other half of the input data, must have the same shape as zi
        tau: Temperature parameter (a constant), default = 1.

    Returns:
        loss: The complete NT_Xent contrastive loss
    """
    z = np.concatenate((zi, zj), 0)

    loss = 0
    for k in range(zi.shape[0]):
        # Numerator (compare i,j & j,i)
        i = k
        j = k + zi.shape[0]
        sim_ij = np.squeeze(cosine_similarity(z[i].reshape(1, -1), z[j].reshape(1, -1)))
        sim_ji = np.squeeze(cosine_similarity(z[j].reshape(1, -1), z[i].reshape(1, -1)))
        numerator_ij = np.exp(sim_ij / tau)
        numerator_ji = np.exp(sim_ji / tau)

        # Denominator (compare i & j to all samples apart from themselves)
        sim_ik = np.squeeze(cosine_similarity(z[i].reshape(1, -1), z[np.arange(z.shape[0]) != i]))
        sim_jk = np.squeeze(cosine_similarity(z[j].reshape(1, -1), z[np.arange(z.shape[0]) != j]))
        denominator_ik = np.sum(np.exp(sim_ik / tau))
        denominator_jk = np.sum(np.exp(sim_jk / tau))

        # Calculate individual and combined losses
        loss_ij = - np.log(numerator_ij / denominator_ik)
        loss_ji = - np.log(numerator_ji / denominator_jk)
        loss += loss_ij + loss_ji

    # Divide by the total number of samples
    loss /= z.shape[0]

    return loss
I'm fairly confident that this function produces the correct results (albeit slowly, since the other implementations I've seen online are vectorised versions - e.g. this PyTorch one: https://github.com/Spijkervet/SimCLR/blob/master/modules/nt_xent.py, which produces the same results as my code for identical inputs - but I couldn't see how their version was mathematically equivalent to the formula in the paper, hence why I tried to build my own).
As a first attempt I converted the numpy functions to their TF equivalents (tf.concat, tf.reshape, tf.math.exp, tf.range, etc.), but I believe my only/main problem is that sklearn's cosine_similarity function returns a numpy array, and I don't know how to construct this function myself in Tensorflow. Any ideas?
I figured it out myself!
I hadn't realised there was a Tensorflow implementation of the cosine similarity function, "tf.keras.losses.CosineSimilarity".
Here is my code:
import tensorflow as tf

# Define the contrastive loss function, NT_Xent (Tensorflow version)
def NT_Xent_tf(zi, zj, tau=1):
    """ Calculates the contrastive loss of the input data using NT_Xent. The
    equation can be found in the paper: https://arxiv.org/pdf/2002.05709.pdf
    (This is the Tensorflow implementation of the standard numpy version found
    in the NT_Xent function).

    Args:
        zi: One half of the input data, shape = (batch_size, feature_1, feature_2, ..., feature_N)
        zj: Other half of the input data, must have the same shape as zi
        tau: Temperature parameter (a constant), default = 1.

    Returns:
        loss: The complete NT_Xent contrastive loss
    """
    z = tf.cast(tf.concat((zi, zj), 0), dtype=tf.float32)

    # Instantiate the cosine similarity loss function once (Keras returns the
    # *negative* cosine similarity, hence the minus signs below)
    cosine_sim = tf.keras.losses.CosineSimilarity(axis=-1, reduction=tf.keras.losses.Reduction.NONE)

    loss = 0
    for k in range(zi.shape[0]):
        # Numerator (identical for both the i,j and j,i pairs)
        i = k
        j = k + zi.shape[0]
        sim = tf.squeeze(- cosine_sim(tf.reshape(z[i], (1, -1)), tf.reshape(z[j], (1, -1))))
        numerator = tf.math.exp(sim / tau)

        # Denominator (compare i & j to all samples apart from themselves)
        sim_ik = - cosine_sim(tf.reshape(z[i], (1, -1)), z[tf.range(z.shape[0]) != i])
        sim_jk = - cosine_sim(tf.reshape(z[j], (1, -1)), z[tf.range(z.shape[0]) != j])
        denominator_ik = tf.reduce_sum(tf.math.exp(sim_ik / tau))
        denominator_jk = tf.reduce_sum(tf.math.exp(sim_jk / tau))

        # Calculate individual and combined losses
        loss_ij = - tf.math.log(numerator / denominator_ik)
        loss_ji = - tf.math.log(numerator / denominator_jk)
        loss += loss_ij + loss_ji

    # Divide by the total number of samples
    loss /= z.shape[0]

    return loss
As you can see, I've essentially just swapped out the numpy functions for their TF equivalents. One key point is that I had to use reduction=tf.keras.losses.Reduction.NONE within the "cosine_sim" function; this was needed to keep the individual similarities in "sim_ik" and "sim_jk", as otherwise the resulting loss did not match my original numpy implementation.
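To illustrate why (a minimal sketch with random tensors; the shapes are arbitrary):

import tensorflow as tf

a = tf.random.normal((1, 8))
b = tf.random.normal((5, 8))

# With the default reduction, Keras averages the five per-row (negative)
# similarities down to a single scalar, so the per-sample values are lost
collapsed = tf.keras.losses.CosineSimilarity(axis=-1)(a, b)  # shape ()

# Reduction.NONE keeps one (negative) similarity per comparison,
# which is what the sum over the denominator terms needs
per_row = tf.keras.losses.CosineSimilarity(
    axis=-1, reduction=tf.keras.losses.Reduction.NONE)(a, b)  # shape (5,)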
I also noticed that calculating the numerator separately for i,j and j,i was redundant, since the answers are the same, so I removed one instance of that calculation.
Of course, if anyone has a faster implementation, I'd be more than happy to hear about it!
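For anyone wanting to compare the two versions, a quick sanity check might look like this (a sketch assuming random projections of an arbitrary shape; both calls should print the same loss):

import numpy as np
import tensorflow as tf

rng = np.random.default_rng(0)
zi = rng.normal(size=(8, 128)).astype(np.float32)
zj = rng.normal(size=(8, 128)).astype(np.float32)

print(NT_Xent(zi, zj, tau=0.5))                                       # numpy version
print(NT_Xent_tf(tf.constant(zi), tf.constant(zj), tau=0.5).numpy())  # TF version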
Here is a more efficient and more stable implementation. It assumes zi and zj are interleaved!
class NT_Xent(tf.keras.layers.Layer):
    """ Normalized temperature-scaled CrossEntropy loss [1]
    [1] T. Chen, S. Kornblith, M. Norouzi, and G. Hinton, "A simple framework for contrastive learning of visual representations," arXiv. 2020, Accessed: Jan. 15, 2021. [Online]. Available: https://github.com/google-research/simclr.
    """
    def __init__(self, tau=1, **kwargs):
        super().__init__(**kwargs)
        self.tau = tau
        self.similarity = tf.keras.losses.CosineSimilarity(axis=-1, reduction=tf.keras.losses.Reduction.NONE)
        self.criterion = tf.keras.losses.CategoricalCrossentropy(from_logits=True)

    def get_config(self):
        return {"tau": self.tau}

    def call(self, zizj):
        """ zizj is [B,N] tensor with order z_i1 z_j1 z_i2 z_j2 z_i3 z_j3 ...
        batch_size is twice the original batch_size
        """
        batch_size = tf.shape(zizj)[0]
        # Mask that removes each sample's similarity to itself and to its own
        # positive pair (integer division keeps the size an int32 scalar)
        mask = tf.repeat(tf.repeat(~tf.eye(batch_size // 2, dtype=tf.bool), 2, axis=0), 2, axis=1)

        # Full [B,B] similarity matrix (CosineSimilarity returns the negative
        # cosine similarity, hence the -1 factors)
        sim = -1 * self.similarity(tf.expand_dims(zizj, 1), tf.expand_dims(zizj, 0)) / self.tau
        sim_i_j = -1 * self.similarity(zizj[0::2], zizj[1::2]) / self.tau

        # One positive logit and batch_size-2 negative logits per sample
        pos = tf.reshape(tf.repeat(sim_i_j, repeats=2), (batch_size, -1))
        neg = tf.reshape(sim[mask], (batch_size, -1))

        logits = tf.concat((pos, neg), axis=-1)
        labels = tf.one_hot(tf.zeros((batch_size,), dtype=tf.int32), depth=batch_size - 1)

        return self.criterion(labels, logits)
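A usage sketch (hypothetical shapes; the interleaving can be produced with tf.stack + tf.reshape):

import tensorflow as tf

# Hypothetical projection batches of shape (batch_size, dim)
zi = tf.random.normal((8, 128))
zj = tf.random.normal((8, 128))

# Interleave to the expected order z_i1 z_j1 z_i2 z_j2 ...
zizj = tf.reshape(tf.stack((zi, zj), axis=1), (-1, zi.shape[-1]))

loss = NT_Xent(tau=0.5)(zizj)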