How to sample similar vectors given a vector and a cosine similarity in PyTorch?

I have this vector:

>>> vec
tensor([[0.2677, 0.1158, 0.5954, 0.9210, 0.3622, 0.4081, 0.4477, 0.7930, 0.1161,
         0.5111, 0.2010, 0.3680, 0.1162, 0.1563, 0.4478, 0.9732, 0.7962, 0.0873,
         0.9793, 0.9382, 0.9468, 0.0851, 0.7601, 0.0322, 0.7553, 0.4025, 0.3627,
         0.5706, 0.3015, 0.1344, 0.8343, 0.8187, 0.4287, 0.5785, 0.9527, 0.1632,
         0.2890, 0.5411, 0.5319, 0.7163, 0.3166, 0.5717, 0.5018, 0.5368, 0.3321]])

I want to use this vector to generate 15 vectors whose cosine similarity to it is greater than 80%.

How can I do this in PyTorch?

I adapted the answer, adding an extra dimension so it returns several vectors at once and converting it from NumPy to torch.

import torch

def torch_cos_sim(v, cos_theta, n_vectors=1, EXACT=True):
    """
    EXACT - if True, all vectors will have exactly cos_theta similarity.
            if False, all vectors will have >= cos_theta similarity
    v - original vector (1D tensor; use vec.squeeze(0) for a [1, d] tensor)
    cos_theta - cos similarity in range [-1,1]
    """

    # unit vector in direction of v, repeated once per output vector
    u = v / torch.norm(v)
    u = u.unsqueeze(0).repeat(n_vectors, 1)

    # random vectors with elements in range [-1,1]
    r = torch.rand([n_vectors, len(v)], dtype=v.dtype) * 2 - 1

    # unit vectors perpendicular to u (and therefore to v):
    # subtract from each row of r its projection onto u, then normalize
    uperp = r - (r * u).sum(dim=1, keepdim=True) * u
    uperp = uperp / torch.norm(uperp, dim=1, keepdim=True)

    # as_tensor avoids the copy-construct warning that torch.tensor(tensor) raises
    cos_theta = torch.as_tensor(cos_theta, dtype=v.dtype)
    if not EXACT:
        # one similarity per vector, uniform in [cos_theta, 1), as a column
        cos_theta = torch.rand(n_vectors, 1, dtype=v.dtype) * (1 - cos_theta) + cos_theta

    # w is the linear combination of u and uperp with coefficients cos(theta)
    # and sin(theta) = sqrt(1 - cos(theta)**2), respectively
    w = cos_theta * u + torch.sqrt(1 - cos_theta**2) * uperp

    return w
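
To see why the last step gives the requested similarity, here is the short derivation behind it. I write theta for the angle between w and v, with u and uperp as in the code, and use only the facts that both are unit vectors and orthogonal to each other:

```latex
\begin{aligned}
w &= \cos\theta \, u + \sin\theta \, u_\perp,
   \qquad \|u\| = \|u_\perp\| = 1, \quad u \cdot u_\perp = 0 \\
\|w\|^2 &= \cos^2\theta + \sin^2\theta = 1 \\
\frac{w \cdot v}{\|w\|\,\|v\|} &= w \cdot u
   = \cos\theta \,(u \cdot u) + \sin\theta \,(u_\perp \cdot u)
   = \cos\theta
\end{aligned}
```

The second-to-last equality uses v = ||v|| u, so every returned row is already a unit vector whose cosine similarity to v equals the coefficient in front of u.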

You can check the output with:

vec = torch.rand(54)
output = torch_cos_sim(vec,0.6,n_vectors = 15, EXACT = False)

# test cos similarity
for item in output:
    print(torch.dot(vec,item)/(torch.norm(vec)*torch.norm(item)))
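
For the exact case in the question (15 vectors, similarity above 0.8, a vector of shape [1, 45]), the check can also be done without a loop. The following is only a sketch under a few assumptions: `sample_similar` is a hypothetical compact helper written for this example, it draws the random directions from a Gaussian instead of the uniform noise used above, and `vec` is random stand-in data with the same shape as the vector in the question.

```python
import torch
import torch.nn.functional as F

def sample_similar(v, min_cos, n_vectors):
    """Hypothetical helper: unit vectors with cosine similarity >= min_cos to v."""
    u = v.flatten() / v.norm()                   # accepts shape [d] or [1, d]
    # Assumption: Gaussian noise instead of uniform noise in [-1, 1]
    r = torch.randn(n_vectors, u.numel(), dtype=u.dtype)
    uperp = r - (r @ u).unsqueeze(1) * u         # remove the component along u
    uperp = uperp / uperp.norm(dim=1, keepdim=True)
    # one similarity per vector, uniform in [min_cos, 1)
    cos = torch.rand(n_vectors, 1, dtype=u.dtype) * (1 - min_cos) + min_cos
    return cos * u + torch.sqrt(1 - cos**2) * uperp

torch.manual_seed(0)
vec = torch.rand(1, 45)                          # stand-in for the vector in the question
samples = sample_similar(vec, 0.8, 15)

# vec broadcasts against the 15 rows, giving one similarity per sample
sims = F.cosine_similarity(samples, vec, dim=1)
print(samples.shape)
print(bool((sims >= 0.8 - 1e-5).all()))
```

The small tolerance in the last line only absorbs float32 rounding; the sampled similarities themselves are drawn from [0.8, 1).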