Definition of spearmanr

Question

To test my understanding of spearmanr and pearsonr, I compared two ways of computing spearmanr that should give the same result. Surprisingly, the results differ.

import torch
from scipy.stats import pearsonr, spearmanr

x = torch.normal(1, 1, (10,))
y = torch.normal(1, 1, (10,))
_, x_rank = x.sort()
_, y_rank = y.sort()

print(
    spearmanr(x, y),
    pearsonr(x_rank, y_rank)
)

To reproduce the result, use the following x and y. This should give 0.263 for the pearsonr of the ranks and 0.139 for spearmanr.

x = torch.tensor([ 1.7443, -0.7889,  0.4698,  1.2080,  0.8847, -0.4490,  1.2561,  1.5188,
        -1.0031,  1.4753])
y = torch.tensor([ 1.2675,  1.8317,  1.6912, -0.2964,  2.0014,  1.1092,  2.7958,  2.6034,
        -0.0528, -2.2956])

Why do they differ? Isn't spearmanr defined as the pearsonr of the ranks of x and y? Am I missing something?

Answer 1

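First, to clarify: the Pearson and Spearman correlations are not the same thing, although they can coincide, for example when the relationship is perfectly linear. The former measures linear association, the latter monotonic association.

Second, the tensor sort method does not give you ranks. Its second return value holds, for each position in the sorted output, the index that element had in the original data. What you need is the reverse: the position of each value in the sorted data, listed in the order of the original data, i.e. the ranks.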

You can do it like this:

import torch
from scipy.stats import pearsonr, spearmanr

x = torch.tensor([ 1.7443, -0.7889,  0.4698,  1.2080,  0.8847, -0.4490,  1.2561,  1.5188,
        -1.0031,  1.4753])
y = torch.tensor([ 1.2675,  1.8317,  1.6912, -0.2964,  2.0014,  1.1092,  2.7958,  2.6034,
        -0.0528, -2.2956])

x_sorted, x_raw_indx = x.sort()
y_sorted, y_raw_indx = y.sort()

x_ranked = [int((x_sorted == value).nonzero(as_tuple=True)[0]) for value in x]
y_ranked = [int((y_sorted == value).nonzero(as_tuple=True)[0]) for value in y]

spearmanr(x, y) , pearsonr(x_ranked, y_ranked)

#(SpearmanrResult(correlation=0.13939393939393938, pvalue=0.7009318849100584),
# (0.1393939393939394, 0.7009318849100588))

As noted in the comments (@simon), a more concise way to get the ranks is:

x_ranked = x.argsort().argsort()
y_ranked = y.argsort().argsort()
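
To make the difference between sort indices and ranks concrete, here is a minimal dependency-free sketch on the same data. The helpers `argsort`, `ranks`, and `pearson` are illustrative names written for this example, not part of torch or scipy, and the rank computation assumes there are no tied values (scipy's spearmanr assigns average ranks to ties, which this sketch does not do).

```python
from math import sqrt

x = [1.7443, -0.7889, 0.4698, 1.2080, 0.8847,
     -0.4490, 1.2561, 1.5188, -1.0031, 1.4753]
y = [1.2675, 1.8317, 1.6912, -0.2964, 2.0014,
     1.1092, 2.7958, 2.6034, -0.0528, -2.2956]


def argsort(values):
    """Original indices of the elements in ascending order.

    This mirrors the second return value of Tensor.sort().
    """
    return sorted(range(len(values)), key=values.__getitem__)


def ranks(values):
    """0-based rank of each element, in original order (assumes no ties).

    Equivalent to applying argsort twice.
    """
    result = [0] * len(values)
    for rank, index in enumerate(argsort(values)):
        result[index] = rank
    return result


def pearson(a, b):
    """Plain Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b))
    var_a = sum((p - mean_a) ** 2 for p in a)
    var_b = sum((q - mean_b) ** 2 for q in b)
    return cov / sqrt(var_a * var_b)


# What the question computed: Pearson on the sort indices.
index_corr = pearson(argsort(x), argsort(y))
# Pearson on the actual ranks, which is Spearman when there are no ties.
rank_corr = pearson(ranks(x), ranks(y))

print(argsort(x))  # indices into x, in sorted order
print(ranks(x))    # rank of each x[i], in original order
print(index_corr, rank_corr)
```

Here `rank_corr` comes out near 0.1394, in line with the spearmanr value shown above, while `index_corr` is a different number because the sort indices are a different permutation from the ranks.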

python

statistics