Calculate correlation matrix between types

I have a DataFrame df with the following 3 columns (tab-separated):

X    Y    types
0.3422    0.3214    pen
-0.1784    0.8621    pen
0.9932    0.1347    pencil
0.2847    -0.7634    pen
-0.6548    -0.2981    ruler
0.4792    0.3782    pencil
0.9231    -0.2949    ruler

The output should be a correlation matrix like this:

          pen    pencil    ruler
pen       C1     C2        C3
pencil    C4     C5        C6
ruler     C7     C8        C9

I tried .corr(), but it doesn't work with the structure of df.

Note: C1 is the pen-pen correlation, C2 is the pen-pencil correlation, and so on.

Any help?

IIUC, you could do:

res = df.groupby('types').mean().T.corr()

Output

types   pen  pencil  ruler
types                     
pen     1.0     1.0    1.0
pencil  1.0     1.0    1.0
ruler   1.0     1.0    1.0
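
For anyone who wants to reproduce this, here is a self-contained sketch. Building df inline from the sample rows in the question is my assumption about how the data is loaded. Note that after the transpose each type has only two observations (its mean X and its mean Y), and the Pearson correlation of two non-constant two-point series is always +1 or -1, which is why every entry comes out as 1.0 for this data:

```python
import pandas as pd

# Sample rows copied from the question.
df = pd.DataFrame({
    "X": [0.3422, -0.1784, 0.9932, 0.2847, -0.6548, 0.4792, 0.9231],
    "Y": [0.3214, 0.8621, 0.1347, -0.7634, -0.2981, 0.3782, -0.2949],
    "types": ["pen", "pen", "pencil", "pen", "ruler", "pencil", "ruler"],
})

# One row per type, holding the mean of X and the mean of Y.
means = df.groupby("types").mean()

# Transpose so each type becomes a column, then correlate the columns.
res = means.T.corr()

print(means)
print(res)
```

So the all-ones matrix is a consequence of correlating over just two points per type, not a sign that the types are identical.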

You can change the correlation method as needed, for example:

import numpy as np
res = df.groupby('types').mean().T.corr(method=np.dot)
print(res)

Output

types        pen    pencil     ruler
types                               
pen     1.000000  0.145973 -0.021464
pencil  0.145973  1.000000  0.022724
ruler  -0.021464  0.022724  1.000000
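
As a sanity check, the sketch below (again rebuilding df inline from the question's sample rows, which is my assumption) recomputes one off-diagonal entry by hand. The off-diagonal values are plain dot products of the per-type mean vectors (mean X, mean Y). They are not normalized, so they are not correlation coefficients in the usual sense, and the diagonal is set to 1 by corr itself rather than computed with np.dot:

```python
import numpy as np
import pandas as pd

# Sample rows copied from the question.
df = pd.DataFrame({
    "X": [0.3422, -0.1784, 0.9932, 0.2847, -0.6548, 0.4792, 0.9231],
    "Y": [0.3214, 0.8621, 0.1347, -0.7634, -0.2981, 0.3782, -0.2949],
    "types": ["pen", "pen", "pencil", "pen", "ruler", "pencil", "ruler"],
})

means = df.groupby("types").mean()
res = means.T.corr(method=np.dot)

# Recompute the pen/pencil entry directly from the two mean vectors.
pen_vec = means.loc["pen"].to_numpy()
pencil_vec = means.loc["pencil"].to_numpy()
pen_pencil = float(np.dot(pen_vec, pencil_vec))

print(pen_pencil)                 # same value as the entry printed below
print(res.loc["pen", "pencil"])

# The real self dot product of pen is not 1; corr() fills the diagonal itself.
pen_self = float(np.dot(pen_vec, pen_vec))
print(pen_self, res.loc["pen", "pen"])
```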

The default method is Pearson correlation. From the documentation on the method parameter:

method : {‘pearson’, ‘kendall’, ‘spearman’} or callable

Method of correlation:

pearson : standard correlation coefficient

kendall : Kendall Tau correlation coefficient

spearman : Spearman rank correlation

callable: callable with input two 1d ndarrays and returning a float. Note that the returned matrix from corr will have 1 along the diagonals and will be symmetric regardless of the callable’s behavior.

New in version 0.24.0.
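
The note about the diagonal and symmetry can be seen with a deliberately asymmetric callable. This is only a sketch: first_diff is a hypothetical helper made up for illustration, and df is rebuilt inline from the question's sample rows:

```python
import pandas as pd

# Sample rows copied from the question.
df = pd.DataFrame({
    "X": [0.3422, -0.1784, 0.9932, 0.2847, -0.6548, 0.4792, 0.9231],
    "Y": [0.3214, 0.8621, 0.1347, -0.7634, -0.2981, 0.3782, -0.2949],
    "types": ["pen", "pen", "pencil", "pen", "ruler", "pencil", "ruler"],
})

# Columns are the types, rows are the mean of X and the mean of Y.
means_t = df.groupby("types").mean().T

def first_diff(a, b):
    # Hypothetical, deliberately asymmetric callable:
    # first_diff(a, b) == -first_diff(b, a).
    return float(a[0] - b[0])

res = means_t.corr(method=first_diff)
print(res)
```

Even though first_diff(a, b) and first_diff(b, a) differ in sign, the returned matrix is symmetric with 1.0 on the diagonal, as the documentation states. So a callable that is not a real similarity measure will still produce something that looks like a correlation matrix.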