How to loop over a sorted correlation list?

Below is simple code that computes a correlation matrix and sorts it, but how can I loop over the result and get the column pair names?

import pandas as pd
import numpy as np

d = {
    'x1': [1, 4, 4, 5, 6], 
    'x2': [0, 0, 8, 2, 4], 
    'x3': [2, 8, 8, 10, 12], 
    'x4': [-1, -4, -4, -4, -5]
}
df = pd.DataFrame(data=d)
print(df)
print('---')
print(df.corr())
print('---')

corr_matrix = df.corr().abs()
# np.bool was removed in NumPy 1.24; the builtin bool works in every version.
sol = (corr_matrix.where(np.triu(np.ones(corr_matrix.shape), k=1).astype(bool)).stack().sort_values(ascending=False))
print(sol)
print('---')

for s in sol:
    print(s)
    # how to print column 1 and 2 pair names with this "s" corr?

Result:

   x1  x2  x3  x4
0   1   0   2  -1
1   4   0   8  -4
2   4   8   8  -4
3   5   2  10  -4
4   6   4  12  -5
---
          x1        x2        x3        x4
x1  1.000000  0.399298  1.000000 -0.969248
x2  0.399298  1.000000  0.399298 -0.472866
x3  1.000000  0.399298  1.000000 -0.969248
x4 -0.969248 -0.472866 -0.969248  1.000000
---
x1  x3    1.000000
x3  x4    0.969248
x1  x4    0.969248
x2  x4    0.472866
    x3    0.399298
x1  x2    0.399298
dtype: float64
---
1.0
0.9692476431690819
0.9692476431690819
0.4728662437434603
0.39929785312496247
0.39929785312496247

What I expect is:

for (column1, column2, s) in sol:
    print(column1 + ',' + column2 + ',' + str(s))

Result:

x1, x3, 1.000000
x3, x4, 0.969248
x1, x4, 0.969248
x2, x4, 0.472866
x1, x2, 0.399298

You can use DataFrame.itertuples to iterate over the DataFrame rows as tuples (name=None returns plain tuples rather than namedtuples):

pairs = sol.reset_index().itertuples(index=False, name=None)
print('\n'.join(str(p).strip('()') for p in pairs))

Alternatively, use Series.items (Series.iteritems was removed in pandas 2.0):

for item in sol.items():
    print(str(item).replace('(', '').replace(')', ''))

Result:

'x1', 'x3', 1.0
'x3', 'x4', 0.9692476431690819
'x1', 'x4', 0.9692476431690819
'x2', 'x4', 0.4728662437434603
'x2', 'x3', 0.39929785312496247
'x1', 'x2', 0.39929785312496247

Is this what you are looking for:

print(sol.reset_index())

  level_0 level_1         0
0      x1      x3  1.000000
1      x3      x4  0.969248
2      x1      x4  0.969248
3      x2      x4  0.472866
4      x2      x3  0.399298
5      x1      x2  0.399298
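
If the default level_0 / level_1 / 0 column names are awkward, one option is to name the index levels before resetting the index. The sketch below is self-contained and rebuilds sol from the question's data; the names column1, column2 and abs_corr are illustrative choices, not from the original post. The extra dropna() is an assumption for robustness, in case the installed pandas version keeps NaN rows in stack().

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'x1': [1, 4, 4, 5, 6],
    'x2': [0, 0, 8, 2, 4],
    'x3': [2, 8, 8, 10, 12],
    'x4': [-1, -4, -4, -4, -5],
})

corr_matrix = df.corr().abs()
# Keep only the strict upper triangle so each column pair appears once.
mask = np.triu(np.ones(corr_matrix.shape, dtype=bool), k=1)
sol = corr_matrix.where(mask).stack().dropna().sort_values(ascending=False)

# Name the two MultiIndex levels and the value column (illustrative names).
pairs = sol.rename_axis(['column1', 'column2']).reset_index(name='abs_corr')

# Each row is a namedtuple, so the fields can be read by name.
for row in pairs.itertuples(index=False):
    print(row.column1, row.column2, row.abs_corr)
```

Named columns make the loop body easier to read than positional tuple access, at the cost of one extra rename_axis call.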

You are close: you can loop with Series.items and unpack the MultiIndex values as (column1, column2):

for ((column1, column2), s) in sol.items():
    print(column1 + ',' + column2 + ',' + str(s))
    
x1,x3,1.0
x3,x4,0.9692476431690819
x1,x4,0.9692476431690819
x2,x4,0.4728662437434603
x2,x3,0.39929785312496247
x1,x2,0.39929785312496247

A similar solution with f-strings:

for ((column1, column2), s) in sol.items():
    print(f"{column1},{column2},{s}")
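
To get closer to the six-decimal layout shown in the question's expected output, a format specifier can be added to the f-string. This is a minimal self-contained sketch that rebuilds sol from the question's data; the ", " separator and the .6f precision are assumptions based on that expected output, and dropna() is added defensively in case stack() keeps NaN rows in the installed pandas version.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'x1': [1, 4, 4, 5, 6],
    'x2': [0, 0, 8, 2, 4],
    'x3': [2, 8, 8, 10, 12],
    'x4': [-1, -4, -4, -4, -5],
})

corr_matrix = df.corr().abs()
# Strict upper triangle: drops the diagonal and the mirrored duplicates.
mask = np.triu(np.ones(corr_matrix.shape, dtype=bool), k=1)
sol = corr_matrix.where(mask).stack().dropna().sort_values(ascending=False)

# Unpack the (column1, column2) index tuple and format the value to 6 decimals.
lines = [f"{column1}, {column2}, {s:.6f}" for (column1, column2), s in sol.items()]
print('\n'.join(lines))
```

Pairs with equal correlation (for example the two 0.969248 rows) may come out in either order, because the default sort is not guaranteed to be stable.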