How to create a frequency distribution matrix from a Pandas DataFrame of boolean values

Question

In short, I am trying to convert a DataFrame like this

Patient   Cough   Headache   Dizzy
   1        1         0        0 
   2        1         1        1
   3        0         1        0 
   4        1         0        1
   5        0         1        0

into something like a frequency distribution matrix, similar to the Pandas correlation feature.

That is, it would return something like this

        Cough   Headache   Dizzy
Cough     1       0.33     0.66
Headache 0.33       1      0.33
Dizzy     1       0.5       1

This is because 1 in every 3 people with a headache is also dizzy, while 1 in every 2 dizzy people has a headache, and so on.

The real data I want to use this on is much larger, so I am curious whether Pandas has a way to do this automatically.

Answer 1

Something like this?

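# extract the symptom columns (everything except the first column, Patient)
s = df.iloc[:, 1:]

# co-occurrence counts divided by each symptom's total, then transposed
# so that each row is normalized by its own symptom's count
((s.T @ s) / s.sum()).T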

Output:

             Cough  Headache     Dizzy
Cough     1.000000  0.333333  0.666667
Headache  0.333333  1.000000  0.333333
Dizzy     1.000000  0.500000  1.000000
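
For anyone who wants to run this end to end, here is a minimal self-contained sketch. It rebuilds the sample table from the question and computes the same matrix in named steps. The variable names `co` and `freq` are my own, and dropping `Patient` by name (rather than by position) is an assumption about how the real data is laid out.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "Patient":  [1, 2, 3, 4, 5],
    "Cough":    [1, 1, 0, 1, 0],
    "Headache": [0, 1, 1, 0, 1],
    "Dizzy":    [0, 1, 0, 1, 0],
})

# Keep only the symptom columns (assumes Patient is an identifier, not a symptom).
s = df.drop(columns="Patient")

# Co-occurrence counts: entry (i, j) is the number of patients with both i and j.
co = s.T @ s

# Divide each row by that symptom's total, so row i, column j is the
# share of patients with symptom i who also have symptom j.
freq = co.div(s.sum(), axis=0)

print(freq.round(2))
```

Because the co-occurrence matrix is symmetric, dividing along the rows with `div(..., axis=0)` should give the same result as dividing along the columns and transposing, as in the one-liner above. Note that a symptom column with no positive cases would produce a division by zero and therefore NaN values in its row.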
