在没有 for 循环的情况下计算调整后的残差

Question

我正在阅读有关 χ^2 检验的内容，并且有一个观察值的偶然事件 table，我想根据 this guide 计算“调整后的残差”。我写了下面的代码，它完成了工作，但我想避免底部的循环。我确定有办法，但我似乎做不到：

import numpy as np
from scipy import stats

O = np.array([[21, 6, 12, 19],[20, 4, 15, 3],
              [9,5, 18, 22],[2, 6, 19, 19]])

chi2, p, dof, E = stats.chi2_contingency(O)

# Adjusted residuals here
def res(o, e, rsum, csum):
    return (o - e) / np.sqrt( e * (1- (e/rsum)) * (1 - (e/csum)) )

residuals = np.zeros(O.shape)

for r in range(O.shape[0]):
    for c in range(O.shape[1]):
        rsum = O[r].sum()
        csum = O[:, c].sum()
        residual = res(O[r][c], E[r][c], rsum, csum)
        residuals[r][c] = residual

print(residuals)
Out[430]: 
array([[ 2.10317786, -0.04575022, -2.19144827,  0.24489455],
       [ 3.59372734, -0.23218785,  0.58057317, -3.82328682],
       [-1.83007839, -0.34810505,  0.24583558,  1.71096716],
       [-3.81533368,  0.6412914 ,  1.54166511,  1.6313671 ]])

有没有什么方法可以在没有循环的情况下优雅地做到这一点？

Answer 1

我的解决方案：

import numpy as np

O = np.array([[1, 1, 1, 1], [2, 2, 2, 2], [3, 3, 3, 3], [4, 4, 4, 4]])
E = np.array([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]])

o_sum_0 = O.sum(axis=0)  # [10,10,10,10]
o_sum_1 = O.sum(axis=1)  # [4,8,12,16]
 
# [4,8,12,16] > (full) > 
# [[4,8,12,16],[4,8,12,16],[4,8,12,16],[4,8,12,16]] > (transpose) >
# [[4,4,4,4], [8,8,8,8], [12,12,12,12], [16,16,16,16]]
rsum = np.full(O.shape, o_sum_1).T

# [10,10,10,10] > (full) > 
# [[10,10,10,10],[10,10,10,10],[10,10,10,10],[10,10,10,10]]
csum = np.full(O.shape, o_sum_0)

residual = (O - E) / np.sqrt(E * (1 - (E / rsum)) * (1 - (E / csum)))

或者用一个“简单”的行：

residuals = r(O - E) / np.sqrt(E * (1 - (E / np.full(O.shape, O.sum(axis=1)).T)) * (1 - (E / np.full(O.shape, O.sum(axis=0)))))

在没有 for 循环的情况下计算调整后的残差

Calculate adjusted residuals without a for loop

python

statistics

chi-squared