Calculate adjusted residuals without a for loop
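I am reading about the χ² test and have a contingency table of observed values, for which I want to compute the "adjusted residuals" following this guide. I wrote the code below, which does the job, but I would like to avoid the loop at the bottom. I am sure there is a way, but I can't seem to get it right: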
import numpy as np
from scipy import stats
O = np.array([[21, 6, 12, 19], [20, 4, 15, 3],
              [9, 5, 18, 22], [2, 6, 19, 19]])
chi2, p, dof, E = stats.chi2_contingency(O)
# Adjusted residuals here
def res(o, e, rsum, csum):
    return (o - e) / np.sqrt(e * (1 - (e / rsum)) * (1 - (e / csum)))

residuals = np.zeros(O.shape)
for r in range(O.shape[0]):
    for c in range(O.shape[1]):
        rsum = O[r].sum()
        csum = O[:, c].sum()
        residual = res(O[r][c], E[r][c], rsum, csum)
        residuals[r][c] = residual
print(residuals)
Out[430]:
array([[ 2.10317786, -0.04575022, -2.19144827, 0.24489455],
[ 3.59372734, -0.23218785, 0.58057317, -3.82328682],
[-1.83007839, -0.34810505, 0.24583558, 1.71096716],
[-3.81533368, 0.6412914 , 1.54166511, 1.6313671 ]])
Is there an elegant way to do this without the loop?
My solution:
import numpy as np
O = np.array([[1, 1, 1, 1], [2, 2, 2, 2], [3, 3, 3, 3], [4, 4, 4, 4]])
E = np.array([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]])
o_sum_0 = O.sum(axis=0) # [10,10,10,10]
o_sum_1 = O.sum(axis=1) # [4,8,12,16]
# [4,8,12,16] > (full) >
# [[4,8,12,16],[4,8,12,16],[4,8,12,16],[4,8,12,16]] > (transpose) >
# [[4,4,4,4], [8,8,8,8], [12,12,12,12], [16,16,16,16]]
rsum = np.full(O.shape, o_sum_1).T
# [10,10,10,10] > (full) >
# [[10,10,10,10],[10,10,10,10],[10,10,10,10],[10,10,10,10]]
csum = np.full(O.shape, o_sum_0)
residual = (O - E) / np.sqrt(E * (1 - (E / rsum)) * (1 - (E / csum)))
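One caveat: `np.full(O.shape, o_sum_1).T` appears to work only because the table is square, since the row sums cannot be broadcast into a non-square `O.shape`. As a minimal sketch, the same formula can be written with `keepdims=True` so that NumPy broadcasting lines up the row and column totals by itself. The helper name `adjusted_residuals` and the 2x3 demo table are illustrative assumptions, not part of the original post, and the expected counts are computed by hand instead of with `stats.chi2_contingency`:

```python
import numpy as np

def adjusted_residuals(O, E):
    """Hypothetical helper: adjusted residuals via broadcasting."""
    O = np.asarray(O, dtype=float)
    E = np.asarray(E, dtype=float)
    rsum = O.sum(axis=1, keepdims=True)  # row totals, shape (rows, 1)
    csum = O.sum(axis=0, keepdims=True)  # column totals, shape (1, cols)
    # (rows, 1) and (1, cols) broadcast against the (rows, cols) arrays
    return (O - E) / np.sqrt(E * (1 - E / rsum) * (1 - E / csum))

# Made-up non-square table, for illustration only
O_demo = np.array([[10, 20, 30], [20, 20, 20]])
# Expected counts under independence: row total * column total / grand total
E_demo = (O_demo.sum(axis=1, keepdims=True)
          * O_demo.sum(axis=0, keepdims=True) / O_demo.sum())
res_demo = adjusted_residuals(O_demo, E_demo)
print(res_demo.shape)
```

No temporary full-size arrays or transposes are needed in this form, which is the main reason to prefer it.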
Or, collapsing the `np.full` version into a "simple" one-liner:
residuals = (O - E) / np.sqrt(E * (1 - (E / np.full(O.shape, O.sum(axis=1)).T)) * (1 - (E / np.full(O.shape, O.sum(axis=0)))))
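To check that a loop-free version really reproduces the loop from the question, the two can be compared on the question's own table. This is only a sketch: the names `vectorized` and `looped` are mine, and `E` is computed here as row total times column total divided by the grand total, which as far as I know is what `stats.chi2_contingency` returns as the expected frequencies:

```python
import numpy as np

O = np.array([[21, 6, 12, 19], [20, 4, 15, 3],
              [9, 5, 18, 22], [2, 6, 19, 19]])
rsum = O.sum(axis=1, keepdims=True)  # row totals, shape (4, 1)
csum = O.sum(axis=0, keepdims=True)  # column totals, shape (1, 4)
E = rsum * csum / O.sum()            # expected counts under independence

# Loop-free version (broadcasting)
vectorized = (O - E) / np.sqrt(E * (1 - E / rsum) * (1 - E / csum))

# Reference: the double loop from the question
looped = np.zeros(O.shape)
for r in range(O.shape[0]):
    for c in range(O.shape[1]):
        e = E[r, c]
        looped[r, c] = (O[r, c] - e) / np.sqrt(
            e * (1 - e / O[r].sum()) * (1 - e / O[:, c].sum()))

print(np.allclose(vectorized, looped))  # True
```

The top-left entry of `vectorized` comes out at roughly 2.1032, in line with the output posted in the question.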