Excel statistics: How to calculate the p-value of a 2x2 contingency table?
Given data such as:

     A            B         C
1                 Group 1   Group 2
2    Property 1   56        651
3    Property 2   97        1,380
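How can the p-value (i.e., the "right-tail" probability of the chi-squared distribution) be calculated directly, without setting up a separate calculation for the expected values?

In Excel, the p-value is calculated by the function CHISQ.DIST.RT if you know the chi-squared value for the table, or by CHISQ.TEST if you know the table of "expected values" for the table. The chi-squared value is calculated from the expected values, and the expected values are calculated from the original table by a slightly complicated formula. Either way, Excel requires us to calculate the expected values ourselves in order to get the p-value, which seems kind of silly. So, how can the p-value be obtained in Excel without calculating the expected values separately?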
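For reference, the "slightly complicated formula" is the standard one for Pearson's chi-squared test. The symbols below (O for observed counts, E for expected counts, R and C for row and column totals, N for the grand total) are notation introduced here for illustration, not taken from the question. Applied to the sample table with properties as rows and groups as columns, the arithmetic works out roughly as follows:

```latex
E_{ij} = \frac{R_i \, C_j}{N}, \qquad
\chi^2 = \sum_{i=1}^{2} \sum_{j=1}^{2} \frac{(O_{ij} - E_{ij})^2}{E_{ij}}

% Sample data: R = (707, 1477), C = (153, 2031), N = 2184
E_{11} = \frac{707 \cdot 153}{2184} \approx 49.53, \quad
E_{12} = \frac{707 \cdot 2031}{2184} \approx 657.47, \quad
E_{21} = \frac{1477 \cdot 153}{2184} \approx 103.47, \quad
E_{22} = \frac{1477 \cdot 2031}{2184} \approx 1373.53

% Every cell differs from its expected value by the same amount, about 6.47
\chi^2 \approx 6.47^2 \left( \frac{1}{49.53} + \frac{1}{657.47} + \frac{1}{103.47} + \frac{1}{1373.53} \right) \approx 1.344
```

With one degree of freedom, a chi-squared value of about 1.344 corresponds to a right-tail probability of roughly 0.246.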
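Edit: This question was originally posted under the title "How to calculate Pearson correlation coefficient with 2-property arrays?" and asked why the function PEARSON gave the wrong answer. Well, the answer is that I had confused the p-value with the Pearson correlation coefficient, which are different things. So I have reformulated the question to ask what I really need to know, and I am posting an answer. I will wait a while before accepting my own answer, in case someone else has a better one.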
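It seems to me that this requires VBA. I wrote the following VBA function to calculate either the chi-squared value or the p-value, as well as two other measures of association, for a 2x2 contingency table: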
Public Function nStatAssoc_2x2(sType As String, nGrp1PropCounts As Range, nGrp2PropCounts As Range) As Single
' Return one of several measures of statistical association of a 2×2 contingency table:
'                Property 1      Property 2
'    Group 1     nCount(1, 1)    nCount(1, 2)
'    Group 2     nCount(2, 1)    nCount(2, 2)
' sType is:      to calculate:
'    "OR"        Odds ratio
'    "phi"       Phi coefficient
'    "chi-sq"    Chi-squared
'    "p"         p-value, i.e., right-tailed probability of the chi-squared distribution
' nGrp<n>PropCounts is a range of two cells containing the number of members of group n that have each of two properties.
' These arguments are 1-D arrays in order to allow the data to appear in non-adjacent ranges in the spreadsheet.
' References:
'    Contingency table:      https://en.wikipedia.org/wiki/Contingency_table
'    Measure of association: www.britannica.com/topic/measure-of-association
'    Odds ratio:             https://en.wikipedia.org/wiki/Odds_ratio
'                            https://en.wikipedia.org/wiki/Effect_size#Odds_ratio
'    Phi coefficient:        https://en.wikipedia.org/wiki/Phi_coefficient
'    Chi-sq:                 https://en.wikipedia.org/wiki/Pearson's_chi-squared_test#Calculating_the_test-statistic
'                            www.mathsisfun.com/data/chi-square-test.html
'                               Shows calculation of expected values.
'    p-value:                https://docs.microsoft.com/en-us/office/vba/api/excel.worksheetfunction.ChiSq_Dist_RT

    ' Long rather than Integer, so that counts or totals above 32,767 do not overflow.
    Dim nCount(1 To 2, 1 To 2) As Long
    Dim nSumGrp(1 To 2) As Long, nSumProp(1 To 2) As Long, nSumAll As Long
    Dim nExpect(1 To 2, 1 To 2) As Single
    Dim nIndex1 As Byte, nIndex2 As Byte
    Dim nRetVal As Single

    ' Combine input arguments into contingency table:
    For nIndex1 = 1 To 2
        nCount(1, nIndex1) = nGrp1PropCounts(nIndex1)
        nCount(2, nIndex1) = nGrp2PropCounts(nIndex1)
    Next nIndex1

    ' Calculate totals of group counts, property counts, and all counts (used for phi and chi-sq):
    For nIndex1 = 1 To 2
        For nIndex2 = 1 To 2
            nSumGrp(nIndex1) = nSumGrp(nIndex1) + nCount(nIndex1, nIndex2)
            nSumProp(nIndex2) = nSumProp(nIndex2) + nCount(nIndex1, nIndex2)
        Next nIndex2
    Next nIndex1
    nSumAll = nSumGrp(1) + nSumGrp(2)
    If nSumAll <> nSumProp(1) + nSumProp(2) Then
        nRetVal = -2            ' Error: Sums differ.
        GoTo Finished
    End If

    Select Case sType
        ' Odds ratio
        Case "OR":
            nRetVal = (nCount(1, 1) / nCount(1, 2)) / (nCount(2, 1) / nCount(2, 2))
            If nRetVal <> (nCount(1, 1) / nCount(2, 1)) / (nCount(1, 2) / nCount(2, 2)) Then
                nRetVal = -3    ' Error: OR calculation results differ.
                GoTo Finished
            End If
        ' Phi coefficient
        Case "phi":
            ' CDbl forces floating-point arithmetic so the products cannot overflow.
            nRetVal = ((CDbl(nCount(1, 1)) * nCount(2, 2)) - (CDbl(nCount(1, 2)) * nCount(2, 1))) / _
                      (CDbl(nSumGrp(1)) * nSumGrp(2) * nSumProp(1) * nSumProp(2)) ^ 0.5
        ' Chi-squared
        Case "chi-sq", "p":     ' For "p", nRetVal is passed to the next select case statement.
            ' Calculate table of expected values:
            For nIndex1 = 1 To 2
                For nIndex2 = 1 To 2
                    ' In next line, the division is done first to prevent integer overflow,
                    ' which can happen if the multiplication is done first.
                    nExpect(nIndex1, nIndex2) = nSumGrp(nIndex1) / nSumAll * nSumProp(nIndex2)
                    If nExpect(nIndex1, nIndex2) < 5 Then
                        ' https://en.wikipedia.org/wiki/Pearson's_chi-squared_test#Assumptions
                        nRetVal = -4    ' Error: Expected value too small.
                        GoTo Finished
                    Else
                        nRetVal = nRetVal + _
                            (nCount(nIndex1, nIndex2) - nExpect(nIndex1, nIndex2)) ^ 2 / nExpect(nIndex1, nIndex2)
                    End If
                Next nIndex2
            Next nIndex1
        Case Else:
            nRetVal = -1        ' Error: Invalid measure type.
            GoTo Finished
    End Select

    Select Case sType
        Case "OR", "phi", "chi-sq":
            ' Nothing more to do; nRetVal already holds the result.
        ' p-value
        Case "p":               ' Uses value of nRetVal passed from the previous select case statement.
            nRetVal = WorksheetFunction.ChiSq_Dist_RT(nRetVal, 1)
    End Select

Finished:
    nStatAssoc_2x2 = nRetVal
End Function    ' nStatAssoc_2x2()
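This function was tested in Excel 2019 and produced correct values for all four measures on several test tables. Criticism of the code and suggestions for improvement are welcome.

If I am wrong that this requires VBA, or if there is a better approach for any other reason, please post a different answer. As I said in the edit note on the question, I will wait a while before accepting my own answer, to see whether anyone has a better one.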
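One possible simplification, offered as a sketch rather than as part of the tested code above: for a 2x2 table with cells a, b (first row) and c, d (second row), the Pearson chi-squared statistic reduces to the standard closed form N(ad - bc)^2 / ((a+b)(c+d)(a+c)(b+d)), where N = a + b + c + d. This equals N times the square of the phi coefficient, so no table of expected values is needed. The function name `ChiSq2x2PValue` and its parameter names are hypothetical, and unlike nStatAssoc_2x2 this sketch does not check that every expected value is at least 5.

```vba
' Hypothetical helper, not part of the original answer.
' Assumed table layout:   a  b
'                         c  d
' Returns the right-tail p-value of Pearson's chi-squared test (1 degree of freedom),
' or #DIV/0! if any row or column total is zero.
Public Function ChiSq2x2PValue(a As Double, b As Double, c As Double, d As Double) As Variant
    Dim n As Double, denom As Double, chiSq As Double

    n = a + b + c + d
    ' Product of the two row totals and the two column totals.
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    If denom = 0 Then
        ChiSq2x2PValue = CVErr(xlErrDiv0)
        Exit Function
    End If

    ' Closed-form chi-squared for a 2x2 table; Doubles avoid integer overflow.
    chiSq = n * (a * d - b * c) ^ 2 / denom
    ChiSq2x2PValue = WorksheetFunction.ChiSq_Dist_RT(chiSq, 1)
End Function
```

With the sample data in B2:C3, this would be called as `=ChiSq2x2PValue(B2, C2, B3, C3)`. Under the same assumptions, the same closed form can be written as a plain worksheet formula with no VBA at all: `=CHISQ.DIST.RT(SUM(B2:C3)*(B2*C3-C2*B3)^2/(SUM(B2:C2)*SUM(B3:C3)*SUM(B2:B3)*SUM(C2:C3)), 1)`. For the sample table, the chi-squared value works out to about 1.344, and the p-value to roughly 0.246.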