Python 分布的卡方检验

Question

我对这个练习有两个疑惑：

代码的第一部分完美运行。现在我需要使用卡方检验检查分布是否平坦。

我实现的代码是：

#UNIFORM RANDOM SAMPLING 

import numpy as np                #library needed for numerical calculations
import matplotlib.pyplot as plt   #library needed for plotting purposes
from scipy.stats import chisquare #function needed for chi square test

#*******************************************************************************

i=np.uintc(987654321)              #unsigned int variable i
m=10**3                            #number of 10**3 events

list1=[i]                          #list1 needed to be updated with random i

for count in range(m):             #for cycle over expected period and update i
    i=np.uintc(i*663608941)
    list1.append(i) 

list1=np.divide(list1,(2**32)-1)    #needed in order to normalize the list1 elements
bins1=int(np.sqrt(m))               #histogram bin numbers

[hist,bin_edges]=np.histogram(list1,bins=bins1) #compute the histogram of a dataset

#*******************************************************************************

f_exp=(m/bins1)*np.ones(bins1)      #expected frequency, expresses in array form. 
                                    #we define an array of ones of the exact size 
                                    #as the number of bins, and then just multiply 
                                    #it with n/N where n is number of elements
                                    #and N is number of bins. 
                                    #So it will look like [n/N,n/N,n/N...]

chisquareval=chisquare(hist,f_exp,axis=0)        #Calculate a one-way chi-square test.
                                                 #The chi-square test tests the null hypothesis 
                                                 #that the categorical data has the given frequencies.
                                                 #It needs: Observed frequencies in each category, Expected frequencies in each category

print("\n")
print("The result of chi squared test is:", chisquareval, "\n")

#*******************************************************************************

plt.figure()                                                          #a unique identifier for the figure                                                    
plt.hist(list1[0:m],bins=bins1)                                       #compute and draw the histogram of x with n bins
plt.grid()                                                            #configure the grid lines
plt.xlabel('Bins',fontweight='bold')                                  #set the label for the y-axis 
plt.ylabel('Frequency',fontweight='bold')                             #set the label for the y-axis 
plt.title('Uniform distribution: number of elements ^{3}$')        #set a title for the hist
plt.show()                                                            #display all open figures

#*******************************************************************************

i=np.uintc(987654321)              #unsigned int variable i

n=10**6                            #number of 10**6 events

list1=[i]                          #list1 needed to be updated with random i

for count in range(n):             #for cycle over expected period and update i
    i=np.uintc(i*663608941)
    list1.append(i) 
    
list1=np.divide(list1,(2**32)-1)    #needed in order to normalize the list1 elements
bins1=int(np.sqrt(n))               #histogram bin numbers

[hist,bin_edges]=np.histogram(list1,bins=bins1) #compute the histogram of a dataset

#*******************************************************************************

f_exp=(n/bins1)*np.ones(bins1)      #expected frequency, expresses in array form. 
                                    #we define an array of ones of the exact size 
                                    #as the number of bins, and then just multiply 
                                    #it with n/N where n is number of elements
                                    #and N is number of bins. 
                                    #So it will look like [n/N,n/N,n/N...]

chisquareval=chisquare(hist,f_exp,axis=0)        #Calculate a one-way chi-square test.
                                                 #The chi-square test tests the null hypothesis 
                                                 #that the categorical data has the given frequencies.
                                                 #It needs: Observed frequencies in each category, Expected frequencies in each category

print("\n")
print("The result of chi squared test is:", chisquareval, "\n")

#*******************************************************************************

plt.figure()                                                          #a unique identifier for the figure
plt.hist(list1[0:n],bins=bins1)                                       #compute and draw the histogram of x with n bins 
plt.grid()                                                            #configure the grid lines
plt.xlabel('Bins',fontweight='bold')                                  #set the label for the x-axis
plt.ylabel('Frequency',fontweight='bold')                             #set the label for the y-axis   
plt.title('Uniform distribution: number of elements ^{6}$')        #set a title for the hist
plt.show()                                                            #display all open figures

#*******************************************************************************

具有以下输出：

对于第一个直方图，一切正常，有效。在第二个直方图中，我们可以看到一个错误的统计值。我过去运行过该代码，结果是 103，但我没有更改代码！

为什么会这样？
为什么 chisquare 的显示输出看起来很糟糕？

Power_divergenceResult(statistic=32.315000000000005, pvalue=0.35302378840079285)

可以分开打印statistc和pvalue吗？

Answer 1

我明白这个问题了。代码是正确的，但如果我没有找到 ddof=1000 的 table，我无法判断原假设是否正确。所以得到 bin=ddof=100 我可以比较卡方检验的结果，在这种情况下我可以说假设被拒绝，错误为 2%。

Python 分布的卡方检验

Chi squared test of a distribution in Python

python

chi-squared