Python 分布的卡方检验

Chi squared test of a distribution in Python

我对这个练习有两个疑惑:

代码的第一部分完美运行。现在我需要使用卡方检验检查分布是否平坦。

我实现的代码是:

#UNIFORM RANDOM SAMPLING 

import numpy as np                #library needed for numerical calculations
import matplotlib.pyplot as plt   #library needed for plotting purposes
from scipy.stats import chisquare #function needed for chi square test

#*******************************************************************************

i=np.uintc(987654321)              #unsigned int variable i
m=10**3                            #number of 10**3 events

list1=[i]                          #list1 needed to be updated with random i

for count in range(m):             #for cycle over expected period and update i
    i=np.uintc(i*663608941)
    list1.append(i) 

list1=np.divide(list1,(2**32)-1)    #needed in order to normalize the list1 elements
bins1=int(np.sqrt(m))               #histogram bin numbers

[hist,bin_edges]=np.histogram(list1,bins=bins1) #compute the histogram of a dataset

#*******************************************************************************

f_exp=(m/bins1)*np.ones(bins1)      #expected frequency, expresses in array form. 
                                    #we define an array of ones of the exact size 
                                    #as the number of bins, and then just multiply 
                                    #it with n/N where n is number of elements
                                    #and N is number of bins. 
                                    #So it will look like [n/N,n/N,n/N...]

chisquareval=chisquare(hist,f_exp,axis=0)        #Calculate a one-way chi-square test.
                                                 #The chi-square test tests the null hypothesis 
                                                 #that the categorical data has the given frequencies.
                                                 #It needs: Observed frequencies in each category, Expected frequencies in each category

print("\n")
print("The result of chi squared test is:", chisquareval, "\n")

#*******************************************************************************

plt.figure()                                                          #a unique identifier for the figure                                                    
plt.hist(list1[0:m],bins=bins1)                                       #compute and draw the histogram of x with n bins
plt.grid()                                                            #configure the grid lines
plt.xlabel('Bins',fontweight='bold')                                  #set the label for the y-axis 
plt.ylabel('Frequency',fontweight='bold')                             #set the label for the y-axis 
plt.title('Uniform distribution: number of elements ^{3}$')        #set a title for the hist
plt.show()                                                            #display all open figures

#*******************************************************************************

i=np.uintc(987654321)              #unsigned int variable i

n=10**6                            #number of 10**6 events

list1=[i]                          #list1 needed to be updated with random i

for count in range(n):             #for cycle over expected period and update i
    i=np.uintc(i*663608941)
    list1.append(i) 
    
list1=np.divide(list1,(2**32)-1)    #needed in order to normalize the list1 elements
bins1=int(np.sqrt(n))               #histogram bin numbers

[hist,bin_edges]=np.histogram(list1,bins=bins1) #compute the histogram of a dataset

#*******************************************************************************

f_exp=(n/bins1)*np.ones(bins1)      #expected frequency, expresses in array form. 
                                    #we define an array of ones of the exact size 
                                    #as the number of bins, and then just multiply 
                                    #it with n/N where n is number of elements
                                    #and N is number of bins. 
                                    #So it will look like [n/N,n/N,n/N...]

chisquareval=chisquare(hist,f_exp,axis=0)        #Calculate a one-way chi-square test.
                                                 #The chi-square test tests the null hypothesis 
                                                 #that the categorical data has the given frequencies.
                                                 #It needs: Observed frequencies in each category, Expected frequencies in each category

print("\n")
print("The result of chi squared test is:", chisquareval, "\n")

#*******************************************************************************

plt.figure()                                                          #a unique identifier for the figure
plt.hist(list1[0:n],bins=bins1)                                       #compute and draw the histogram of x with n bins 
plt.grid()                                                            #configure the grid lines
plt.xlabel('Bins',fontweight='bold')                                  #set the label for the x-axis
plt.ylabel('Frequency',fontweight='bold')                             #set the label for the y-axis   
plt.title('Uniform distribution: number of elements ^{6}$')        #set a title for the hist
plt.show()                                                            #display all open figures

#*******************************************************************************

具有以下输出:

对于第一个直方图,一切正常,有效。在第二个直方图中,我们可以看到一个错误的统计值。我过去运行过该代码,结果是 103,但我没有更改代码!

Power_divergenceResult(statistic=32.315000000000005, pvalue=0.35302378840079285) 

可以分开打印statistc和pvalue吗?

我明白这个问题了。代码是正确的,但如果我没有找到 ddof=1000 的 table,我无法判断原假设是否正确。所以得到 bin=ddof=100 我可以比较卡方检验的结果,在这种情况下我可以说假设被拒绝,错误为 2%。