Binomial distribution using scipy
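I sampled some data from a network G. The data are the discrete degree values of the nodes in the network, and I computed their distribution.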
import numpy as np

def degree_distribution(G):
    vk = dict(G.degree())
    vk = list(vk.values())  # keep only the degree values
    maxk = np.max(vk)
    mink = np.min(vk)  # minimum degree (not used below)
    kvalues = np.arange(0, maxk + 1)  # possible values of k
    Pk = np.zeros(maxk + 1)  # P(k)
    for k in vk:
        Pk[k] = Pk[k] + 1
    Pk = Pk / sum(Pk)  # the elements of P(k) must sum to one
    return kvalues, Pk
Calling it:
kvalues, Pk = degree_distribution(G)
dict_prob = dict(zip(kvalues,Pk))
I get:
{0: 0.0,
1: 0.0,
2: 0.0016146393972012918,
3: 0.004843918191603875,
4: 0.011840688912809472,
5: 0.03336921420882669,
6: 0.07319698600645856,
7: 0.10764262648008611,
8: 0.15177610333692143,
9: 0.16361679224973089,
10: 0.16254036598493002,
11: 0.11679224973089343,
12: 0.08880516684607104,
13: 0.052206673842841764,
14: 0.02099031216361679,
15: 0.006996770721205597,
16: 0.003767491926803014}
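For reference, the same normalized histogram can be sketched without the explicit loop by using np.bincount on the list of degrees. This is only an illustrative alternative: the function name degree_distribution_from_degrees and the toy degree sequence are made up here, and it assumes the degrees are non-negative integers.

```python
import numpy as np

def degree_distribution_from_degrees(degrees):
    """Return (kvalues, Pk) for a sequence of non-negative integer degrees."""
    degrees = np.asarray(degrees, dtype=int)
    counts = np.bincount(degrees)          # counts[k] = number of nodes with degree k
    kvalues = np.arange(counts.size)       # 0, 1, ..., maximum degree
    return kvalues, counts / counts.sum()  # normalize so the probabilities sum to one

# Toy degree sequence (made up for illustration, not the data shown above)
kvalues, Pk = degree_distribution_from_degrees([2, 3, 3, 4, 4, 4, 5])
dict_prob = dict(zip(kvalues.tolist(), Pk.tolist()))
```

With a networkx graph, the degree list would be something like [d for _, d in G.degree()].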
How can I use scipy to test whether this sampled data follows a binomial distribution?
If you only want to know how well a binomial PMF fits your empirical distribution, you can simply do the following:
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats, optimize
data = {0: 0.0,
1: 0.0,
2: 0.0016146393972012918,
3: 0.004843918191603875,
4: 0.011840688912809472,
5: 0.03336921420882669,
6: 0.07319698600645856,
7: 0.10764262648008611,
8: 0.15177610333692143,
9: 0.16361679224973089,
10: 0.16254036598493002,
11: 0.11679224973089343,
12: 0.08880516684607104,
13: 0.052206673842841764,
14: 0.02099031216361679,
15: 0.006996770721205597,
16: 0.003767491926803014}
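x = np.array(list(data.keys()))
y = np.array(list(data.values()))

def binom_fit(x, n, p):
    # Binomial PMF evaluated at x; curve_fit treats n as a continuous parameter
    return stats.binom(n, p).pmf(x)

# Least-squares fit of the PMF to the empirical probabilities, starting from n=10, p=0.5
opt = optimize.curve_fit(binom_fit, x, y, [10, 0.5])
opt_n, opt_p = opt[0]
yhat = stats.binom(opt_n, opt_p).pmf(x)
# Coefficient of determination between the empirical and fitted probabilities
R2 = 1 - np.sum((y - yhat)**2)/np.sum((y - y.mean())**2)
plt.plot(x, y, label="Empirical")
plt.plot(x, yhat, label="Binomial PMF")
plt.title(f"R^2 = {R2:0.4f}")
plt.legend()
plt.show()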
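This gives a plot of the empirical distribution against the fitted binomial PMF, with the R^2 value in the title.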
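As a cross-check on the least-squares fit, the binomial parameters can also be estimated from the first two moments of the empirical distribution, since a binomial has mean n*p and variance n*p*(1-p), which gives p = 1 - var/mean and n = mean/p. The sketch below is not part of the original answer; the helper name binomial_moments is hypothetical, and rounding n to the nearest integer is just one possible choice.

```python
import numpy as np
from scipy import stats

def binomial_moments(k, pk):
    """Method-of-moments estimates (n, p) from an empirical PMF pk over values k."""
    k = np.asarray(k, dtype=float)
    pk = np.asarray(pk, dtype=float)
    pk = pk / pk.sum()                    # make sure the probabilities sum to one
    mean = np.sum(k * pk)
    var = np.sum((k - mean) ** 2 * pk)
    if var >= mean:
        # A binomial always has variance < mean; otherwise the model cannot match
        raise ValueError("variance >= mean: no valid binomial moment estimate")
    p = 1.0 - var / mean                  # from var = n*p*(1-p) and mean = n*p
    n = mean / p                          # generally not an integer
    return n, p

# Empirical probabilities P(k) for k = 0..16, copied from the question
Pk = np.array([0.0, 0.0, 0.0016146393972012918, 0.004843918191603875,
               0.011840688912809472, 0.03336921420882669, 0.07319698600645856,
               0.10764262648008611, 0.15177610333692143, 0.16361679224973089,
               0.16254036598493002, 0.11679224973089343, 0.08880516684607104,
               0.052206673842841764, 0.02099031216361679, 0.006996770721205597,
               0.003767491926803014])
k = np.arange(Pk.size)

n_mom, p_mom = binomial_moments(k, Pk)

# Binomial PMF with n rounded to an integer and p chosen to keep the same mean
n_int = int(round(n_mom))
p_int = np.sum(k * Pk) / n_int
yhat_mom = stats.binom.pmf(k, n_int, p_int)
print(n_mom, p_mom)
```

If these estimates differ a lot from opt_n and opt_p, that is a hint that the least-squares fit is driven by the shape of the curve rather than by the moments of the data.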
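Edit: To test the hypothesis that the empirical frequencies follow the frequencies expected under a binomial distribution, you can use stats.chisquare: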
>>> stats.chisquare(y*opt_n, yhat*(y.sum()/yhat.sum())*opt_n)
Power_divergenceResult(statistic=0.09436186207390668, pvalue=0.9999999999999994)
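Note that the null hypothesis here is that the observed and expected frequencies are the same, so a p-value > 0.05 means the test does not reject the null hypothesis; it is consistent with a binomial distribution but does not prove it. The statistic and p-value also depend on the scale of the frequencies passed in. The call above scales the probabilities by opt_n rather than by the number of sampled nodes, so the reported p-value should be read with caution.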
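A chi-square test is normally run on observed counts rather than probabilities, so a variant on counts may be more informative. The sketch below rests on several assumptions that are not in the original post: the probabilities in the question are consistent with 1858 nodes (for example 0.0016146... = 3/1858), so the counts are reconstructed from that; n and p are estimated by the method of moments with n rounded to an integer; and the sparse tails are pooled by hand.

```python
import numpy as np
from scipy import stats

# Observed counts per degree k = 0..16. Assumption: reconstructed from the
# probabilities in the question as P(k) * 1858; the sample size is inferred,
# not stated in the original post.
counts = np.array([0, 0, 3, 9, 22, 62, 136, 200, 282, 304,
                   302, 217, 165, 97, 39, 13, 7])
k = np.arange(counts.size)
n_nodes = counts.sum()

# Method-of-moments estimates, with n rounded to an integer (illustrative choice)
mean = (k * counts).sum() / n_nodes
var = ((k - mean) ** 2 * counts).sum() / n_nodes
n_hat = int(round(mean / (1 - var / mean)))
p_hat = mean / n_hat

# Expected counts; the probability mass for k > 16 goes into the last bin so
# that the observed and expected totals agree
expected = n_nodes * stats.binom.pmf(k, n_hat, p_hat)
expected[-1] += n_nodes * stats.binom.sf(k[-1], n_hat, p_hat)

def pool(a):
    # Pool the sparse tails (k <= 3 and k >= 15) to avoid tiny expected counts
    return np.concatenate(([a[:4].sum()], a[4:15], [a[15:].sum()]))

obs_pooled, exp_pooled = pool(counts), pool(expected)

# ddof=2 because two parameters (n and p) were estimated from the data
result = stats.chisquare(obs_pooled, exp_pooled, ddof=2)
print(result.statistic, result.pvalue)
```

Here a small p-value would indicate that the degree counts deviate from the fitted binomial by more than sampling noise would explain.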