Trying to rule out astrology, but something is wrong
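I am trying to rule out any effect astrology might have on a population by showing that it is statistically insignificant, but to no avail. I am using Pearson's chi-square test on two distributions of sun signs from two different populations, one of astronaut pilots and the other of celebrities. Something must be wrong, but I have not found it; it is probably on the statistics side.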
import numpy as np
import pandas as pd
import ephem
from collections import Counter, namedtuple
import matplotlib.pyplot as plt
from scipy import stats

models = pd.read_csv('models.csv', delimiter=',')
astronauts = pd.read_csv('astronauts.csv', delimiter=',')
models = models.sample(229)
astronauts = astronauts.sample(229)
sun = ephem.Sun()


def get_planet_constellation(planet, dataset):
    person_planet_constellation = []
    for person in dataset['Birth Date']:
        planet.compute(person)
        person_planet_constellation += [ephem.constellation(planet)[1]]
    return person_planet_constellation


def plot_bar_group(planet, data1, data2):
    fig, ax = plt.subplots()
    plt.bar(data1.keys(), data1.values(), alpha=0.5)
    plt.bar(data2.keys(), data2.values(), alpha=0.5)
    plt.legend(['astronauts', 'models'])
    ylabel = 'Percentages of ' + planet.name + ' in constellation'
    ax.set_ylabel(ylabel)
    title = 'Histogram of ' + planet.name + ' in constellation by group'
    ax.set_title(title)
    plt.show()


astronaut_sun_constellation = Counter(
    get_planet_constellation(sun, astronauts))
model_sun_constellation = Counter(get_planet_constellation(sun, models))
plot_bar_group(sun, astronaut_sun_constellation, model_sun_constellation)

a = list(astronaut_sun_constellation.values())
b = list(model_sun_constellation.values())
s = np.array([a, b])
stat, p, dof, expected = stats.chi2_contingency(s)
print(stat, p, dof, expected)

# interpret test statistic
prob = 0.95
critical = stats.chi2.ppf(prob, dof)
if abs(stat) >= critical:
    print('Dependent (reject H0)')
else:
    print('Independent (fail to reject H0)')

# interpret p-value
alpha = 1.0 - prob
if p <= alpha:
    print('Dependent (reject H0)')
else:
    print('Independent (fail to reject H0)')
https://www.dropbox.com/s/w7rye6m5lbihjlh/astronauts.csv
https://www.dropbox.com/s/xlxanr0pxqtxcvv/models.csv
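I eventually found the bug. When the Counter values are passed as lists to the chi-square function, they must first be sorted; otherwise the test sees large differences between the counts. A Counter keeps its keys in first-seen order, so the two unsorted lists do not line up constellation by constellation. With that fixed, all astrological effects are insignificant at the 0.95 level, as expected.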
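To illustrate the ordering problem, here is a minimal sketch using only the standard library. The helper names `aligned_table` and `pearson_chi2` and the counts are made up for illustration; they are not from the original script. It builds the table over one shared, sorted key order and compares the Pearson statistic with and without that alignment.

```python
from collections import Counter


def aligned_table(counter_a, counter_b):
    """Build a 2 x k table whose columns follow one shared key order."""
    # Union of keys, sorted, so column j means the same constellation in both rows.
    keys = sorted(set(counter_a) | set(counter_b))
    # A Counter returns 0 for a missing key, so a constellation absent from
    # one group still gets its own column instead of shifting the others.
    row_a = [counter_a[k] for k in keys]
    row_b = [counter_b[k] for k in keys]
    return keys, [row_a, row_b]


def pearson_chi2(table):
    """Pearson chi-square statistic and degrees of freedom for a 2-D table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(col_totals) - 1)
    return stat, dof


# Made-up counts: identical distributions, but the keys were first seen
# in a different order in each group.
group_a = Counter({'Leo': 30, 'Virgo': 10, 'Aries': 20})
group_b = Counter({'Aries': 20, 'Leo': 30, 'Virgo': 10})

# Misaligned: column j holds a different constellation in each row.
misaligned = [list(group_a.values()), list(group_b.values())]
# Aligned: both rows use the same sorted key order.
keys, aligned = aligned_table(group_a, group_b)

stat_misaligned, dof = pearson_chi2(misaligned)
stat_aligned, _ = pearson_chi2(aligned)
print(keys)
print(stat_misaligned, stat_aligned, dof)
```

The aligned table can be passed to `stats.chi2_contingency` in place of `s` in the script above. With the misaligned lists the statistic is large even though the two made-up groups are identical, which matches the symptom described in the question.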