Python: Implement mean of means 95% Confidence Interval?
How can this solution be implemented using pandas/python? This question concerns the implementation of finding a 95% CI around a mean of means using this stats.stackexchange solution.
import pandas as pd
from IPython.display import display
import scipy
import scipy.stats as st
import scikits.bootstrap as bootstraps
data = pd.DataFrame({
"exp1":[34, 41, 39]
,"exp2":[45, 51, 52]
,"exp3":[29, 31, 35]
}).T
# compute both row statistics from the raw replicate columns only;
# adding row_mean first would make data.std(axis=1) include it in the std
data = data.assign(row_mean=data.mean(axis=1), row_std=data.std(axis=1))
display(data)
       0   1   2   row_mean   row_std
exp1  34  41  39  38.000000  3.605551
exp2  45  51  52  49.333333  3.785939
exp3  29  31  35  31.666667  3.055050
mean_of_means = data.row_mean.mean()
std_of_means = data.row_mean.std()
confidence = 0.95
print("mean(means): {}\nstd(means):{}".format(mean_of_means,std_of_means))
- mean(means): 39.66666666666667
- std(means): 8.950481054731702
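The same two numbers can be reproduced with plain NumPy as a sanity check. One detail worth knowing: pandas `.std()` defaults to the sample standard deviation (`ddof=1`), while `numpy.std` defaults to `ddof=0`, so `ddof=1` has to be passed explicitly. The standard error of the mean of means is that standard deviation divided by the square root of the number of experiments. A minimal sketch; the variable names are illustrative.

```python
import numpy as np

# replicate measurements: one row per experiment (exp1, exp2, exp3)
values = np.array([[34, 41, 39],
                   [45, 51, 52],
                   [29, 31, 35]], dtype=float)

row_means = values.mean(axis=1)                        # per-experiment means
mean_of_means = row_means.mean()
std_of_means = row_means.std(ddof=1)                   # match pandas' default sample std
sem_of_means = std_of_means / np.sqrt(len(row_means))  # standard error of the mean of means

print(mean_of_means, std_of_means, sem_of_means)
```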
First incorrect attempt (z-score):
zscore = st.norm.ppf(1-(1-confidence)/2)
lower_bound = mean_of_means - (zscore*std_of_means)
upper_bound = mean_of_means + (zscore*std_of_means)
print("95% CI = [{},{}]".format(lower_bound,upper_bound))
- 95% CI = [22.1, 57.2] (incorrect solution)
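This attempt multiplies the z critical value by the standard deviation of the means, so it describes the spread of individual experiment means rather than the uncertainty of their average. Dividing by √n turns it into a z-based interval for the mean, but with only three experiments the normal quantile is still too small; the t distribution accounts for estimating the standard deviation from so few values. A short sketch comparing the three half-widths, with the mean and std hard-coded from the output above:

```python
import math
from scipy import stats

mean_of_means = 39.66666666666667
std_of_means = 8.950481054731702
n = 3  # number of experiments

z = stats.norm.ppf(0.975)
hw_attempt = z * std_of_means                     # what the first attempt used
hw_z_se = z * std_of_means / math.sqrt(n)         # z with standard error: too narrow for n = 3
hw_t_se = stats.t.ppf(0.975, n - 1) * std_of_means / math.sqrt(n)  # t with standard error

for name, hw in [("z * std", hw_attempt), ("z * SE", hw_z_se), ("t * SE", hw_t_se)]:
    print(f"{name:8s} [{mean_of_means - hw:.2f}, {mean_of_means + hw:.2f}]")
```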
Second incorrect attempt (t-score):
tscore = st.t.ppf(1-0.05, data.shape[0])
lower_bound = mean_of_means - (tscore*std_of_means)
upper_bound = mean_of_means + (tscore*std_of_means)
print("95% CI = [{},{}]".format(lower_bound,upper_bound))
- 95% CI = [18.60, 60.73] (incorrect solution)
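Besides not dividing by √n, the second attempt has two problems in the critical value: `st.t.ppf(1-0.05, ...)` is the one-sided 95% quantile (a two-sided 95% interval needs the 0.975 quantile), and the degrees of freedom should be n − 1 = 2 rather than n = 3. A quick comparison of the two critical values:

```python
from scipy import stats

n = 3  # number of experiments

t_attempt = stats.t.ppf(1 - 0.05, n)            # one-sided quantile, df = n (as in the attempt)
t_two_sided = stats.t.ppf(1 - 0.05 / 2, n - 1)  # two-sided quantile, df = n - 1

print(t_attempt, t_two_sided)
```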
Third incorrect attempt (bootstrap):
import numpy as np  # scipy.mean is only a deprecated alias of numpy.mean
CIs = bootstraps.ci(data=data.row_mean, statfunction=np.mean, alpha=0.05)
- 95% CI = [31.67, 49.33] (incorrect solution)
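The bootstrap result is exactly the minimum and maximum of the three experiment means. This is expected rather than a library bug: a resampled mean can never fall outside the range of the values being resampled, and with three values about 1/27 (≈3.7%) of all resamples consist of the smallest value three times (likewise the largest), which is more than the 2.5% in each tail. The plain percentile bootstrap sketched below with NumPy (a hypothetical helper, not the scikits.bootstrap implementation, whose default method may differ) shows the same effect: with only three experiment means there is too little data for a bootstrap to represent the tails.

```python
import numpy as np

def percentile_bootstrap_ci(values, n_resamples=10_000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the mean (illustrative helper, not a library API)."""
    rng = np.random.default_rng(seed)
    values = np.asarray(values, dtype=float)
    # draw n_resamples samples of the same size, with replacement
    idx = rng.integers(0, len(values), size=(n_resamples, len(values)))
    boot_means = values[idx].mean(axis=1)
    lower, upper = np.percentile(boot_means, [100 * alpha / 2, 100 * (1 - alpha / 2)])
    return lower, upper

row_means = [38.0, 49.333333333333336, 31.666666666666668]
lower, upper = percentile_bootstrap_ci(row_means)
print(lower, upper)
```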
How can this solution be implemented with pandas/python to get the correct result below?
- 95% CI = [17.4 to 61.9] (correct solution)
Thanks to Jon Bates.
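Written out, the code that follows computes a two-sided Student-t interval that treats each of the k experiment means as a single observation. This formula is read off the code rather than quoted from the linked answer; note that the within-experiment standard deviations (`row_std`) do not enter it.

```latex
\bar{\bar{x}} = \frac{1}{k}\sum_{i=1}^{k} \bar{x}_i, \qquad
s_{\bar{x}} = \sqrt{\frac{1}{k-1}\sum_{i=1}^{k}\left(\bar{x}_i - \bar{\bar{x}}\right)^2}

\mathrm{CI}_{95\%} = \bar{\bar{x}} \pm t_{0.975,\,k-1}\,\frac{s_{\bar{x}}}{\sqrt{k}}
= 39.67 \pm 4.303 \cdot \frac{8.950}{\sqrt{3}} \approx [17.43,\ 61.90]
```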
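import pandas as pd
import scipy.stats as st
data = pd.DataFrame({
"exp1":[34, 41, 39]
,"exp2":[45, 51, 52]
,"exp3":[29, 31, 35]
}).T
# row statistics computed from the raw replicate columns only
data = data.assign(row_mean=data.mean(axis=1), row_std=data.std(axis=1))
mean_of_means = data.row_mean.mean()
std_of_means = data.row_mean.std()  # sample std (ddof=1) of the experiment means
n = data.shape[0]                   # number of experiments
# two-sided 95% t critical value with n - 1 degrees of freedom
tscore = st.t.ppf(1-0.025, n-1)
print("mean(means): {}\nstd(means): {}\ntscore: {}".format(mean_of_means,std_of_means,tscore))
# CI = mean of means +/- t * standard error, where standard error = std / sqrt(n)
lower_bound = mean_of_means - (tscore*std_of_means/(n**0.5))
upper_bound = mean_of_means + (tscore*std_of_means/(n**0.5))
print("95% CI = [{},{}]".format(lower_bound,upper_bound))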
mean(means): 39.66666666666667
std(means): 8.950481054731702
tscore: 4.302652729911275
95% CI = [17.432439139464606,61.90089419386874]
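SciPy can also produce the same interval in one call: `scipy.stats.sem` returns the standard error (with `ddof=1` by default), and `scipy.stats.t.interval` returns both endpoints for a given confidence level, degrees of freedom, location and scale. A sketch using the same data; the confidence level is passed positionally because its keyword name was renamed (from `alpha` to `confidence`) in newer SciPy versions.

```python
import pandas as pd
import scipy.stats as st

data = pd.DataFrame({
    "exp1": [34, 41, 39],
    "exp2": [45, 51, 52],
    "exp3": [29, 31, 35],
}).T

row_means = data.mean(axis=1).to_numpy()  # one mean per experiment
n = len(row_means)

# two-sided 95% t interval: loc = mean of means, scale = standard error
ci = st.t.interval(0.95, n - 1, loc=row_means.mean(), scale=st.sem(row_means))
print("95% CI = [{},{}]".format(*ci))
```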