95% confidence interval of a single proportion in a pandas time series using statsmodels
I have a time series DataFrame:
df = pd.DataFrame({'year':['2010','2011','2012','2013','2014','2015','2016','2017','2018','2019'],
'total_count': [545,779,706,547,626,530,766,1235,1260,947],
'rand_count':[96,184,148,154,160,149,124,274,322,301],
'rand_perc':[17.61,23.62,20.96,28.15,25.56,28.11,16.19,22.19,25.56,31.78]
})
where:
df['rand_perc'] = (df['rand_count']/df['total_count'])*100
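Problem:
I want to calculate the confidence interval of the single proportion of df['rand_count'] in df['total_count'] for each row of df, and plot df['year'] against df['rand_perc'] with the CI as error bars. I tried to compute the CI for each row with statsmodels using the following code: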
import statsmodels.api as sm
df['CI'] = df[['total_count', 'rand_count']].apply(lambda row: sm.stats.proportion_confint(count =
df['rand_count'], nobs = df['total_count'], alpha = 0.05), axis = 1)
But the resulting df['CI'] looks very ugly: every row holds a tuple containing the CIs of all rows:
0 ([0.14416430990026746, 0.2063732756491498, 0.1...
1 ([0.14416430990026746, 0.2063732756491498, 0.1...
2 ([0.14416430990026746, 0.2063732756491498, 0.1...
3 ([0.14416430990026746, 0.2063732756491498, 0.1...
4 ([0.14416430990026746, 0.2063732756491498, 0.1...
5 ([0.14416430990026746, 0.2063732756491498, 0.1...
6 ([0.14416430990026746, 0.2063732756491498, 0.1...
7 ([0.14416430990026746, 0.2063732756491498, 0.1...
8 ([0.14416430990026746, 0.2063732756491498, 0.1...
9 ([0.14416430990026746, 0.2063732756491498, 0.1...
Name: CI, dtype: object
Desired result
df['CI'] with a two-element tuple of its own in each row, such as:
(0.144164, 0.206373)
(0.179606, 0.243846)
(0.221421, 0.242859)
...................
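I would also like two separate columns, df['upper'] and df['lower'], holding the upper and lower limits of df['CI'] respectively.
Your help would be much appreciated.
Thank you very much!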
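The lambda in your apply call never uses row; it passes the full df['rand_count'] and df['total_count'] columns on every iteration, which is why each row ends up holding the intervals for all rows. proportion_confint is vectorized, so a single call on the whole columns is enough. Consider assigning multiple columns at once; they should align by index because, according to the docs: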
When a pandas object is returned, then the index is taken from the count.
df['lower_CI'], df['upper_CI'] = sm.stats.proportion_confint(
count = df['rand_count'],
nobs = df['total_count'],
alpha = 0.05
)
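To show what that call returns and how the bounds could feed the error bars, here is a minimal sketch that recomputes the interval by hand with pandas and the standard library. It assumes the default method of proportion_confint, the normal approximation p ± z·sqrt(p(1−p)/n), which is consistent with the 0.144164 in the question's output. The helper name plot_rand_perc and the plot styling are my own choices, not something from the question.

```python
from statistics import NormalDist

import pandas as pd

df = pd.DataFrame({
    'year': ['2010', '2011', '2012', '2013', '2014',
             '2015', '2016', '2017', '2018', '2019'],
    'total_count': [545, 779, 706, 547, 626, 530, 766, 1235, 1260, 947],
    'rand_count': [96, 184, 148, 154, 160, 149, 124, 274, 322, 301],
})

# Observed proportion per row, and the same value as a percentage.
p = df['rand_count'] / df['total_count']
df['rand_perc'] = p * 100

# Two-sided critical value for alpha = 0.05 (about 1.96).
alpha = 0.05
z = NormalDist().inv_cdf(1 - alpha / 2)

# Normal-approximation half-width: z * sqrt(p * (1 - p) / n).
half_width = z * (p * (1 - p) / df['total_count']) ** 0.5

df['lower_CI'] = p - half_width
df['upper_CI'] = p + half_width

# One (lower, upper) tuple per row, rounded for display.
df['CI'] = list(zip(df['lower_CI'].round(6), df['upper_CI'].round(6)))


def plot_rand_perc(frame):
    """Plot rand_perc per year with the CI as error bars (hypothetical helper)."""
    # Imported here so the CI computation above runs without matplotlib.
    import matplotlib.pyplot as plt

    # The bounds are proportions while rand_perc is in percent, so rescale.
    # errorbar expects distances from the point: row 0 below, row 1 above.
    yerr = [
        (frame['rand_perc'] - frame['lower_CI'] * 100).to_numpy(),
        (frame['upper_CI'] * 100 - frame['rand_perc']).to_numpy(),
    ]
    fig, ax = plt.subplots()
    ax.errorbar(frame['year'], frame['rand_perc'], yerr=yerr,
                fmt='o-', capsize=3)
    ax.set_xlabel('year')
    ax.set_ylabel('rand_perc (%)')
    return ax
```

Calling plot_rand_perc(df) followed by plt.show() should draw the series with its error bars. The two-row yerr form is used so the same helper still works if you switch to an asymmetric interval, for example by filling lower_CI and upper_CI from proportion_confint with method='wilson'.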