Applying Scipy's ttest Using a Rolling Window - Part 2
My question takes a slightly different angle from this question:
I have a helper function that calculates scipy's ttest for independence. Here it is:
# Helper function for testing for independence
from scipy.stats import ttest_ind

def conduct_ttest(data, variable_1="bias", variable_2="score", nan_policy="omit"):
    test_result = ttest_ind(data[variable_1], data[variable_2], nan_policy=nan_policy)
    test_statistic = test_result[0]
    p_value = test_result[1]
    return test_statistic, p_value
I want to run it over a 5-period rolling window so that the test results are written into the dataframe "data". The dataframe looks like this:
date bias score
1/1/2021 5 1000
1/2/2021 13 1089
1/3/2021 21 1178
1/4/2021 29 1267
1/5/2021 37 1356
1/6/2021 45 1445
1/7/2021 53 1534
1/8/2021 61 1623
1/9/2021 69 1712
1/10/2021 77 1801
1/11/2021 85 1890
1/12/2021 93 1979
1/13/2021 101 2068
1/14/2021 109 2157
1/15/2021 117 2246
1/16/2021 125 2335
1/17/2021 133 2424
What I have tried:
data[["test_statistic", "p_value"]] = \
    data.rolling(5).apply(lambda x: conduct_ttest(x, variable_1="bias", variable_2="score", nan_policy="omit"))
However, it doesn't work. Does anyone have any suggestions for what I can do?
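A likely reason the attempt fails, as far as I understand pandas: `rolling(...).apply` calls the function once per column per window, handing it a 1-D window from a single column and expecting one scalar back. The helper instead needs both columns at once and returns two values. The probe below is only a sketch on a shortened copy of the data; `probe` and `seen` are made-up names for illustration.

```python
import pandas as pd

# Shortened copy of the question's numeric columns (assumption: 6 rows is enough to show the behavior)
data = pd.DataFrame({
    "bias": [5, 13, 21, 29, 37, 45],
    "score": [1000, 1089, 1178, 1267, 1356, 1445],
})

seen = []  # records what rolling.apply hands to the callback

def probe(window):
    # Each call receives a 1-D window taken from one column, not a two-column DataFrame slice
    seen.append((type(window).__name__, window.ndim, len(window)))
    return 0.0  # rolling.apply expects a single scalar per window

probed = data.rolling(5).apply(probe)
print(seen)
```

Because each window holds one column only, `x["bias"]` and `x["score"]` cannot both be looked up inside the lambda, which is why a row-wise loop over `iloc` slices is the simpler route.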
I couldn't find a built-in rolling method for this, so try this simple iterative solution:
# In this function I just added the window's last index label to the returned values:
def conduct_ttest(data, variable_1="bias", variable_2="score", nan_policy="omit"):
    test_result = ttest_ind(data[variable_1], data[variable_2], nan_policy=nan_policy)
    test_statistic = test_result[0]
    p_value = test_result[1]
    return data.index.max(), test_statistic, p_value
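# Define the rolling window length:
window = 5
# df is the question's DataFrame (named `data` there).
# Note: range(len(df) - window) stops one window short of the last row;
# use range(len(df) - window + 1) to include the final window as well.
pd.DataFrame(
    [conduct_ttest(df.iloc[i:i + window]) for i in range(len(df) - window)],
    columns=['index', 'test_statistic', 'p_value']
).set_index('index', drop=True)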
Result:
test_statistic p_value
index
4 -18.310951 8.140624e-08
5 -19.592876 4.788281e-08
6 -20.874800 2.909324e-08
7 -22.156725 1.819271e-08
8 -23.438650 1.167216e-08
9 -24.720575 7.663247e-09
10 -26.002500 5.136947e-09
11 -27.284425 3.509024e-09
12 -28.566349 2.438519e-09
13 -29.848274 1.721420e-09
14 -31.130199 1.232845e-09
15 -32.412124 8.947394e-10
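Since the question asks for the results to end up in `data` itself, here is a minimal sketch of one way to do that: wrap the loop in a function and join the result back on the index. The name `rolling_ttest` is made up for this example, the frame is rebuilt from the question's table, and the loop uses `len(frame) - window + 1` so the window ending on the last row is included (an assumption about what is wanted).

```python
import pandas as pd
from scipy.stats import ttest_ind

def rolling_ttest(frame, window=5, variable_1="bias", variable_2="score", nan_policy="omit"):
    """Run ttest_ind on every full window of `window` consecutive rows (hypothetical helper)."""
    rows = []
    # + 1 so the window that ends on the last row is also evaluated
    for start in range(len(frame) - window + 1):
        chunk = frame.iloc[start:start + window]
        result = ttest_ind(chunk[variable_1], chunk[variable_2], nan_policy=nan_policy)
        # Label each result with the index of the last row in its window
        rows.append((chunk.index[-1], result[0], result[1]))
    out = pd.DataFrame(rows, columns=["index", "test_statistic", "p_value"])
    return out.set_index("index")

# Rebuild the question's data: 17 daily rows, bias stepping by 8, score stepping by 89
data = pd.DataFrame({
    "date": pd.date_range("2021-01-01", periods=17, freq="D"),
    "bias": range(5, 134, 8),
    "score": range(1000, 2425, 89),
})

# Align on the index; rows without a full window behind them stay NaN
data = data.join(rolling_ttest(data))
print(data)
```

The first `window - 1` rows have no complete window, so their `test_statistic` and `p_value` are left as NaN, much like the leading NaNs of pandas' own rolling aggregations.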