rolling.apply with a custom function that needs multiple DataFrame columns to produce a single column
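I'm trying to create an additional column df['newc'] by applying a custom function over df['cond'] with rolling.apply. The custom function needs two columns of df, and I'm not sure how to make that work.
I tried: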
df['newc'] = df['cond'].rolling(4).apply(T_correction,
args = (df['temp'].rolling(4)))
This clearly doesn't work and raises the following error:
raise NotImplementedError('See issue #11704 {url}'.format(url=url))
NotImplementedError: See issue #11704 https://github.com/pandas-dev/pandas/issues/11704
Perhaps rolling.apply isn't the right tool here. I'm looking for suggestions for alternative approaches.
>>> df.head()
                       temp   cond
ts
2018-06-01 00:00:00  51.908  27.83
2018-06-01 00:05:00  52.144  27.83
2018-06-01 00:10:00  51.880  27.83
2018-06-01 00:15:00  52.001  27.83
2018-06-01 00:20:00  51.835  27.83
import numpy as np
import pandas as pd
import statsmodels.api as sm

def T_correction(df, d):
    # df: window of 'cond' values, d: matching window of 'temp' values
    df = pd.DataFrame(data=df)
    df.columns = ['cond']
    df['temp'] = d
    X = df.drop(['cond'], axis=1)      # X features: temp
    X = sm.add_constant(X)             # add intercept
    lmodel = sm.OLS(df.cond, X)        # model cond = a + b*temp
    results = lmodel.fit()             # fit by ordinary least squares
    Op = results.predict(X)            # derive 'cond' as explained by temp
    Tc1 = df.cond - Op                 # remove the linear influence
    #---conditional correction --------------------------------------
    Tc = np.where(df.temp > (np.mean(df.temp) + 0.5*np.std(df.temp)), df.cond, Tc1)
    return Tc[-1]                      # return the last value of the window
Expected result:
>>> df.head()
                       temp   cond   newc
ts
2018-06-01 00:00:00  51.908  27.83    NaN
2018-06-01 00:05:00  52.144  27.83    NaN
2018-06-01 00:10:00  51.880  27.83    NaN
2018-06-01 00:15:00  52.001  27.83  26.00
2018-06-01 00:20:00  51.835  27.83  25.00
This feature doesn't appear to be available at the moment. There is an open issue about it on the pandas GitHub; see https://github.com/pandas-dev/pandas/issues/15095.
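One possible workaround, as a rough sketch rather than an official pandas feature: roll over a helper Series of integer row positions with raw=True, and use each window's positions to slice both columns out of the original DataFrame. The names rolling_two_columns and t_correction_window below are hypothetical, and np.polyfit is swapped in for sm.OLS only to keep the example free of the statsmodels dependency; the per-window logic is otherwise meant to mirror T_correction from the question.

```python
import numpy as np
import pandas as pd


def t_correction_window(window: pd.DataFrame) -> float:
    """Hypothetical two-column version of T_correction for one window."""
    temp = window["temp"].to_numpy(dtype=float)
    cond = window["cond"].to_numpy(dtype=float)
    # Fit cond = a + b*temp by least squares (np.polyfit stands in for sm.OLS).
    slope, intercept = np.polyfit(temp, cond, 1)
    residual = cond - (intercept + slope * temp)
    # Keep the raw cond value where temp is well above the window mean.
    high = temp > temp.mean() + 0.5 * temp.std()
    corrected = np.where(high, cond, residual)
    return float(corrected[-1])


def rolling_two_columns(df: pd.DataFrame, window: int, func) -> pd.Series:
    """Apply func to each full window of rows of df; NaN before that."""
    # rolling.apply only handles numeric data, so roll over float positions.
    positions = pd.Series(np.arange(len(df), dtype=float), index=df.index)

    def on_positions(pos: np.ndarray) -> float:
        # With raw=True, pos is a NumPy array holding the window's row positions.
        return func(df.iloc[pos.astype(int)])

    return positions.rolling(window).apply(on_positions, raw=True)


idx = pd.date_range("2018-06-01", periods=5, freq="5min", name="ts")
df = pd.DataFrame(
    {"temp": [51.908, 52.144, 51.880, 52.001, 51.835], "cond": [27.83] * 5},
    index=idx,
)
df["newc"] = rolling_two_columns(df, 4, t_correction_window)
```

The first three rows of newc come out as NaN because rolling(4) needs four observations before it calls the function. This runs one Python call per window, so it may be slow on long series; a plain loop over df.iloc slices would give the same values.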