Pandas - How to perform OLS Regression of values versus time in each group of a dataframe?
I have hourly readings in a dataframe of the following form:
Date_Time Temp
2001-01-01 00:00:00 -1.3
2001-01-01 01:00:00 -2.1
2001-01-01 02:00:00 -1.9
2001-01-01 03:00:00 -2.2
2001-01-01 04:00:00 -2.8
2001-01-01 05:00:00 -2.0
2001-01-01 06:00:00 -2.2
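I would like to group the readings into blocks of N hours (for example, N = 3) and determine the OLS slope of temperature versus time for each group.
I know how to group the dataframe: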
df_g = df_g.assign(tgp = df['Temp'].groupby(pds.Grouper(freq='3h')) )
But after that I am stuck and don't know where to go next. Can anyone help me get there?
The beta (slope) of a simple (univariate) OLS regression is just cov(x, y)/var(x).
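Written out, with x as time and y as temperature (the symbols here are my own notation, not taken from the question), the identity looks like this. The 1/(n-1) factors in the sample covariance and sample variance cancel, and subtracting any constant c from every x leaves each deviation from the mean unchanged, which is why the choice of reference date does not affect the slope:

```latex
\hat{\beta}
  = \frac{\operatorname{cov}(x, y)}{\operatorname{var}(x)}
  = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}
         {\sum_{i=1}^{n} (x_i - \bar{x})^2},
\qquad
(x_i - c) - \overline{(x - c)} = x_i - \bar{x}.
```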
With that in mind:
import numpy as np
import pandas as pd
# Generate test data
df = pd.DataFrame(np.random.rand(50),
                  index=pd.date_range(start='2018-01-01', periods=50, freq='15min'),
                  columns=['Temp'])
# Copy the index into a column so it is part of the data set
df['DateTime'] = df.index
# Use the first timestamp as the reference date (it doesn't matter which date it is);
# the goal is only to convert the dates to numbers (seconds elapsed)
reference_dt = df['DateTime'].iloc[0]
df['DateTime'] = (df['DateTime'] - reference_dt).dt.total_seconds()
# Per-group variance of time and covariance of Temp with time
grouped = df.groupby(pd.Grouper(freq='3h'))
var = grouped.var()['DateTime']
cov = grouped.cov().loc(axis=0)[:, 'Temp']['DateTime'].reset_index(level=1, drop=True)
# OLS slope of Temp versus time for each 3-hour group (Temp units per second)
beta = cov / var
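As a cross-check, here is a minimal sketch that applies the same idea to the seven readings from the question and compares cov/var against a degree-1 fit from `np.polyfit`. The `Hours` column and the `ols_slope` helper are names I made up for illustration, and measuring time in hours rather than seconds is my own choice so that the slope reads as degrees per hour.

```python
import numpy as np
import pandas as pd

# Sample readings from the question (hourly temperatures)
temps = [-1.3, -2.1, -1.9, -2.2, -2.8, -2.0, -2.2]
index = pd.date_range(start="2001-01-01 00:00:00", periods=len(temps), freq="h")
df = pd.DataFrame({"Temp": temps}, index=index)

# Hours elapsed since the first reading, used as the regressor
# ("Hours" is a hypothetical column name)
df["Hours"] = (df.index - df.index[0]).total_seconds() / 3600.0


def ols_slope(group: pd.DataFrame) -> float:
    """Slope of Temp versus Hours from a degree-1 least-squares fit."""
    if len(group) < 2:
        return np.nan  # a single point does not determine a slope
    slope, _intercept = np.polyfit(group["Hours"], group["Temp"], 1)
    return slope


grouped = df.groupby(pd.Grouper(freq="3h"))
slopes_polyfit = grouped.apply(ols_slope)

# Same quantity via cov(x, y) / var(x); both use ddof=1, so the factors cancel
slopes_covvar = grouped.apply(
    lambda g: g["Temp"].cov(g["Hours"]) / g["Hours"].var()
)

print(slopes_polyfit)
print(slopes_covvar)
```

For this sample the first two 3-hour blocks should give slopes of about -0.3 and 0.1 degrees per hour under both methods, while the last block holds a single reading and so yields NaN. The cov/var route is vectorised and avoids a Python-level call per group; the `np.polyfit` route is slower but also returns the intercept if that is needed.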