How to run an OLS regression with a pandas datetime index as the independent variable (x)

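I think I'm missing something basic, but here is my problem. I have 25 days of time-series data, and I want to run an OLS regression in which the Y values are the time series and the X values are the datetime index. Here is my code: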

import pandas as pd
from pandas.stats.api import ols

# 25 consecutive daily timestamps, 2015-06-01 through 2015-06-25
indx = pd.date_range('2015-06-01', periods=25, freq='D')
col = [51.22, 51.19, 51.21, 51.23, 51.22, 51.22, 51.22, 51.23, 51.24, 51.22,
       51.20, 51.20, 51.20, 51.22, 51.22, 51.22, 51.22, 51.27, 51.28, 51.28,
       51.30, 51.30, 51.28, 51.28, 51.27]
df = pd.DataFrame(col, index=indx, columns=['abc'])
sumstat = ols(y=df['abc'], x=df.index)  # fails: x is a DatetimeIndex

However, I get the following error, because the index is a DatetimeIndex rather than a numeric series:

Exception: Invalid RHS type: <class 'pandas.tseries.index.DatetimeIndex'>

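`ols` does not accept a DatetimeIndex as x, so convert the dates to a numeric Series first, for example Julian dates. Add this to your code: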

df['jDate'] = df.index.to_julian_date()
ols(x=df.jDate, y=df.abc)
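`pandas.stats.api` (and with it `ols`) was removed in pandas 0.20, so the snippet above only runs on older pandas. As a minimal sketch of the same regression on current pandas, assuming NumPy is available, the Julian-date column can be fed to `numpy.linalg.lstsq` with a column of ones for the intercept. This swaps NumPy least squares in for the old `ols`; it gives the coefficients but none of the summary statistics `ols` printed.

```python
import numpy as np
import pandas as pd

# Same 25 daily observations as in the question (2015-06-01 .. 2015-06-25).
idx = pd.date_range("2015-06-01", periods=25, freq="D")
values = [51.22, 51.19, 51.21, 51.23, 51.22, 51.22, 51.22, 51.23, 51.24, 51.22,
          51.20, 51.20, 51.20, 51.22, 51.22, 51.22, 51.22, 51.27, 51.28, 51.28,
          51.30, 51.30, 51.28, 51.28, 51.27]
df = pd.DataFrame({"abc": values}, index=idx)

# Convert the DatetimeIndex to floats (Julian dates) so it can be used as a regressor.
df["jDate"] = df.index.to_julian_date()

# Design matrix [1, x]: the column of ones adds an intercept term.
X = np.column_stack([np.ones(len(df)), df["jDate"].to_numpy()])
y = df["abc"].to_numpy()

# Ordinary least squares: find beta minimising ||X @ beta - y||^2.
beta, _, _, _ = np.linalg.lstsq(X, y, rcond=None)
intercept, slope = beta
print(f"slope per day: {slope:.6f}, intercept: {intercept:.2f}")
```

The slope is the estimated change in `abc` per day, since consecutive Julian dates differ by exactly 1.0.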

The complete code looks like this:

import pandas as pd
# pandas.stats.api (including ols) was removed in pandas 0.20; this needs an older pandas.
from pandas.stats.api import ols

# 25 consecutive daily timestamps, 2015-06-01 through 2015-06-25
indx = pd.date_range('2015-06-01', periods=25, freq='D')
col = [51.22, 51.19, 51.21, 51.23, 51.22, 51.22, 51.22, 51.23, 51.24, 51.22,
       51.20, 51.20, 51.20, 51.22, 51.22, 51.22, 51.22, 51.27, 51.28, 51.28,
       51.30, 51.30, 51.28, 51.28, 51.27]
df = pd.DataFrame(col, index=indx, columns=['abc'])
# Convert the DatetimeIndex to Julian dates (floats) so it can be used as x
df['jDate'] = df.index.to_julian_date()
sumstat = ols(x=df.jDate, y=df.abc)
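One consequence of using Julian dates as x: the intercept is the fitted value at Julian day 0 (in 4713 BC), a very long extrapolation that is hard to interpret. If a meaningful intercept matters, a sketch under the same data assumptions is to regress on days elapsed since the first observation instead. The slope is unchanged because x is only shifted by a constant. `np.polyfit` with degree 1 is used here as the least-squares line fit; it is NumPy's function, not part of the original answer.

```python
import numpy as np
import pandas as pd

idx = pd.date_range("2015-06-01", periods=25, freq="D")
values = [51.22, 51.19, 51.21, 51.23, 51.22, 51.22, 51.22, 51.23, 51.24, 51.22,
          51.20, 51.20, 51.20, 51.22, 51.22, 51.22, 51.22, 51.27, 51.28, 51.28,
          51.30, 51.30, 51.28, 51.28, 51.27]
df = pd.DataFrame({"abc": values}, index=idx)

# Days elapsed since the first timestamp: 0.0, 1.0, ..., 24.0
df["days"] = (df.index - df.index[0]) / pd.Timedelta(days=1)

# Degree-1 least-squares fit; polyfit returns [slope, intercept] (highest power first).
slope, intercept = np.polyfit(df["days"], df["abc"], 1)

# intercept is now the fitted value of abc on 2015-06-01.
print(f"slope per day: {slope:.6f}, value at 2015-06-01: {intercept:.4f}")
```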