Python pandas DataFrame: predict values based on date
I have a Python pandas DataFrame df:
Group date Value
A 01-02-2016 16
A 01-03-2016 15
A 01-04-2016 14
A 01-05-2016 17
A 01-06-2016 19
A 01-07-2016 20
B 01-02-2016 16
B 01-03-2016 13
B 01-04-2016 13
C 01-02-2016 16
C 01-03-2016 16
I want to predict the value based on the date. Specifically, I want to predict the value for 01-08-2016.
import pandas as pd
import numpy as np
from sklearn.linear_model import LinearRegression
#I change the dates to be integers, I am not sure this is the best way
df['date'] = pd.to_datetime(df['date'])
df['date_delta'] = (df['date'] - df['date'].min()) / np.timedelta64(1,'D')
#Is this correct?
model = LinearRegression()
X = df[['date_delta']]
y = df.Value
model.fit(X, y)
model.score(X, y)
coefs = zip(model.coef_, X.columns)
print("sl = %.1f + " % model.intercept_ +
      " + ".join("%.1f %s" % coef for coef in coefs))
I am not sure whether I am handling the dates correctly. Is there a better way?
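I don't see anything wrong with what you are doing.
You could use datetime.toordinal instead, but it will give you the same fit (the intercept will differ because the origin of the date axis changes, but that is expected).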
# df['date'] must already be a datetime column (see pd.to_datetime above)
df['date_ordinal'] = df['date'].apply(lambda x: x.toordinal())
model = LinearRegression()
X = df[['date_ordinal']]
y = df.Value
model.fit(X, y)
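The question asks for a prediction for 01-08-2016, which the snippet above stops short of. A minimal sketch of that last step follows, assuming the dates are month-day-year (so the target is January 8, 2016) and that one model is fitted on all groups pooled together. The names `target`, `X_new` and `prediction` are illustrative, not from the original post.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

# Sample data from the question (dates assumed to be month-day-year).
df = pd.DataFrame({
    "Group": list("AAAAAABBBCC"),
    "date": ["01-02-2016", "01-03-2016", "01-04-2016", "01-05-2016",
             "01-06-2016", "01-07-2016", "01-02-2016", "01-03-2016",
             "01-04-2016", "01-02-2016", "01-03-2016"],
    "Value": [16, 15, 14, 17, 19, 20, 16, 13, 13, 16, 16],
})
df["date"] = pd.to_datetime(df["date"], format="%m-%d-%Y")
df["date_ordinal"] = df["date"].apply(lambda d: d.toordinal())

# One model fitted on all rows, ignoring the Group column.
model = LinearRegression()
model.fit(df[["date_ordinal"]], df["Value"])

# Encode the target date exactly as the training dates were encoded.
target = pd.Timestamp("2016-01-08")
X_new = pd.DataFrame({"date_ordinal": [target.toordinal()]})
prediction = model.predict(X_new)[0]
print("Predicted value for {:%m-%d-%Y}: {:.2f}".format(target, prediction))
```

Passing a DataFrame with the same column name to `predict` keeps the feature layout identical to the one used in `fit`.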
If you think there may be daily/weekly/monthly/seasonal variations, you can use 1-of-K encoding. See, for example, this question.
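As a rough illustration of 1-of-K (one-hot) encoding, the sketch below adds one 0/1 column per weekday next to the linear trend. Using `pd.get_dummies` on the weekday is an assumption about what the encoding could look like here, not something the original answer spells out, and the column prefix `weekday` is made up for the example.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

# Sample data from the question (dates assumed to be month-day-year).
df = pd.DataFrame({
    "Group": list("AAAAAABBBCC"),
    "date": ["01-02-2016", "01-03-2016", "01-04-2016", "01-05-2016",
             "01-06-2016", "01-07-2016", "01-02-2016", "01-03-2016",
             "01-04-2016", "01-02-2016", "01-03-2016"],
    "Value": [16, 15, 14, 17, 19, 20, 16, 13, 13, 16, 16],
})
df["date"] = pd.to_datetime(df["date"], format="%m-%d-%Y")
df["date_ordinal"] = df["date"].apply(lambda d: d.toordinal())

# 1-of-K encoding: one indicator column per weekday present in the data
# (pandas numbers weekdays 0 = Monday ... 6 = Sunday).
weekday_dummies = pd.get_dummies(df["date"].dt.dayofweek,
                                 prefix="weekday", dtype=int)

# Linear trend plus weekday indicators as features.
X = pd.concat([df[["date_ordinal"]], weekday_dummies], axis=1)
model = LinearRegression()
model.fit(X, df["Value"])
print(X.columns.tolist())
```

With only six distinct days in the sample, the weekday columns cannot capture a real weekly pattern; the encoding only pays off with data spanning many weeks. A date whose weekday never appears in training would also need its indicator column added (as zeros in the training data) before calling `predict`.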
Update following your comment
You said you want one equation per group:
In [2]:
results = {}
for (group, df_gp) in df.groupby('Group'):
    print("Dealing with group {}".format(group))
    print("----------------------")
    X = df_gp[['date_ordinal']]
    y = df_gp.Value
    model.fit(X, y)
    print("Score: {:.2f}%".format(100 * model.score(X, y)))
    coefs = list(zip(X.columns, model.coef_))
    results[group] = [('intercept', model.intercept_)] + coefs
    coefs = zip(model.coef_, X.columns)
    print("sl = %.1f + " % model.intercept_ +
          " + ".join("%.1f %s" % coef for coef in coefs))
    print("\n")
Out[2]:
Dealing with group A
----------------------
Score: 65.22%
sl = -735950.7 + 1.0 date_ordinal
Dealing with group B
----------------------
Score: 75.00%
sl = 1103963.0 + -1.5 date_ordinal
Dealing with group C
----------------------
Score: 100.00%
sl = 16.0 + 0.0 date_ordinal
You also have them in a convenient dictionary:
In [3]: results
Out[3]:
{'A': [('intercept', -735950.66666666663), ('date_ordinal', 1.0)],
'B': [('intercept', 1103962.9999999995),
('date_ordinal', -1.4999999999999993)],
'C': [('intercept', 16.0), ('date_ordinal', 0.0)]}
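The dictionary can be turned into per-group predictions by evaluating each equation at the ordinal of the target date. The sketch below copies the values shown in the output above and assumes 01-08-2016 means January 8, 2016; `target_ordinal` and `predictions` are illustrative names.

```python
from datetime import date

# Per-group equations, copied from the results dictionary above.
results = {
    "A": [("intercept", -735950.66666666663), ("date_ordinal", 1.0)],
    "B": [("intercept", 1103962.9999999995),
          ("date_ordinal", -1.4999999999999993)],
    "C": [("intercept", 16.0), ("date_ordinal", 0.0)],
}

# Same encoding as the training feature: proleptic Gregorian ordinal.
target_ordinal = date(2016, 1, 8).toordinal()

predictions = {}
for group, terms in results.items():
    coefs = dict(terms)
    # value = intercept + slope * date_ordinal
    predictions[group] = (coefs["intercept"]
                          + coefs["date_ordinal"] * target_ordinal)

for group, value in sorted(predictions.items()):
    print("{}: {:.2f}".format(group, value))
```

Groups B and C are fitted on three and two points respectively, so these are extrapolations from very little data and should be read with that in mind.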