How can I add a new column with forecasts?
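I am trying to forecast with an ARIMA model. My question is how to create a new column that holds my forecast values together with the new future dates (based on the number of future steps). Here is my code: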
import numpy as np
import pandas as pd
import matplotlib.pylab as plt
%matplotlib inline
df = pd.read_csv("Desktop/Daten/probe.csv", sep=";")
# parse the month column of df itself (the original referenced an undefined `dataset`)
df["Monthes"] = pd.to_datetime(df["Monthes"], infer_datetime_format=True)
indexedDf = df.set_index(["Monthes"])
# legacy statsmodels API (removed in statsmodels 0.13)
from statsmodels.tsa.arima_model import ARIMA
model = ARIMA(indexedDf, order=(1, 1, 2))
results_ARIMA = model.fit(disp=0)
n = 120  # number of future steps to forecast
# the legacy forecast() returns (forecast, stderr, conf_int); [0] keeps the forecast
result = results_ARIMA.forecast(steps=n)[0]
How can I put the forecast results into a new tab together with the new 'n' months?
Assuming you want to add this column to your dataframe (df), this is what you need to do.
# works only if len(result) == len(df)
df['result'] = result
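The direct assignment above only works when the forecast has exactly as many values as df has rows. Here is a minimal sketch of the more general case, assuming a monthly index and using made-up numbers in place of the ARIMA output. The names history and forecast_values are illustrative and do not come from the question.

```python
import numpy as np
import pandas as pd

# Hypothetical monthly history standing in for indexedDf.
history = pd.DataFrame(
    {"value": [10.0, 12.0, 11.0, 13.0]},
    index=pd.date_range("2019-08-01", periods=4, freq="MS"),
)

n = 3  # number of future months
# Placeholder for results_ARIMA.forecast(steps=n)[0].
forecast_values = np.array([14.0, 15.0, 16.0])

# Future month starts, beginning one month after the last observed date.
future_index = pd.date_range(
    history.index[-1] + pd.DateOffset(months=1), periods=n, freq="MS"
)
forecast = pd.DataFrame({"forecast": forecast_values}, index=future_index)

# Align on the date index: history rows get NaN in "forecast",
# future rows get NaN in "value".
combined = pd.concat([history, forecast], axis=1)
print(combined)
```

Keeping the forecast in its own column makes it easy to tell observed values from predicted ones later on.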
If you want to write this result to an Excel spreadsheet and name the sheet after the result date:
N = [30, 60, 90, 120]
with pd.ExcelWriter('output.xlsx') as writer:
    # if you want to write multiple forecasts to
    # the same file, but in different spreadsheets
    for n in N:
        result = results_ARIMA.forecast(steps=n)[0]
        # each forecast has a different length, so write it as its own frame
        pd.DataFrame({'result': result}).to_excel(writer, sheet_name='Sheet_n={}'.format(n))
If you want to name the sheet after tomorrow's date (2019-11-22), just change it to sheet_name='2019-11-22'.
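To get the new months into each sheet as well, one option is to build a small frame per horizon before writing. This is only a sketch: fake_forecast is a hypothetical stand-in for results_ARIMA.forecast(steps=n)[0], and last_date is an assumed last observed month.

```python
import numpy as np
import pandas as pd

def fake_forecast(steps):
    """Hypothetical stand-in for results_ARIMA.forecast(steps=steps)[0]."""
    return np.linspace(100.0, 200.0, steps)

last_date = pd.Timestamp("2019-11-01")  # assumed last observed month
horizons = [3, 6]

# One small DataFrame per horizon, keyed by the sheet name it will get.
sheets = {}
for n in horizons:
    future = pd.date_range(
        last_date + pd.DateOffset(months=1), periods=n, freq="MS"
    )
    sheets["Sheet_n={}".format(n)] = pd.DataFrame(
        {"Monthes": future, "result": fake_forecast(n)}
    )

def write_sheets(sheets, path):
    """Write each DataFrame to its own sheet of one workbook."""
    with pd.ExcelWriter(path) as writer:
        for name, frame in sheets.items():
            frame.to_excel(writer, sheet_name=name, index=False)
```

Calling write_sheets(sheets, "output.xlsx") would then produce the workbook. Writing .xlsx files needs an Excel engine such as openpyxl to be installed.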
How do you get tomorrow's date?
import datetime
def tomorrow():
    return datetime.date.today() + datetime.timedelta(days=1)
print(tomorrow())
Converting dates to strings:
dates.apply(lambda x: x.strftime('%Y-%m-%d'))
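Putting the two snippets together, a sheet name for tomorrow could be produced roughly like this. The helper name tomorrow_str and its optional today argument are illustrative additions, there only so the example can be run with a fixed date.

```python
import datetime
import pandas as pd

def tomorrow_str(today=None):
    """Return tomorrow's date as 'YYYY-MM-DD' (today defaults to the system date)."""
    if today is None:
        today = datetime.date.today()
    return (today + datetime.timedelta(days=1)).strftime("%Y-%m-%d")

# Fixed date so the result is reproducible.
sheet_name = tomorrow_str(datetime.date(2019, 11, 21))
print(sheet_name)  # 2019-11-22

# The same conversion for a whole datetime Series, via the .dt accessor.
dates = pd.Series(pd.to_datetime(["2019-11-19", "2019-11-20"]))
date_strings = dates.dt.strftime("%Y-%m-%d").tolist()
print(date_strings)
```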
I suggest you look at the documentation to get a clearer understanding of pandas.ExcelWriter.
You can do it like this:
Suppose your dataframe looks like this:
date spend
0 2019-11-10 800
1 2019-11-11 800
2 2019-11-12 300
3 2019-11-13 150
4 2019-11-14 300
5 2019-11-15 500
6 2019-11-16 800
7 2019-11-17 600
8 2019-11-18 400
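n = 5
# start one day after the last date in df so the new dates do not repeat it
last_date = pd.to_datetime(df["date"].iloc[-1])
t = pd.date_range(start=last_date + pd.Timedelta(days=1), periods=n)
# assume predictions (random placeholders; values differ on every run)
predictions = np.random.rand(n) * 1000
# e.g. array([619.34810384, 600.78387725, 242.4680893 , 920.58391429, 489.36016082])
new_df = pd.DataFrame({"date": t, "spend": predictions})
print(new_df)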
date spend
0 2019-11-19 94.944353
1 2019-11-20 64.813264
2 2019-11-21 56.319640
3 2019-11-22 81.696114
4 2019-11-23 43.533978
Now you can finally concat/append it to your dataframe:
df = pd.concat([df, new_df]).reset_index(drop=True)
Output
date spend
0 2019-11-10 800
1 2019-11-11 800
2 2019-11-12 300
3 2019-11-13 150
4 2019-11-14 300
5 2019-11-15 500
6 2019-11-16 800
7 2019-11-17 600
8 2019-11-18 400
9 2019-11-19 94.944353
10 2019-11-20 64.813264
11 2019-11-21 56.319640
12 2019-11-22 81.696114
13 2019-11-23 43.533978