How can I add a new column with forecasts?

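I'm trying to forecast with an ARIMA model. My question is how to create a new column that holds my forecast values together with the new future dates (based on the number of forecast steps). Here is my code: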

import numpy as np
import pandas as pd
import matplotlib.pylab as plt
%matplotlib inline
# legacy ARIMA API; newer statsmodels releases provide statsmodels.tsa.arima.model.ARIMA instead
from statsmodels.tsa.arima_model import ARIMA

df = pd.read_csv("Desktop/Daten/probe.csv", sep=";")
df["Monthes"] = pd.to_datetime(df["Monthes"])
indexedDf = df.set_index(["Monthes"])

model = ARIMA(indexedDf, order=(1, 1, 2))
results_ARIMA = model.fit(disp=0)
n = 120  # forecast horizon in steps
result = results_ARIMA.forecast(steps=n)[0]  # [0] selects the array of point forecasts

How can I get the forecast results into a new tab with the n new months?

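Assuming you want to add this as a column to your DataFrame (df), this is what you need to do. Note that the assignment only succeeds when result has exactly as many rows as df.

df['result'] = result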
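
Since a forecast usually covers dates that are not in df yet, it may be easier to give it its own date index first. The following is only a sketch under some assumptions: the data is monthly with month-start timestamps, `history` is a hypothetical stand-in for indexedDf, and the `result` array is a placeholder for what `results_ARIMA.forecast(steps=n)[0]` would return.

```python
import numpy as np
import pandas as pd

# Hypothetical monthly history standing in for indexedDf
history = pd.DataFrame(
    {"value": [10.0, 12.0, 11.0, 13.0]},
    index=pd.date_range("2019-01-01", periods=4, freq="MS"),
)

n = 3
# Placeholder numbers; in the question this would be results_ARIMA.forecast(steps=n)[0]
result = np.array([14.0, 15.0, 16.0])

# n new month-start dates, beginning one month after the last observation
future_index = pd.date_range(
    start=history.index[-1] + pd.DateOffset(months=1), periods=n, freq="MS"
)
forecast_df = pd.DataFrame({"forecast": result}, index=future_index)

# Outer join on the date index: the history rows get NaN in the new "forecast" column
combined = history.join(forecast_df, how="outer")
print(combined)
```

The outer join keeps the observed values and the forecasts in separate columns, so it stays obvious which rows came from the model.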

If you want to write this result to an Excel spreadsheet and name the sheet after the result date:

N = [30, 60, 90, 120]
with pd.ExcelWriter('output.xlsx') as writer:
    # write one forecast per horizon to the same file,
    # each in its own sheet
    for n in N:
        result = results_ARIMA.forecast(steps=n)[0]
        # the forecast has n rows, so it gets its own frame instead of a column in df
        forecast_df = pd.DataFrame({'result': result})
        forecast_df.to_excel(writer, sheet_name='Sheet_n={}'.format(n))

If you want to name the sheet after tomorrow's date (2019-11-22), just change it to sheet_name='2019-11-22'.

How do you get tomorrow's date?

import datetime
def tomorrow():
    return datetime.date.today() + datetime.timedelta(days=1)
print(tomorrow())

Converting dates to strings (here dates is assumed to be a pandas Series of datetime values):

dates.apply(lambda x: x.strftime('%Y-%m-%d'))
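
To tie the two snippets together, here is a small sketch. The `dates` Series is made up for illustration, and `.dt.strftime` is used as a vectorized alternative to `apply`. The same format string turns tomorrow's date into text that could be passed as `sheet_name`.

```python
import datetime
import pandas as pd

# Made-up example Series of datetimes
dates = pd.Series(pd.to_datetime(["2019-11-21", "2019-11-22"]))

# Vectorized equivalent of dates.apply(lambda x: x.strftime('%Y-%m-%d'))
as_text = dates.dt.strftime("%Y-%m-%d")
print(as_text.tolist())

def tomorrow():
    return datetime.date.today() + datetime.timedelta(days=1)

# A string such as '2019-11-22', usable as sheet_name in df.to_excel(...)
sheet_name = tomorrow().strftime("%Y-%m-%d")
print(sheet_name)
```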

I recommend looking at the documentation to get a clearer picture of pandas.ExcelWriter.

You can do it like this:

Suppose your DataFrame looks like this:

         date  spend
0  2019-11-10    800
1  2019-11-11    800
2  2019-11-12    300
3  2019-11-13    150
4  2019-11-14    300
5  2019-11-15    500
6  2019-11-16    800
7  2019-11-17    600
8  2019-11-18    400

You can then generate the next n dates and pair them with the predictions:

n = 5
# start one day after the last observed date
start = pd.to_datetime(df.date.iloc[-1]) + pd.Timedelta(days=1)
t = pd.date_range(start=start, periods=n)
# random placeholder values standing in for real predictions
predictions = np.random.rand(n) * 100
new_df = pd.DataFrame({"date": t, "spend": predictions})
print(new_df)
        date      spend
0 2019-11-19  94.944353
1 2019-11-20  64.813264
2 2019-11-21  56.319640
3 2019-11-22  81.696114
4 2019-11-23  43.533978

Now you can finally concat/append it to your DataFrame:

df = pd.concat([df, new_df]).reset_index(drop=True)

Output

         date       spend
0  2019-11-10  800.000000
1  2019-11-11  800.000000
2  2019-11-12  300.000000
3  2019-11-13  150.000000
4  2019-11-14  300.000000
5  2019-11-15  500.000000
6  2019-11-16  800.000000
7  2019-11-17  600.000000
8  2019-11-18  400.000000
9  2019-11-19   94.944353
10 2019-11-20   64.813264
11 2019-11-21   56.319640
12 2019-11-22   81.696114
13 2019-11-23   43.533978
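
Once the rows are concatenated, nothing marks which ones are forecasts. One possible variation, sketched below, adds a flag column before concatenating. The `is_forecast` column name and the prediction values are made up for illustration, and the date column is converted with `pd.to_datetime` first so that it keeps a single dtype.

```python
import numpy as np
import pandas as pd

# Shortened copy of the example frame, with dates given as strings
df = pd.DataFrame({
    "date": ["2019-11-16", "2019-11-17", "2019-11-18"],
    "spend": [800, 600, 400],
})
df["date"] = pd.to_datetime(df["date"])  # one datetime dtype for the whole column

n = 2
# n daily dates starting the day after the last observation
future = pd.date_range(start=df["date"].iloc[-1] + pd.Timedelta(days=1), periods=n)
predictions = np.array([450.0, 500.0])  # made-up numbers, not model output

# Hypothetical flag column so forecasts stay distinguishable after concatenation
new_df = pd.DataFrame({"date": future, "spend": predictions, "is_forecast": True})
df["is_forecast"] = False

combined = pd.concat([df, new_df], ignore_index=True)
print(combined)
```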