How to perform time series analysis that contains multiple groups in Python using fbProphet or other models?

Question

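Hi all,

My dataset looks like the one below. I am trying to forecast 'Amount' for the next 6 months using fbProphet or another model. My problem is that I want a forecast per group, i.e. for A, B, C and D over the next 6 months, and I am not sure how to do that in Python with fbProphet or other models. I went through the official fbprophet page, but the only information I found is that Prophet takes a dataframe with just two columns, one for the date and one for the amount.

I am new to Python, so any help with an explanation of the code is greatly appreciated!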

import pandas as pd
data = {'Date':['2017-01-01', '2017-02-01', '2017-03-01', '2017-04-01','2017-05-01','2017-06-01','2017-07-01'],'Group':['A','B','C','D','C','A','B'],
       'Amount':['12.1','13','15','10','12','9.0','5.6']}
df = pd.DataFrame(data)
print (df)

Output:

         Date Group Amount
0  2017-01-01     A   12.1
1  2017-02-01     B     13
2  2017-03-01     C     15
3  2017-04-01     D     10
4  2017-05-01     C     12
5  2017-06-01     A    9.0
6  2017-07-01     B    5.6

Answer 1

Prophet requires two columns named ds and y, so first rename the two columns:

df = df.rename(columns={'Date': 'ds', 'Amount':'y'})
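The sample frame stores 'Amount' as strings and 'Date' as plain text. Since Prophet expects ds to hold dates and y to be numeric, it may help to convert both right after renaming. A minimal sketch of that preparation step, assuming the column names from the question (the variable name `prepared` is only for illustration):

```python
import pandas as pd

data = {
    'Date': ['2017-01-01', '2017-02-01', '2017-03-01', '2017-04-01',
             '2017-05-01', '2017-06-01', '2017-07-01'],
    'Group': ['A', 'B', 'C', 'D', 'C', 'A', 'B'],
    'Amount': ['12.1', '13', '15', '10', '12', '9.0', '5.6'],
}
df = pd.DataFrame(data)

# Rename to the column names Prophet looks for, then fix the dtypes
prepared = df.rename(columns={'Date': 'ds', 'Amount': 'y'})
prepared['ds'] = pd.to_datetime(prepared['ds'])  # text -> datetime64
prepared['y'] = pd.to_numeric(prepared['y'])     # strings -> float
print(prepared.dtypes)
```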

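Assuming your groups are independent of one another and you want one forecast per group, you can group the dataframe by the "Group" column and fit and run a separate forecast for each group: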

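from fbprophet import Prophet  # Prophet 1.0 and later: from prophet import Prophet

grouped = df.groupby('Group')
for g in grouped.groups:
    group = grouped.get_group(g)
    m = Prophet()  # fit a fresh model for each group
    m.fit(group)
    # extend the series by 365 daily steps past the last observed date
    future = m.make_future_dataframe(periods=365)
    forecast = m.predict(future)
    print(forecast.tail())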
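The loop above extends each series by 365 daily steps. The question's data is monthly and asks for 6 months ahead, so a monthly horizon may fit better; `make_future_dataframe` accepts a `freq` argument, e.g. `m.make_future_dataframe(periods=6, freq='MS')`. The pandas-only sketch below (no Prophet required) shows the month-start dates such a 6-step horizon would be expected to cover for each group. The helper name `future_month_starts` is made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'ds': pd.to_datetime(['2017-01-01', '2017-02-01', '2017-03-01', '2017-04-01',
                          '2017-05-01', '2017-06-01', '2017-07-01']),
    'Group': ['A', 'B', 'C', 'D', 'C', 'A', 'B'],
})

def future_month_starts(dates, periods=6):
    """Return the `periods` month-start dates that follow the last date in `dates`."""
    first_future = dates.max() + pd.offsets.MonthBegin(1)
    return pd.date_range(start=first_future, periods=periods, freq='MS')

# One horizon per group, starting after that group's own last observation
horizons = {g: future_month_starts(sub['ds']) for g, sub in df.groupby('Group')}
for g, idx in horizons.items():
    print(g, idx[0].date(), '->', idx[-1].date())
```

Each group gets its own horizon because the groups do not end on the same date.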

Note that the input dataframe in your question is not enough for the model, because group D has only one data point. Prophet needs at least 2 non-NaN rows to fit.
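One way to avoid that error is to drop groups that are too short before fitting. A minimal sketch using pandas only, assuming the columns have already been renamed to ds and y (the names `MIN_ROWS`, `usable` and `skipped` are illustrative):

```python
import pandas as pd

df = pd.DataFrame({
    'ds': pd.to_datetime(['2017-01-01', '2017-02-01', '2017-03-01', '2017-04-01',
                          '2017-05-01', '2017-06-01', '2017-07-01']),
    'Group': ['A', 'B', 'C', 'D', 'C', 'A', 'B'],
    'y': [12.1, 13.0, 15.0, 10.0, 12.0, 9.0, 5.6],
})

MIN_ROWS = 2  # minimum number of non-NaN rows mentioned in the note above

# Series.count() ignores NaN, so only rows with a usable y value are counted
usable = df.groupby('Group').filter(lambda g: g['y'].count() >= MIN_ROWS)
skipped = sorted(set(df['Group']) - set(usable['Group']))
print('skipped groups:', skipped)
```

With the sample data this drops only group D, and the per-group loop can then run on `usable`.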

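Edit: if you want to merge all the forecasts into one dataframe, the idea is to give yhat a different name for each group, do a pd.merge() inside the loop, and then pick the columns you need at the end: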

final = pd.DataFrame()
for g in grouped.groups:
    group = grouped.get_group(g)
    m = Prophet()
    m.fit(group)
    future = m.make_future_dataframe(periods=365)
    forecast = m.predict(future)
    # keep only ds and yhat so the other forecast columns (trend, yhat_lower, ...)
    # do not collide across groups when merging
    forecast = forecast[['ds', 'yhat']].rename(columns={'yhat': 'yhat_' + g})
    final = pd.merge(final, forecast.set_index('ds'), how='outer', left_index=True, right_index=True)

final = final[['yhat_' + g for g in grouped.groups.keys()]]

Answer 2

import pandas as pd
# statsmodels.tsa.arima_model was removed in statsmodels 0.13; use the newer module
from statsmodels.tsa.arima.model import ARIMA
from matplotlib import pyplot as plt
from sklearn.metrics import mean_squared_log_error

# Before modeling with ARIMA, SARIMAX, etc., confirm that your time series
# is stationary, using the Augmented Dickey-Fuller test
# (statsmodels.tsa.stattools.adfuller) or other tests.

# Create a list of all groups, or get it from the data with df['Group'].unique()
groups_iter = ['A', 'B', 'C', 'D']

dict_org = {}
dict_pred = {}
group_accuracy = {}

# Iterate over all groups and get each group's data
# by filtering the dataframe (df from the question)
for group_name in groups_iter:
    X = df[df['Group'] == group_name]['Amount'].astype(float).values
    size = int(len(X) * 0.70)
    train, test = X[:size], X[size:]
    history = list(train)
    predictions = []

    # Using an ARIMA model here; you can also grid search for the best parameters
    for obs in test:
        model = ARIMA(history, order=(5, 1, 0))
        model_fit = model.fit()
        yhat = model_fit.forecast()[0]  # one-step-ahead forecast
        predictions.append(yhat)
        history.append(obs)  # add the true value before the next step
        print("Predicted:%f, expected:%f" % (yhat, obs))
    error = mean_squared_log_error(test, predictions)
    dict_org.update({group_name: test})
    dict_pred.update({group_name: predictions})

    print("Group: ", group_name, "Test MSLE:%f" % error)
    group_accuracy.update({group_name: error})
    plt.plot(test)
    plt.plot(predictions, color='red')
    plt.show()
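The inner loop above is a walk-forward evaluation: forecast one step, record it, then append the true observation before forecasting the next step. A dependency-free sketch of that pattern follows. It swaps ARIMA for a naive last-value forecaster purely so that it runs without statsmodels; the names `walk_forward` and `naive_forecast` are hypothetical and the series values are made up.

```python
def naive_forecast(history):
    """Stand-in model: predict that the next value equals the last observed one."""
    return history[-1]

def walk_forward(series, forecast_fn, train_fraction=0.70):
    """One-step-ahead walk-forward evaluation; returns (test, predictions)."""
    size = int(len(series) * train_fraction)
    history, test = list(series[:size]), list(series[size:])
    predictions = []
    for obs in test:
        predictions.append(forecast_fn(history))
        history.append(obs)  # reveal the true value only after predicting it
    return test, predictions

series = [12.0, 13.0, 15.0, 10.0, 12.0, 9.0, 5.0, 7.0, 8.0, 6.0]  # made-up values
test, predictions = walk_forward(series, naive_forecast)
mse = sum((t - p) ** 2 for t, p in zip(test, predictions)) / len(test)
print(test, predictions, mse)
```

Passing the forecaster in as a function keeps the evaluation loop the same whichever model is plugged in, which makes it easier to compare models per group.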

Answer 3

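I know this is old, but I was trying to forecast results for different clients. I tried Aditya Santoso's solution above but ran into some errors, so I made a few modifications, and this finally worked for me: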

import pandas as pd
from fbprophet import Prophet  # Prophet 1.0 and later: from prophet import Prophet

df = pd.read_csv('file.csv')
df = df.rename(columns={'date': 'ds', 'amount': 'y'})
# I first had to filter out clients with fewer than 3 records to avoid errors,
# as Prophet only works with 2+ records per group
df = df.groupby('client_id').filter(lambda x: len(x) > 2)

df.client_id = df.client_id.astype(str)

final = pd.DataFrame(columns=['client','ds','yhat'])

grouped = df.groupby('client_id')
for g in grouped.groups:
    group = grouped.get_group(g)
    m = Prophet()
    m.fit(group)
    future = m.make_future_dataframe(periods=365)
    forecast = m.predict(future)
    #I added a column with client id
    forecast['client'] = g
    #I used concat instead of merge
    final = pd.concat([final, forecast], ignore_index=True)

final.head(10)
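The loop above yields a long table with one row per client and date. If a wide layout with one forecast column per client is preferred, as in the merged frame of Answer 1, a pivot is one option. The sketch below uses a small hand-made frame in place of real Prophet output; the yhat numbers are placeholders, not results.

```python
import pandas as pd

# Placeholder stand-in for `final`; the yhat values are invented for illustration
final = pd.DataFrame({
    'client': ['1', '1', '2', '2'],
    'ds': pd.to_datetime(['2018-01-01', '2018-01-02', '2018-01-01', '2018-01-02']),
    'yhat': [10.0, 11.0, 20.0, 21.0],
})

# One row per date, one yhat_<client> column per client
wide = final.pivot(index='ds', columns='client', values='yhat').add_prefix('yhat_')
print(wide)
```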

python

time-series

pandas

facebook-prophet