Python 多种产品的 Prophet 需求预测:将所有预测保存到单个数据框中
Python Prophet Demand Forecasting for multiple products: saving all forecasts into single data frame
我有以下代码为 3 种产品(A、B 和 C)创建时间序列预测。它将所有产品的预测保存到数据框中,forecast_df。但是,我无法弄清楚如何将产品名称与产品预测一起放在行中。它为产品名称创建一个列,但随后它为除 C 之外的所有产品放置 C,它称为 NaN。我怎样才能让它正确地输入产品名称?
import pandas as pd
import numpy as np
from datetime import date, timedelta, datetime
from fbprophet import Prophet
from pmdarima.model_selection import train_test_split
class color:
BOLD = '3[1m'
END = '3[0m'
df = pd.DataFrame({"ds": ['2020-4-26','2020-5-3','2020-5-10','2020-5-17','2020-5-24','2020-5-
31','2020-6-7','2020-6-14','2020-6-21','2020-6-28'],
"A": [164,127,157,127,170,322,133,176,233,257], "B":
[306,405,267,265,306,265,325,297,310,271],
"C": [23,41,75,24,48,31,51,26,41,43]})
df['ds'] = pd.to_datetime(df['ds'])
start_date = min(df['ds'])
end_date = max(df['ds'])
print(start_date, end_date)
train_len = int(week * 0.99)
print(train_len)
forecast_df = pd.DataFrame()
for col in df.columns[1:]:
print('\n', color.BOLD + 'ITEM #', col + color.END)
dfx = df[['ds', col]]
dfx = dfx.rename({col: 'y'}, axis=1)
#Train-Test-Split
y_train, y_test = train_test_split(dfx, train_size=train_len)
#Fit Model
m = Prophet()
m.fit(y_train)
#Calculate forecast
print('\n', color.BOLD + 'CALCULATE FORECAST' + color.END)
future = m.make_future_dataframe(periods=5, freq='W')
forecast = m.predict(future)
print(forecast[['ds', 'yhat', 'yhat_lower', 'yhat_upper']].tail())
#save forecast to dataframe
for col in df.columns[1:]:
forecast_df['Product'] = col
#forecast_df = pd.concat((forecast_df, forecast[['ds', 'yhat','yhat_upper','yhat_lower']]))
forecast_df = forecast_df.append(forecast[['ds', 'yhat','yhat_upper','yhat_lower']])
print('\n', color.BOLD + 'FORECAST DATAFRAME' + color.END)
print(forecast_df)
这是输出:
ITEM # A
CALCULATE FORECAST
ds yhat yhat_lower yhat_upper
8 2020-06-21 207.886980 137.260222 277.544817
9 2020-06-28 215.891658 144.606357 287.614507
10 2020-07-05 223.896335 152.586583 293.597136
11 2020-07-12 231.901013 154.032672 304.878146
12 2020-07-19 239.905690 168.971432 311.714361
ITEM # B
CALCULATE FORECAST
ds yhat yhat_lower yhat_upper
8 2020-06-21 281.352232 228.094939 330.983604
9 2020-06-28 276.200199 221.798843 329.703188
10 2020-07-05 271.048166 216.131704 322.966893
11 2020-07-12 265.896133 214.037533 317.526028
12 2020-07-19 260.744100 209.923955 312.366033
ITEM # C
CALCULATE FORECAST
ds yhat yhat_lower yhat_upper
8 2020-06-21 38.015576 16.388462 58.753207
9 2020-06-28 37.578068 16.483414 60.281396
10 2020-07-05 37.140560 16.269026 58.592980
11 2020-07-12 36.703052 16.351184 58.890988
12 2020-07-19 36.265544 14.038544 56.053327
FORECAST DATAFRAME
Product ds yhat yhat_upper yhat_lower Product
0 C 2020-04-26 143.849560 215.773593 71.588250
1 C 2020-05-03 151.854238 223.179640 77.229544
2 C 2020-05-10 159.858915 231.388203 87.914720
3 C 2020-05-17 167.863593 243.705433 97.468648
4 C 2020-05-24 175.868270 247.448227 103.620476
.. ... ... ... ... ...
8 NaN 2020-06-21 38.015576 57.919222 17.838312
9 NaN 2020-06-28 37.578068 57.668158 15.740910
10 NaN 2020-07-05 37.140560 57.990138 16.508734
11 NaN 2020-07-12 36.703052 59.519944 17.110290
12 NaN 2020-07-19 36.265544 56.410552 15.599971
[117 rows x 5 columns]
这样的事情可能会奏效。
您还应该提及如何在名为 Product 的列中设置三个值(A、B C 的预测)。
for colname in ['A', 'B', 'C']:
dd = df.loc[:,['ds', colname]]
dd.columns=['ds', 'y']
m = Prophet()
m.fit(dd)
future = m.make_future_dataframe(periods=5, freq='W')
forecast = m.predict(future)
df['Forecast_'+colname] = forecast['yhat']
谢谢,卡特琳娜!
我不得不稍微修改一下,但这是对我有用的答案:
#将预测保存到数据框
预测['Item'] = col
#concat 所有结果在一个数据帧中
forecast_df = pd.concat([forecast_df, 预测[['Item','ds', 'Quantity_Ordered', 'yhat' , 'yhat_lower', 'yhat_upper']]], ignore_index=True)
我有以下代码为 3 种产品(A、B 和 C)创建时间序列预测。它将所有产品的预测保存到数据框中,forecast_df。但是,我无法弄清楚如何将产品名称与产品预测一起放在行中。它为产品名称创建一个列,但随后它为除 C 之外的所有产品放置 C,它称为 NaN。我怎样才能让它正确地输入产品名称?
import pandas as pd
import numpy as np
from datetime import date, timedelta, datetime
from fbprophet import Prophet
from pmdarima.model_selection import train_test_split
class color:
BOLD = '3[1m'
END = '3[0m'
df = pd.DataFrame({"ds": ['2020-4-26','2020-5-3','2020-5-10','2020-5-17','2020-5-24','2020-5-
31','2020-6-7','2020-6-14','2020-6-21','2020-6-28'],
"A": [164,127,157,127,170,322,133,176,233,257], "B":
[306,405,267,265,306,265,325,297,310,271],
"C": [23,41,75,24,48,31,51,26,41,43]})
df['ds'] = pd.to_datetime(df['ds'])
start_date = min(df['ds'])
end_date = max(df['ds'])
print(start_date, end_date)
train_len = int(week * 0.99)
print(train_len)
forecast_df = pd.DataFrame()
for col in df.columns[1:]:
print('\n', color.BOLD + 'ITEM #', col + color.END)
dfx = df[['ds', col]]
dfx = dfx.rename({col: 'y'}, axis=1)
#Train-Test-Split
y_train, y_test = train_test_split(dfx, train_size=train_len)
#Fit Model
m = Prophet()
m.fit(y_train)
#Calculate forecast
print('\n', color.BOLD + 'CALCULATE FORECAST' + color.END)
future = m.make_future_dataframe(periods=5, freq='W')
forecast = m.predict(future)
print(forecast[['ds', 'yhat', 'yhat_lower', 'yhat_upper']].tail())
#save forecast to dataframe
for col in df.columns[1:]:
forecast_df['Product'] = col
#forecast_df = pd.concat((forecast_df, forecast[['ds', 'yhat','yhat_upper','yhat_lower']]))
forecast_df = forecast_df.append(forecast[['ds', 'yhat','yhat_upper','yhat_lower']])
print('\n', color.BOLD + 'FORECAST DATAFRAME' + color.END)
print(forecast_df)
这是输出:
ITEM # A
CALCULATE FORECAST
ds yhat yhat_lower yhat_upper
8 2020-06-21 207.886980 137.260222 277.544817
9 2020-06-28 215.891658 144.606357 287.614507
10 2020-07-05 223.896335 152.586583 293.597136
11 2020-07-12 231.901013 154.032672 304.878146
12 2020-07-19 239.905690 168.971432 311.714361
ITEM # B
CALCULATE FORECAST
ds yhat yhat_lower yhat_upper
8 2020-06-21 281.352232 228.094939 330.983604
9 2020-06-28 276.200199 221.798843 329.703188
10 2020-07-05 271.048166 216.131704 322.966893
11 2020-07-12 265.896133 214.037533 317.526028
12 2020-07-19 260.744100 209.923955 312.366033
ITEM # C
CALCULATE FORECAST
ds yhat yhat_lower yhat_upper
8 2020-06-21 38.015576 16.388462 58.753207
9 2020-06-28 37.578068 16.483414 60.281396
10 2020-07-05 37.140560 16.269026 58.592980
11 2020-07-12 36.703052 16.351184 58.890988
12 2020-07-19 36.265544 14.038544 56.053327
FORECAST DATAFRAME
Product ds yhat yhat_upper yhat_lower Product
0 C 2020-04-26 143.849560 215.773593 71.588250
1 C 2020-05-03 151.854238 223.179640 77.229544
2 C 2020-05-10 159.858915 231.388203 87.914720
3 C 2020-05-17 167.863593 243.705433 97.468648
4 C 2020-05-24 175.868270 247.448227 103.620476
.. ... ... ... ... ...
8 NaN 2020-06-21 38.015576 57.919222 17.838312
9 NaN 2020-06-28 37.578068 57.668158 15.740910
10 NaN 2020-07-05 37.140560 57.990138 16.508734
11 NaN 2020-07-12 36.703052 59.519944 17.110290
12 NaN 2020-07-19 36.265544 56.410552 15.599971
[117 rows x 5 columns]
这样的事情可能会奏效。 您还应该提及如何在名为 Product 的列中设置三个值(A、B C 的预测)。
for colname in ['A', 'B', 'C']:
dd = df.loc[:,['ds', colname]]
dd.columns=['ds', 'y']
m = Prophet()
m.fit(dd)
future = m.make_future_dataframe(periods=5, freq='W')
forecast = m.predict(future)
df['Forecast_'+colname] = forecast['yhat']
谢谢,卡特琳娜!
我不得不稍微修改一下,但这是对我有用的答案:
#将预测保存到数据框
预测['Item'] = col
#concat 所有结果在一个数据帧中
forecast_df = pd.concat([forecast_df, 预测[['Item','ds', 'Quantity_Ordered', 'yhat' , 'yhat_lower', 'yhat_upper']]], ignore_index=True)