Time series analysis with a for loop in Python
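I am trying to automate the process of forecasting (1) the total demand per state and (2) the demand per customer per state. The statistical method applied is the moving average. The forecast horizon is 1 month ahead. The data is imported from an Excel sheet with 5 columns: Customer, State, Product, Quantity, Order Date. The Excel file can be found at https://drive.google.com/file/d/1JlIqWl8bfyJ3Io01Zx088GIAC6rRuCa8/view?usp=sharing
A customer can be associated with different states. For example, Aaron Bergman can buy chairs, art, and phones from stores in Washington, Texas, and Oklahoma. Other customers show the same purchasing behavior. For (1) I tried using a for loop, but it did not work. The error is: Order_Date not in index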
import pandas as pd

df = pd.read_excel("Sales_data.xlsx")
State_Name = df.State.unique()
Customer_Name = df.Customer.unique()
for x in State_Name:
    df = df[['Order_Date', 'Quantity']]
    df['Order_Date'].min(), df['Order_Date'].max()
    df.isnull().sum()
    df.Timestamp = pd.to_datetime(df.Order_Date, format= '%D-%M-%Y %H:%m')
    df.index = df.Timestamp
    df = df.resample('MS').sum()
    rolling_mean = df.Quantity.rolling(window=10).mean()
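Consider turning the body of the for loop into a defined method and calling it with groupby so that it returns one time series per group. Also, note these best practices in pandas:
- Avoid referencing columns as attributes with period qualifiers. Instead, use brackets, [].
- Avoid using [] with a list for column subsetting. Instead, use reindex.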
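To make the two points above concrete, here is a minimal sketch on a made-up two-row frame (the `toy` data and the `Missing` label are assumptions for illustration only, not part of the Excel file). It suggests why `df.Timestamp = ...` in the question does not create a column, and how `reindex` differs from list subsetting when a label is absent.

```python
import warnings
import pandas as pd

# Hypothetical toy data, not taken from Sales_data.xlsx.
toy = pd.DataFrame({
    "Order_Date": ["2017-01-05", "2017-02-10"],
    "Quantity": [3, 5],
})

# Attribute assignment with a NEW name only sets a Python attribute;
# it does not add a column (pandas emits a UserWarning about this).
attr_df = toy.copy()
with warnings.catch_warnings():
    warnings.simplefilter("ignore")
    attr_df.Timestamp = pd.to_datetime(attr_df["Order_Date"])
has_col_after_attr = "Timestamp" in attr_df.columns

# Bracket assignment creates the column.
brk_df = toy.copy()
brk_df["Timestamp"] = pd.to_datetime(brk_df["Order_Date"])
has_col_after_bracket = "Timestamp" in brk_df.columns

# List subsetting raises on a missing label; reindex adds an all-NaN column.
try:
    toy[["Order_Date", "Missing"]]
    list_subset_raised = False
except KeyError:
    list_subset_raised = True
reindexed = toy.reindex(["Order_Date", "Missing"], axis="columns")

print(has_col_after_attr, has_col_after_bracket, list_subset_raised)
print(reindexed["Missing"].isna().all())
```

One trade-off worth keeping in mind: because `reindex` fills missing labels with NaN instead of raising, a misspelled column name passes silently, so it may be worth checking the result for unexpected all-NaN columns.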
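def rollmean_func(df):
    # BETTER COLUMN SUBSET
    df = df.reindex(['Order_Date', 'Quantity'], axis='columns')
    # BETTER COLUMN ASSIGNMENT
    # The original format string '%D-%M-%Y %H:%m' is not a valid strptime
    # pattern (%D is not accepted, %M means minutes, %m means month), so the
    # format is left for pandas to infer here.
    df['Timestamp'] = pd.to_datetime(df['Order_Date'])
    df.index = df['Timestamp']
    # Sum only Quantity per month start; summing the datetime columns as well
    # can raise a TypeError in recent pandas versions.
    monthly = df['Quantity'].resample('MS').sum()
    rolling_mean = monthly.rolling(window=10).mean()
    return rolling_mean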
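As for the "Order_Date not in index" error itself, one plausible reading is that the loop in the question overwrites `df` on every pass and never filters by state, so after the first resample the `Order_Date` column is gone. The sketch below reproduces that pattern on a made-up frame (the `toy` data is an assumption, and `set_index` stands in for the `df.index = df.Timestamp` step to keep it short).

```python
import pandas as pd

# Hypothetical toy data, not taken from Sales_data.xlsx.
toy = pd.DataFrame({
    "State": ["Texas", "Texas", "Ohio"],
    "Order_Date": pd.to_datetime(["2017-01-05", "2017-02-10", "2017-01-20"]),
    "Quantity": [3, 5, 2],
})

df = toy
errors = []
for state in toy["State"].unique():
    try:
        # Same pattern as the question: df is reassigned, never filtered by state.
        df = df[["Order_Date", "Quantity"]]
        # After this line Order_Date is the index, no longer a column.
        df = df.set_index("Order_Date").resample("MS").sum()
    except KeyError as exc:
        errors.append((state, str(exc)))

# The first state succeeds; the second fails on the column subset.
print(errors)
```

Passing each group to a function through groupby sidesteps this, because every call receives a fresh slice of the original frame.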
State level
state_rollmeans = df.groupby(['State']).apply(rollmean_func)
state_rollmeans
# State Timestamp
# Alabama 2014-04-01 NaN
# 2014-05-01 NaN
# 2014-06-01 NaN
# 2014-07-01 NaN
# 2014-08-01 NaN
# ...
# Wisconsin 2017-09-01 10.6
# 2017-10-01 7.5
# 2017-11-01 9.7
# 2017-12-01 12.3
# Wyoming 2016-11-01 NaN
# Name: Quantity, Length: 2070, dtype: float64
Customer level
customer_rollmeans = df.groupby(['Customer_Name']).apply(rollmean_func)
customer_rollmeans
# Customer_Name Timestamp
# Aaron Bergman 2014-02-01 NaN
# 2014-03-01 NaN
# 2014-04-01 NaN
# 2014-05-01 NaN
# 2014-06-01 NaN
# ...
# Zuschuss Donatelli 2017-02-01 1.2
# 2017-03-01 0.7
# 2017-04-01 0.7
# 2017-05-01 0.0
# 2017-06-01 0.3
# Name: Quantity, Length: 26818, dtype: float64
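The question also asks for demand per customer per state, one month ahead. A rough sketch of one way to get there is to group on both keys and take the last value of the rolling mean as the forecast for the following month. Everything here is an assumption for illustration: the `toy` data, the `next_month_forecast` helper, the `window=2` setting, and the `Customer_Name` column name (taken from the output above rather than from the question). It iterates over the groups explicitly instead of using apply, which keeps the result shape predictable.

```python
import pandas as pd

def next_month_forecast(group, window=2):
    """Hypothetical helper: last value of a trailing moving average."""
    monthly = group.set_index("Order_Date")["Quantity"].resample("MS").sum()
    # NaN when the group has fewer than `window` months of history.
    return monthly.rolling(window=window).mean().iloc[-1]

# Hypothetical toy data, not taken from Sales_data.xlsx.
toy = pd.DataFrame({
    "State": ["Texas", "Texas", "Texas", "Ohio", "Ohio"],
    "Customer_Name": ["Aaron Bergman"] * 3 + ["Zuschuss Donatelli"] * 2,
    "Order_Date": pd.to_datetime(
        ["2017-01-05", "2017-02-10", "2017-03-15", "2017-02-01", "2017-03-20"]
    ),
    "Quantity": [3, 6, 9, 4, 10],
})

# One forecast per (State, Customer_Name) pair; tuple keys become a MultiIndex.
forecasts = pd.Series({
    key: next_month_forecast(group)
    for key, group in toy.groupby(["State", "Customer_Name"])
})
print(forecasts)
```

Note that each group's monthly series only spans its own first to last order month, so "next month" is relative to that group's last order, not to a common calendar date.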