How to plot a moving average by groupby in Python?

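I have a question about plotting a moving average by group. I took the dataset from Kaggle https://www.kaggle.com/code/kp4920/s-p-500-stock-data-time-series-analysis/comments and extracted a subset of rows by applying the following condition.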

new_df_A = new_df[(new_df.Name == 'A')]
new_df_A.sort_values(by=['Name', 'Date'])

Then I tried to calculate the 30-day moving average with this code:

for cols in new_df_A.columns:
    if cols not in ['Name', 'Date',]:
        new_df_A['ma_'+cols]=new_df_A.groupby('Name').rolling(30)[cols].mean().reset_index(drop=True)

I got this warning:

/var/folders/6j/0bj57ss10ggbdk87dtdkbgyw0000gn/T/ipykernel_130/1482748670.py:3: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  new_df_A['ma_'+cols]=new_df_A.groupby('Name').rolling(30)[cols].mean().reset_index(drop=True)

Also, when I try to plot the graph, it comes out blank. Can anybody help me with this?

Thanks

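To get a moving average over time series data, the period of interest has to be specified differently: for 30 days, use '30D', which requires a datetime index. Since the calculation is column-wise, we use loc to select each column. Because the data has already been filtered down to a single stock, groupby is not needed. To create the chart I used pandas' built-in plotting, which is the simplest approach.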

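import pandas as pd

# Work on an explicit copy so the assignments below do not raise SettingWithCopyWarning
df_A = new_df_A.copy()
df_A['Date'] = pd.to_datetime(df_A['Date'])
# A time-based window such as '30D' needs a sorted DatetimeIndex
df_A = df_A.set_index('Date').sort_index()

# 'Date' is now the index, so only 'Name' has to be skipped
for cols in df_A.columns:
    if cols != 'Name':
        df_A['ma_' + cols] = df_A.loc[:, cols].rolling('30D').mean()

# Assuming the column order Open, High, Low, Close, Volume, Name,
# positions 6 to 9 are ma_Open, ma_High, ma_Low and ma_Close
df_A.iloc[:, 6:10].plot()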
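
As a side note on the '30D' window used above: it is not the same thing as rolling(30). The sketch below uses a small made-up price series on weekdays only (the names prices, ma_rows and ma_days are just for illustration, not from the dataset) to show the difference. An integer window counts rows and, by default, returns NaN until it is full, while an offset window counts calendar days and by default returns a value as soon as it has one observation.

```python
import pandas as pd

# Hypothetical prices observed on weekdays only (no weekends)
idx = pd.bdate_range("2024-01-01", periods=60)
prices = pd.Series(range(60), index=idx, dtype="float64")

# Integer window: mean of the last 30 rows, NaN for the first 29 rows
ma_rows = prices.rolling(30).mean()

# Offset window: mean of all rows within the last 30 calendar days
ma_days = prices.rolling("30D").mean()

# How many observations actually fall inside each 30-day window
obs_per_window = prices.rolling("30D").count()

print(ma_rows.isna().sum(), ma_days.isna().sum())
print(obs_per_window.iloc[-1])
```

So with daily stock data, '30D' averages over fewer than 30 trading days. If 30 trading days are wanted instead, rolling(30) on the sorted data would be the closer match.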

import matplotlib.pyplot as plt
import seaborn as sns
sns.set(rc={'figure.figsize':(20,8)})

# Draw one line per remaining column against the Date index
for cols in df_A.columns:
    if cols not in ['Name', 'Date', 'Open', 'High', 'Close']:
        sns.lineplot(x=df_A.index, y=df_A[cols])

plt.show()
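
If you do need the moving average for several tickers at once, here is a minimal sketch on made-up data (the frame df, the 3-row window and the column names are assumptions for illustration). It also shows one likely reason the plot in the question was blank: reset_index(drop=True) relabels the result 0, 1, 2, ..., while the filtered frame keeps its original row labels, so the assignment aligns on labels that do not match and fills the new column with NaN. groupby(...).transform keeps the original index, so the result lines up.

```python
import pandas as pd

# Hypothetical long-format data with two tickers (row labels 0..9)
df = pd.DataFrame({
    "Name": ["A"] * 5 + ["B"] * 5,
    "Date": list(pd.date_range("2024-01-01", periods=5)) * 2,
    "Close": [1.0, 2.0, 3.0, 4.0, 5.0, 10.0, 20.0, 30.0, 40.0, 50.0],
})
df = df.sort_values(["Name", "Date"])

# Filtering keeps the original row labels (5..9 for ticker B)
sub = df[df["Name"] == "B"].copy()

# Pattern from the question: the result is relabelled 0..4,
# which does not overlap with 5..9, so every assigned value is NaN
sub["ma_bad"] = (
    sub.groupby("Name").rolling(3)["Close"].mean().reset_index(drop=True)
)
print(sub["ma_bad"].isna().all())

# transform returns a Series aligned with df's own index
df["ma_Close"] = df.groupby("Name")["Close"].transform(
    lambda s: s.rolling(3).mean()
)

# One column per ticker, ready for plotting
wide = df.pivot(index="Date", columns="Name", values="ma_Close")
print(wide)
```

Calling wide.plot() afterwards should draw one moving-average line per ticker.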