How to plot a moving average by groupby in Python?
I have a question about plotting a moving average by group. I took the dataset from Kaggle https://www.kaggle.com/code/kp4920/s-p-500-stock-data-time-series-analysis/comments. I extracted a few rows by applying the following condition.
new_df_A = new_df[(new_df.Name == 'A')]
new_df_A.sort_values(by=['Name', 'Date'])
Then I tried to calculate the 30-day moving average with this code:
for cols in new_df_A.columns:
    if cols not in ['Name', 'Date',]:
        new_df_A['ma_'+cols]=new_df_A.groupby('Name').rolling(30)[cols].mean().reset_index(drop=True)
I got this warning:
/var/folders/6j/0bj57ss10ggbdk87dtdkbgyw0000gn/T/ipykernel_130/1482748670.py:3: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
new_df_A['ma_'+cols]=new_df_A.groupby('Name').rolling(30)[cols].mean().reset_index(drop=True)
/var/folders/6j/0bj57ss10ggbdk87dtdkbgyw0000gn/T/ipykernel_130/1482748670.py:3: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
new_df_A['ma_'+cols]=new_df_A.groupby('Name').rolling(30)[cols].mean().reset_index(drop=True)
/var/folders/6j/0bj57ss10ggbdk87dtdkbgyw0000gn/T/ipykernel_130/1482748670.py:3: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
Also, when I try to plot the graph, it is blank. Can anyone help me with this?
Thanks
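To get a moving average of time series data, the window has to be specified differently: for 30 days, use '30D'. A time-based window like this needs a datetime index, which is why the code converts Date and sets it as the index first. Since the calculation is column-wise, we use loc to select each column. The data has already been filtered down to a single stock, so groupby is not needed. To create the graph I used pandas' built-in plotting, which is the simplest way to do it.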
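import pandas as pd

# Work on an explicit copy so adding columns does not raise SettingWithCopyWarning
df_A = new_df_A.copy()
df_A['Date'] = pd.to_datetime(df_A['Date'])
# A time-based window such as '30D' needs a sorted DatetimeIndex
df_A = df_A.set_index('Date').sort_index()
for cols in df_A.columns:
    if cols not in ['Name', 'Date',]:
        df_A['ma_'+cols] = df_A.loc[:,cols].rolling('30D').mean()
# Select moving-average columns by position (assumes six original columns once Date is the index)
df_A.iloc[:,6:10].plot()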
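Note that rolling(30) and rolling('30D') are not the same window: the first averages the last 30 rows, the second averages whatever rows fall in the last 30 calendar days. Here is a minimal sketch of the difference on synthetic data. The business-day dates and the values are made up for illustration and are not taken from the Kaggle dataset.

```python
import numpy as np
import pandas as pd

# Synthetic business-day series (hypothetical data, for illustration only)
dates = pd.bdate_range("2020-01-01", periods=60)
prices = pd.Series(np.arange(60, dtype=float), index=dates, name="Close")

# Integer window: the last 30 rows; min_periods defaults to 30,
# so the first 29 results are NaN
ma_rows = prices.rolling(30).mean()

# Offset window: rows within the last 30 calendar days; min_periods defaults to 1,
# so there are no leading NaNs
ma_days = prices.rolling("30D").mean()

nan_rows = int(ma_rows.isna().sum())
nan_days = int(ma_days.isna().sum())

# Largest number of rows that ever falls inside one 30-calendar-day window
max_rows_in_30d = int(prices.rolling("30D").count().max())
```

With trading data a 30-calendar-day window holds only about 20 to 22 rows, so if you want "30 trading days", the integer window is probably the one you are after.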
# Alternative: draw the remaining columns with seaborn
import matplotlib.pyplot as plt
import seaborn as sns

sns.set(rc={'figure.figsize':(20,8)})
for cols in df_A.columns:
    if cols not in ['Name', 'Date', 'Open', 'High', 'Close']:
        sns.lineplot(x=df_A.index, y=df_A[cols])
plt.show()
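As for why the original plot was blank: one plausible cause, assuming the rows for 'A' do not start at index 0 in new_df, is that reset_index(drop=True) gives the rolling result a fresh 0..n-1 index, and pandas then aligns on index labels when assigning, so every new value becomes NaN. The sketch below reproduces that on made-up data and shows a groupby version that keeps the original index via transform, which also works when several names are kept. The frame, the 5-row window, and the ma_bad / ma_Close column names are all hypothetical.

```python
import pandas as pd

# Made-up frame with two tickers; the 'A' rows sit at index 40..79
dates = pd.bdate_range("2020-01-01", periods=40)
new_df = pd.DataFrame({
    "Date": list(dates) * 2,
    "Name": ["AAL"] * 40 + ["A"] * 40,
    "Close": [float(i) for i in range(80)],
})

# Reproduce the problem: the rolling result is re-indexed to 0..39,
# but the subset keeps labels 40..79, so nothing lines up
subset = new_df[new_df["Name"] == "A"].copy()
misaligned = (
    subset.groupby("Name").rolling(5)["Close"].mean().reset_index(drop=True)
)
subset["ma_bad"] = misaligned  # aligned on index labels -> all NaN

# Groupby version: transform returns a result with the original index,
# so the assignment lines up row by row for every name
new_df = new_df.sort_values(["Name", "Date"])
new_df["ma_Close"] = new_df.groupby("Name")["Close"].transform(
    lambda s: s.rolling(5).mean()
)

all_nan = bool(subset["ma_bad"].isna().all())
valid_count = int(new_df["ma_Close"].notna().sum())
first_valid_A = new_df.loc[new_df["Name"] == "A", "ma_Close"].dropna().iloc[0]
```

To plot one line per name from this, something like new_df.pivot(index="Date", columns="Name", values="ma_Close").plot() should work.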