Pandas: calculate the std of total column value per "year"

Question

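I have a DataFrame representing customer check-ins (visits) to restaurants. year is simply the year in which a check-in at a restaurant occurred.

What I want to do is add a column std_checkin to my initial DataFrame df that represents the standard deviation of visits per year. So I need to compute the standard deviation of the total visits per year.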

import pandas as pd

data = {
        'restaurant_id':  ['--1UhMGODdWsrMastO9DZw', '--1UhMGODdWsrMastO9DZw','--1UhMGODdWsrMastO9DZw','--1UhMGODdWsrMastO9DZw','--1UhMGODdWsrMastO9DZw','--1UhMGODdWsrMastO9DZw','--6MefnULPED_I942VcFNA','--6MefnULPED_I942VcFNA','--6MefnULPED_I942VcFNA','--6MefnULPED_I942VcFNA'],
        'year': ['2016','2016','2016','2016','2017','2017','2011','2011','2012','2012'],
        }
df = pd.DataFrame (data, columns = ['restaurant_id','year'])

# total number of checkins per restaurant
d = df.groupby('restaurant_id')['year'].count().to_dict()
df['nb_checkin'] = df['restaurant_id'].map(d)


grouped = df.groupby(["restaurant_id"])
avg_annual_visits = grouped["year"].count() / grouped["year"].nunique()
avg_annual_visits = avg_annual_visits.rename("avg_annual_visits")
df = df.merge(avg_annual_visits, left_on="restaurant_id", right_index=True)

df.head(10)

From here, I am not sure how to write what I want with pandas. If anything needs clarification, please ask.

Thanks!

Answer 1

I think what you want to do is:

counts = df.groupby('restaurant_id')['year'].value_counts()
# Series.std(level=...) was removed in pandas 2.0; group on the index level instead
counts.groupby(level='restaurant_id').std()

Output of counts, i.e. the total visits per restaurant per year:

restaurant_id           year
--1UhMGODdWsrMastO9DZw  2016    4
                        2017    2
--6MefnULPED_I942VcFNA  2011    2
                        2012    2
Name: year, dtype: int64

And the output of std:

restaurant_id
--1UhMGODdWsrMastO9DZw    1.414214
--6MefnULPED_I942VcFNA    0.000000
Name: year, dtype: float64
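
The question also asks for the result as a std_checkin column on df. Here is a minimal sketch of one way to do that, assuming the per-restaurant interpretation used above: count the visits per restaurant and year, take the standard deviation per restaurant, and map it back onto each row. The name std_per_restaurant is introduced here for illustration only and does not appear in the original posts.

```python
import pandas as pd

# Same sample data as in the question.
data = {
    "restaurant_id": ["--1UhMGODdWsrMastO9DZw"] * 6 + ["--6MefnULPED_I942VcFNA"] * 4,
    "year": ["2016", "2016", "2016", "2016", "2017", "2017",
             "2011", "2011", "2012", "2012"],
}
df = pd.DataFrame(data, columns=["restaurant_id", "year"])

# Total visits per restaurant per year (one entry per restaurant/year pair).
counts = df.groupby(["restaurant_id", "year"]).size()

# Sample standard deviation (ddof=1, the pandas default) of the yearly totals,
# computed separately for each restaurant.
std_per_restaurant = counts.groupby(level="restaurant_id").std()

# Broadcast the per-restaurant value back onto every row of the original frame.
df["std_checkin"] = df["restaurant_id"].map(std_per_restaurant)
```

Because std defaults to the sample standard deviation, a restaurant with check-ins in only one year would get NaN in std_checkin; passing ddof=0 to std would give 0 in that case instead.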


python

dataframe

standard-deviation

pandas

feature-engineering