Pandas dataframe groupby / rolling - why no reset of rolling mean on new group?
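I am trying to aggregate the hours worked by a group of people and need to compute a rolling mean.
I can do this with df.groupby and df.rolling, but for a rolling mean over 'n' values I want the first n-1 values in each group to be NaN or 0.
Example -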
import pandas as pd
import numpy as np
employees = ['Alice', 'Alice', 'Bob', 'Bob', 'Bob' ]
weeks = [2, 3, 2, 3, 4]
hours = [5, 8, 4, 2, 5]
df = pd.DataFrame.from_dict({'employee' : employees,
'week': weeks,
'hours': hours})
df.groupby(['employee', 'week']).sum().rolling(2).mean()
df
  employee  hours  week
0    Alice      5     2
1    Alice      8     3
2      Bob      4     2
3      Bob      2     3
4      Bob      5     4
Result -
               hours
employee week
Alice    2       NaN
         3       6.5
Bob      2       6.0  <-- expect this to be 0
         3       3.0
         4       3.5
Expected result -
               hours
employee week
Alice    2       NaN
         3       6.5
Bob      2       NaN  <--- mean reset to 0 on new group
         3       3.0
         4       3.5
This reset (on Bob's first row) does not happen. How can I achieve it?
Many thanks (and apologies for the formatting)
You are looking for
s=df.groupby(['employee']).apply(lambda x : x['hours'].rolling(2).mean())
s
Out[225]:
employee
Alice     0       nan
          1   6.50000
Bob       2       nan
          3   3.00000
          4   3.50000
Name: hours, dtype: float64
# assign it back
df['roll_mean']=s.reset_index(level=0,drop=True)
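As for why the original attempt does not reset: groupby(...).sum() returns an ordinary DataFrame indexed by (employee, week), so the .rolling(2) that follows slides over all rows in order and knows nothing about employee boundaries. A minimal sketch of one alternative, assuming the per-week totals should be kept, is to group a second time on the employee index level and roll inside each group. The names weekly and roll below are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "employee": ["Alice", "Alice", "Bob", "Bob", "Bob"],
    "week": [2, 3, 2, 3, 4],
    "hours": [5, 8, 4, 2, 5],
})

# Total hours per (employee, week); the result is a Series with a MultiIndex.
weekly = df.groupby(["employee", "week"])["hours"].sum()

# Group again on the employee level and roll within each group, so the
# window never spans two employees. transform keeps the (employee, week) index.
roll = weekly.groupby(level="employee").transform(lambda s: s.rolling(2).mean())

print(roll)
```

With this data, the first week of each employee comes out as NaN. Chaining .fillna(0) on the result would give 0 instead, if that is preferred.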