Groupby Pandas & 权重时间序列
Groupby Pandas & Weight Time Series
我正在尝试通过“plant_name”和“月份”的分组来计算每个时间序列或数据值的权重,但我看不到解决方案。我有这样的数据 -
plant_name month adjusted_wspd
0 ARIZONA I 1 7.62
1 ARIZONA I 2 7.37
2 ARIZONA I 3 6.72
3 ARIZONA I 4 6.67
4 ARIZONA I 5 7.07
5 ARIZONA I 6 7.53
6 ARIZONA I 7 8.09
7 ARIZONA I 8 8.83
8 ARIZONA I 9 9.41
9 ARIZONA I 10 9.44
10 ARIZONA I 11 8.92
11 ARIZONA I 12 8.37
12 CAETITE I 1 7.10
13 CAETITE I 2 7.27
14 CAETITE I 3 7.26
我已经尝试编写一个函数并使用“应用”到定义的函数,但我得到了一个看起来像这样的错误解决方案,因为该函数可能不正确地进行了数学运算:
ARIZONA I 1 1.0
2 1.0
3 1.0
4 1.0
5 1.0
6 1.0
7 1.0
8 1.0
9 1.0
10 1.0
11 1.0
12 1.0
CAETITE I 1 1.0
2 1.0
3 1.0
我的代码如下所示:
def my_agg(x):
names = {'Weigths_WSPD': (x['adjusted_wspd'] )/x['adjusted_wspd'].sum()}
return pd.Series(names, index=['Weights_WSPD'])
dfmm.groupby(["plant_name", "month"]).apply(my_agg)
我的答案应该如下所示(百分比 (%)):
ARIZONA I 1 7.93
2 7.67
3 7.00
4 6.95
5 7.36
6 7.84
7 8.42
8 9.19
9 9.80
10 9.83
11 9.29
12 8.72
CAETITE I 1 33.82
2 33.61
3 33.56
谢谢!
让我们尝试 groupby transform
求和 plant_name
而不是 plant_name
和 month
:
df['Weigths_WSPD'] = (
df['adjusted_wspd'] /
df.groupby('plant_name')['adjusted_wspd'].transform('sum') *
100
).round(2)
plant_name month adjusted_wspd Weigths_WSPD
0 ARIZONA I 1 7.62 7.93
1 ARIZONA I 2 7.37 7.67
2 ARIZONA I 3 6.72 7.00
3 ARIZONA I 4 6.67 6.95
4 ARIZONA I 5 7.07 7.36
5 ARIZONA I 6 7.53 7.84
6 ARIZONA I 7 8.09 8.42
7 ARIZONA I 8 8.83 9.19
8 ARIZONA I 9 9.41 9.80
9 ARIZONA I 10 9.44 9.83
10 ARIZONA I 11 8.92 9.29
11 ARIZONA I 12 8.37 8.72
12 CAETITE I 1 7.10 32.82
13 CAETITE I 2 7.27 33.61
14 CAETITE I 3 7.26 33.56
如果需要 Series
:
s = df.set_index(['plant_name', 'month'])
s = (
s['adjusted_wspd'] /
s.groupby(level=0)['adjusted_wspd'].transform('sum') *
100
).round(2).rename('Weigths_WSPD')
plant_name month
ARIZONA I 1 7.93
2 7.67
3 7.00
4 6.95
5 7.36
6 7.84
7 8.42
8 9.19
9 9.80
10 9.83
11 9.29
12 8.72
CAETITE I 1 32.82
2 33.61
3 33.56
Name: Weigths_WSPD, dtype: float64
DataFrame 构造函数:
df = pd.DataFrame({
'plant_name': {0: 'ARIZONA I', 1: 'ARIZONA I', 2: 'ARIZONA I',
3: 'ARIZONA I', 4: 'ARIZONA I', 5: 'ARIZONA I',
6: 'ARIZONA I', 7: 'ARIZONA I', 8: 'ARIZONA I',
9: 'ARIZONA I', 10: 'ARIZONA I', 11: 'ARIZONA I',
12: 'CAETITE I', 13: 'CAETITE I', 14: 'CAETITE I'},
'month': {0: 1, 1: 2, 2: 3, 3: 4, 4: 5, 5: 6, 6: 7, 7: 8, 8: 9, 9: 10,
10: 11, 11: 12, 12: 1, 13: 2, 14: 3},
'adjusted_wspd': {0: 7.62, 1: 7.37, 2: 6.72, 3: 6.67, 4: 7.07, 5: 7.53,
6: 8.09, 7: 8.83, 8: 9.41, 9: 9.44, 10: 8.92, 11: 8.37,
12: 7.1, 13: 7.27, 14: 7.26}
})
我正在尝试通过“plant_name”和“月份”的分组来计算每个时间序列或数据值的权重,但我看不到解决方案。我有这样的数据 -
plant_name month adjusted_wspd
0 ARIZONA I 1 7.62
1 ARIZONA I 2 7.37
2 ARIZONA I 3 6.72
3 ARIZONA I 4 6.67
4 ARIZONA I 5 7.07
5 ARIZONA I 6 7.53
6 ARIZONA I 7 8.09
7 ARIZONA I 8 8.83
8 ARIZONA I 9 9.41
9 ARIZONA I 10 9.44
10 ARIZONA I 11 8.92
11 ARIZONA I 12 8.37
12 CAETITE I 1 7.10
13 CAETITE I 2 7.27
14 CAETITE I 3 7.26
我已经尝试编写一个函数并使用“应用”到定义的函数,但我得到了一个看起来像这样的错误解决方案,因为该函数可能不正确地进行了数学运算:
ARIZONA I 1 1.0
2 1.0
3 1.0
4 1.0
5 1.0
6 1.0
7 1.0
8 1.0
9 1.0
10 1.0
11 1.0
12 1.0
CAETITE I 1 1.0
2 1.0
3 1.0
我的代码如下所示:
def my_agg(x):
names = {'Weigths_WSPD': (x['adjusted_wspd'] )/x['adjusted_wspd'].sum()}
return pd.Series(names, index=['Weights_WSPD'])
dfmm.groupby(["plant_name", "month"]).apply(my_agg)
我的答案应该如下所示(百分比 (%)):
ARIZONA I 1 7.93
2 7.67
3 7.00
4 6.95
5 7.36
6 7.84
7 8.42
8 9.19
9 9.80
10 9.83
11 9.29
12 8.72
CAETITE I 1 33.82
2 33.61
3 33.56
谢谢!
让我们尝试 groupby transform
求和 plant_name
而不是 plant_name
和 month
:
df['Weigths_WSPD'] = (
df['adjusted_wspd'] /
df.groupby('plant_name')['adjusted_wspd'].transform('sum') *
100
).round(2)
plant_name month adjusted_wspd Weigths_WSPD
0 ARIZONA I 1 7.62 7.93
1 ARIZONA I 2 7.37 7.67
2 ARIZONA I 3 6.72 7.00
3 ARIZONA I 4 6.67 6.95
4 ARIZONA I 5 7.07 7.36
5 ARIZONA I 6 7.53 7.84
6 ARIZONA I 7 8.09 8.42
7 ARIZONA I 8 8.83 9.19
8 ARIZONA I 9 9.41 9.80
9 ARIZONA I 10 9.44 9.83
10 ARIZONA I 11 8.92 9.29
11 ARIZONA I 12 8.37 8.72
12 CAETITE I 1 7.10 32.82
13 CAETITE I 2 7.27 33.61
14 CAETITE I 3 7.26 33.56
如果需要 Series
:
s = df.set_index(['plant_name', 'month'])
s = (
s['adjusted_wspd'] /
s.groupby(level=0)['adjusted_wspd'].transform('sum') *
100
).round(2).rename('Weigths_WSPD')
plant_name month
ARIZONA I 1 7.93
2 7.67
3 7.00
4 6.95
5 7.36
6 7.84
7 8.42
8 9.19
9 9.80
10 9.83
11 9.29
12 8.72
CAETITE I 1 32.82
2 33.61
3 33.56
Name: Weigths_WSPD, dtype: float64
DataFrame 构造函数:
df = pd.DataFrame({
'plant_name': {0: 'ARIZONA I', 1: 'ARIZONA I', 2: 'ARIZONA I',
3: 'ARIZONA I', 4: 'ARIZONA I', 5: 'ARIZONA I',
6: 'ARIZONA I', 7: 'ARIZONA I', 8: 'ARIZONA I',
9: 'ARIZONA I', 10: 'ARIZONA I', 11: 'ARIZONA I',
12: 'CAETITE I', 13: 'CAETITE I', 14: 'CAETITE I'},
'month': {0: 1, 1: 2, 2: 3, 3: 4, 4: 5, 5: 6, 6: 7, 7: 8, 8: 9, 9: 10,
10: 11, 11: 12, 12: 1, 13: 2, 14: 3},
'adjusted_wspd': {0: 7.62, 1: 7.37, 2: 6.72, 3: 6.67, 4: 7.07, 5: 7.53,
6: 8.09, 7: 8.83, 8: 9.41, 9: 9.44, 10: 8.92, 11: 8.37,
12: 7.1, 13: 7.27, 14: 7.26}
})