确定按 'Year' 分组的出现次数 - 继续
Identify number of occurrence grouped by 'Year' - continued
这是 的延续。
现在,我想将这些事件按年份分组,这样输出就会变成这样:
Combo Occurrence (2017) Occurrence (2018) Occurrence (2019) Occurrence (2020)
0 DK,NO4 2 x x x
1 DK1,NO1 1 x x x
2 DK,NO1,NO2 1 x x x
3 DK,NO 1 x x x
输入数据如下所示:
Year Month Day Weekday NO1 ... SE3 SE4 FI DK1 DK2
0 2017 1 1 7 28.4 ... 24.03 24.03 24.03 20.96 20.96
1 2017 1 1 7 28.2 ... 25.05 25.05 25.05 25.05 25.05
2 2017 1 1 7 28.0 ... 25.05 25.05 25.05 25.05 25.05
3 2017 1 1 7 28.0 ... 23.19 23.19 23.19 16.03 16.03
4 2017 1 1 7 28.0 ... 24.10 24.10 24.10 16.43 16.43
... ... ... ... ... ... ... ... ... ... ... ...
35063 2020 12 31 4 31.0 ... 31.00 58.04 35.32 89.35 89.35
35064 2020 12 31 4 24.8 ... 24.84 54.45 24.84 56.70 56.70
35065 2020 12 31 4 24.8 ... 24.77 51.18 28.00 52.44 52.44
35066 2020 12 31 4 24.6 ... 24.61 45.84 26.55 51.86 51.86
35067 2020 12 31 4 24.1 ... 24.07 24.07 24.07 78.66 78.66
想法是将所有没有处理字符串的列转换为 index
,然后 SeriesGroupBy.value_counts
用于计算每年的计数值:
df = df.set_index(['Year','Month','Day','Weekday'])
df = (df.eq(df['DK1 Up'], axis=0)
.dot(df.columns + ',')
.str[:-1]
.to_frame('Combo')
.groupby('Year')['Combo']
.value_counts()
.unstack(0, fill_value=0)
.add_prefix('Occurrence ')
.rename_axis(columns=None)
.reset_index()
)
print (df)
Combo Occurrence 2017 Occurrence 2020
0 DK1 Up,DK1 Down,DK2 Up,DK2 Down 2 0
1 DK1 Up,DK2 Up 3 5
这是
现在,我想将这些事件按年份分组,这样输出就会变成这样:
Combo Occurrence (2017) Occurrence (2018) Occurrence (2019) Occurrence (2020)
0 DK,NO4 2 x x x
1 DK1,NO1 1 x x x
2 DK,NO1,NO2 1 x x x
3 DK,NO 1 x x x
输入数据如下所示:
Year Month Day Weekday NO1 ... SE3 SE4 FI DK1 DK2
0 2017 1 1 7 28.4 ... 24.03 24.03 24.03 20.96 20.96
1 2017 1 1 7 28.2 ... 25.05 25.05 25.05 25.05 25.05
2 2017 1 1 7 28.0 ... 25.05 25.05 25.05 25.05 25.05
3 2017 1 1 7 28.0 ... 23.19 23.19 23.19 16.03 16.03
4 2017 1 1 7 28.0 ... 24.10 24.10 24.10 16.43 16.43
... ... ... ... ... ... ... ... ... ... ... ...
35063 2020 12 31 4 31.0 ... 31.00 58.04 35.32 89.35 89.35
35064 2020 12 31 4 24.8 ... 24.84 54.45 24.84 56.70 56.70
35065 2020 12 31 4 24.8 ... 24.77 51.18 28.00 52.44 52.44
35066 2020 12 31 4 24.6 ... 24.61 45.84 26.55 51.86 51.86
35067 2020 12 31 4 24.1 ... 24.07 24.07 24.07 78.66 78.66
想法是将所有没有处理字符串的列转换为 index
,然后 SeriesGroupBy.value_counts
用于计算每年的计数值:
df = df.set_index(['Year','Month','Day','Weekday'])
df = (df.eq(df['DK1 Up'], axis=0)
.dot(df.columns + ',')
.str[:-1]
.to_frame('Combo')
.groupby('Year')['Combo']
.value_counts()
.unstack(0, fill_value=0)
.add_prefix('Occurrence ')
.rename_axis(columns=None)
.reset_index()
)
print (df)
Combo Occurrence 2017 Occurrence 2020
0 DK1 Up,DK1 Down,DK2 Up,DK2 Down 2 0
1 DK1 Up,DK2 Up 3 5