pandas 数据框中的组重复模式
Group repeating pattern in pandas Dataframe
所以我有一个 Dataframe,它有一个重复的数字系列,我想像这样分组:
Number Pattern
Value
Desired Group
Value.1
1
723
1
Max of Group
2
400
1
Max of Group
8
235
1
Max of Group
5
387
2
Max of Group
7
911
2
Max of Group
3
365
3
Max of Group
4
270
3
Max of Group
5
194
3
Max of Group
7
452
3
Max of Group
100
716
4
Max of Group
104
69
4
Max of Group
2
846
5
Max of Group
3
474
5
Max of Group
4
524
5
Max of Group
所以基本上数字模式总是单调递增的。
有什么想法吗?
您可以将 Number Pattern
by 1
与累计总和 Series.cumsum
and then is used GroupBy.transform
与 max
进行比较:
df['Desired Group'] = df['Number Pattern'].eq(1).cumsum()
df['Value.1'] = df.groupby('Desired Group')['Value'].transform('max')
print (df)
Number Pattern Value Desired Group Value.1
0 1 723 1 723
1 2 400 1 723
2 3 235 1 723
3 1 387 2 911
4 2 911 2 911
5 1 365 3 452
6 2 270 3 452
7 3 194 3 452
8 4 452 3 452
9 1 716 4 716
10 2 69 4 716
11 1 846 5 846
12 2 474 5 846
13 3 524 5 846
单调递增使用:
df['Desired Group'] = (~df['Number Pattern'].diff().gt(0)).cumsum()
所以我有一个 Dataframe,它有一个重复的数字系列,我想像这样分组:
Number Pattern | Value | Desired Group | Value.1 |
---|---|---|---|
1 | 723 | 1 | Max of Group |
2 | 400 | 1 | Max of Group |
8 | 235 | 1 | Max of Group |
5 | 387 | 2 | Max of Group |
7 | 911 | 2 | Max of Group |
3 | 365 | 3 | Max of Group |
4 | 270 | 3 | Max of Group |
5 | 194 | 3 | Max of Group |
7 | 452 | 3 | Max of Group |
100 | 716 | 4 | Max of Group |
104 | 69 | 4 | Max of Group |
2 | 846 | 5 | Max of Group |
3 | 474 | 5 | Max of Group |
4 | 524 | 5 | Max of Group |
所以基本上数字模式总是单调递增的。
有什么想法吗?
您可以将 Number Pattern
by 1
与累计总和 Series.cumsum
and then is used GroupBy.transform
与 max
进行比较:
df['Desired Group'] = df['Number Pattern'].eq(1).cumsum()
df['Value.1'] = df.groupby('Desired Group')['Value'].transform('max')
print (df)
Number Pattern Value Desired Group Value.1
0 1 723 1 723
1 2 400 1 723
2 3 235 1 723
3 1 387 2 911
4 2 911 2 911
5 1 365 3 452
6 2 270 3 452
7 3 194 3 452
8 4 452 3 452
9 1 716 4 716
10 2 69 4 716
11 1 846 5 846
12 2 474 5 846
13 3 524 5 846
单调递增使用:
df['Desired Group'] = (~df['Number Pattern'].diff().gt(0)).cumsum()