Integer counter that increments when the value in the next row changes
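I'm having trouble adding a counter column to my DataFrame.
I have parsed the values of several columns into a so-called "merged_attributes" column, and now I want to create a counter that increments by 1 whenever the value in that column changes.
I have the following DataFrame; the last column is the desired output: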
Price UNIQUE MERGED_ATRIBUTE COUNTER
0 52.08 1 52.081 1
1 52.08 1 52.081 1
2 52.20 1 52.210 2
3 52.20 1 52.210 2
4 52.20 1 52.210 2
5 52.20 1 52.210 2
6 52.20 1 52.210 2
7 52.20 1 52.210 2
8 70.10 1 70.110 3
How can I achieve this?
Thanks a lot!
Try the following:
df['COUNTER'] = df.groupby('MERGED_ATRIBUTE').ngroup() + 1
This creates a group for each value of MERGED_ATRIBUTE and then uses GroupBy.ngroup:
Number each group from 0 to the number of groups - 1.
This returns the following DataFrame:
Price UNIQUE MERGED_ATRIBUTE COUNTER
0 52.08 1 52.081 1
1 52.08 1 52.081 1
2 52.20 1 52.210 2
3 52.20 1 52.210 2
4 52.20 1 52.210 2
5 52.20 1 52.210 2
6 52.20 1 52.210 2
7 52.20 1 52.210 2
8 70.10 1 70.110 3
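For anyone who wants to run this end to end, here is a minimal sketch. The DataFrame construction is an assumption reconstructed from the printed table in the question; only the groupby line comes from the answer itself.

```python
import pandas as pd

# Sample frame rebuilt by hand from the table in the question (assumed values).
df = pd.DataFrame({
    "Price": [52.08, 52.08, 52.20, 52.20, 52.20, 52.20, 52.20, 52.20, 70.10],
    "UNIQUE": [1] * 9,
    "MERGED_ATRIBUTE": [52.081, 52.081, 52.210, 52.210, 52.210,
                        52.210, 52.210, 52.210, 70.110],
})

# ngroup() labels each group with an integer starting at 0;
# adding 1 makes the counter start at 1, as in the desired output.
df["COUNTER"] = df.groupby("MERGED_ATRIBUTE").ngroup() + 1
print(df)
```

Because the sample column is already in ascending order, the group numbers happen to coincide with a "value changed" counter here.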
Note that this assigns one number per distinct value, so if MERGED_ATRIBUTE is not sorted, the result differs from @sophocles's answer:
>>> df2
Price UNIQUE MERGED_ATRIBUTE
0 52.08 1 52.081
1 70.10 1 70.110
2 52.08 1 52.081
>>> df2.groupby('MERGED_ATRIBUTE').ngroup() + 1
0 1
1 2
2 1
dtype: int64
>>> df2['MERGED_ATRIBUTE'].ne(df2['MERGED_ATRIBUTE'].shift()).cumsum()
0 1
1 2
2 3
Name: MERGED_ATRIBUTE, dtype: int64
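One more detail that may matter with unsorted data: by default groupby sorts the group keys, so ngroup numbers groups in sorted key order rather than in order of first appearance. The sketch below uses a hypothetical frame (df3 is not from the question) to contrast the default with sort=False and with the change counter.

```python
import pandas as pd

# Hypothetical frame where the larger value appears first.
df3 = pd.DataFrame({"MERGED_ATRIBUTE": [70.110, 52.081, 70.110]})
col = df3["MERGED_ATRIBUTE"]

# Default sort=True: groups are numbered in sorted key order.
by_sorted_key = (df3.groupby("MERGED_ATRIBUTE").ngroup() + 1).tolist()

# sort=False: groups are numbered in order of first appearance.
by_first_seen = (df3.groupby("MERGED_ATRIBUTE", sort=False).ngroup() + 1).tolist()

# Change counter: a new number every time the value differs from the previous row.
by_change = col.ne(col.shift()).cumsum().tolist()

print(by_sorted_key)  # [2, 1, 2]
print(by_first_seen)  # [1, 2, 1]
print(by_change)      # [1, 2, 3]
```

Neither ngroup variant restarts the count when a value reappears later; only the change counter does.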
Or use cumsum() with an inequality check against the previous row:
c = df['MERGED_ATRIBUTE']
df['COUNTER'] = c.ne(c.shift()).cumsum()
Price UNIQUE MERGED_ATRIBUTE COUNTER
0 52.08 1 52.081 1
1 52.08 1 52.081 1
2 52.20 1 52.210 2
3 52.20 1 52.210 2
4 52.20 1 52.210 2
5 52.20 1 52.210 2
6 52.20 1 52.210 2
7 52.20 1 52.210 2
8 70.10 1 70.110 3
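To see why this works, it can help to look at the intermediate steps. This is a small sketch on a shortened, made-up Series (the names prev, changed and counter are only for illustration):

```python
import pandas as pd

# Shortened, hypothetical version of the column from the question.
c = pd.Series([52.081, 52.081, 52.210, 52.210, 70.110], name="MERGED_ATRIBUTE")

# shift() moves every value down one row; the first row becomes NaN.
prev = c.shift()

# ne() is element-wise "not equal". The first row compares against NaN,
# which is never equal to anything, so it is always flagged as a change.
changed = c.ne(prev)

# cumsum() treats True as 1 and False as 0, so the running total
# goes up by one on every row where the value changed.
counter = changed.cumsum()

print(changed.tolist())  # [True, False, True, False, True]
print(counter.tolist())  # [1, 1, 2, 2, 3]
```

One caveat, as far as I can tell: if the column itself contains NaN, consecutive NaN rows each count as a change, because NaN is not equal to NaN.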