Enumerating a dataframe based on a column

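I'm working with a time-series dataframe that looks like the one below, except that it has thousands of rows. I want to create a new column that enumerates the blocks of consecutive rows sharing the same 'sign' value. That is, row 0 would be 0, rows 1 to 23 would be 1, rows 24 to 30 would be 2, and so on (chronological order matters). What is the most pythonic way to do this? Thanks in advance.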

    Date        sign
0   2011-01-27  1
1   2011-01-28  -1
2   2011-01-31  -1
3   2011-02-01  -1
4   2011-02-02  -1
5   2011-02-07  -1
6   2011-02-08  -1
7   2011-02-09  -1
8   2011-02-10  -1
9   2011-02-11  -1
10  2011-02-14  -1
11  2011-02-15  -1
12  2011-02-16  -1
13  2011-02-17  -1
14  2011-02-18  -1
15  2011-02-21  -1
16  2011-02-22  -1
17  2011-02-23  -1
18  2011-02-24  -1
19  2011-02-25  -1
20  2011-02-28  -1
21  2011-03-01  -1
22  2011-03-02  -1
23  2011-03-03  -1
24  2011-03-04  1
25  2011-03-07  1
26  2011-03-08  1
27  2011-03-09  1
28  2011-03-10  1
29  2011-03-11  1
30  2011-03-14  1
31  2011-03-15  -1
32  2011-03-16  -1
33  2011-03-17  -1
34  2011-03-18  -1
35  2011-03-21  -1
36  2011-03-22  -1
37  2011-03-23  -1
38  2011-03-24  -1
39  2011-03-25  -1
40  2011-03-28  -1
41  2011-03-29  1
42  2011-03-30  1

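You can do this by comparing each value with the one in the previous row and taking the cumulative sum of the changes. Note that the numbering starts at 1; subtract 1 if you want it to start at 0 as in the question: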

df['count'] = df.sign.ne(df.sign.shift(1)).cumsum()

          Date  sign  count
0   2011-01-27     1      1
1   2011-01-28    -1      2
2   2011-01-31    -1      2
3   2011-02-01    -1      2
4   2011-02-02    -1      2
5   2011-02-07    -1      2
.
.
.
23  2011-03-03    -1      2
24  2011-03-04     1      3
25  2011-03-07     1      3
26  2011-03-08     1      3
27  2011-03-09     1      3
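
To see what each step of that one-liner contributes, here is a minimal self-contained sketch. The data is a shortened, made-up sample rather than the asker's full frame, and the names `changed` and `block` are assumptions chosen for illustration.

```python
import pandas as pd

# Shortened, illustrative sample (not the asker's full data).
df = pd.DataFrame({
    "Date": pd.to_datetime([
        "2011-01-27", "2011-01-28", "2011-01-31",
        "2011-03-04", "2011-03-07", "2011-03-15",
    ]),
    "sign": [1, -1, -1, 1, 1, -1],
})

# Chronological order matters, so sort by date before comparing neighbours.
df = df.sort_values("Date").reset_index(drop=True)

# True wherever a row's sign differs from the previous row's sign.
# The first row is compared with NaN, so it is always True.
changed = df["sign"].ne(df["sign"].shift(1))

# The running total of changes labels each block; subtracting 1 makes it zero-based.
df["block"] = changed.cumsum() - 1

print(df)
```

Because the comparison only looks at the immediately preceding row, the result depends on row order, which is why the sketch sorts by `Date` first.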

You can take the cumulative sum of the sign changes, which you can detect with diff:

df['new_column'] = (df.sign.diff()!=0).cumsum()-1

>>> df
          Date  sign  new_column
0   2011-01-27     1      0
1   2011-01-28    -1      1
2   2011-01-31    -1      1
3   2011-02-01    -1      1
4   2011-02-02    -1      1
5   2011-02-07    -1      1
6   2011-02-08    -1      1
7   2011-02-09    -1      1
8   2011-02-10    -1      1
9   2011-02-11    -1      1
10  2011-02-14    -1      1
11  2011-02-15    -1      1
12  2011-02-16    -1      1
13  2011-02-17    -1      1
14  2011-02-18    -1      1
15  2011-02-21    -1      1
16  2011-02-22    -1      1
17  2011-02-23    -1      1
18  2011-02-24    -1      1
19  2011-02-25    -1      1
20  2011-02-28    -1      1
21  2011-03-01    -1      1
22  2011-03-02    -1      1
23  2011-03-03    -1      1
24  2011-03-04     1      2
25  2011-03-07     1      2
26  2011-03-08     1      2
27  2011-03-09     1      2
28  2011-03-10     1      2
29  2011-03-11     1      2
30  2011-03-14     1      2
31  2011-03-15    -1      3
32  2011-03-16    -1      3
33  2011-03-17    -1      3
34  2011-03-18    -1      3
35  2011-03-21    -1      3
36  2011-03-22    -1      3
37  2011-03-23    -1      3
38  2011-03-24    -1      3
39  2011-03-25    -1      3
40  2011-03-28    -1      3
41  2011-03-29     1      4
42  2011-03-30     1      4
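
As a quick sanity check, the sketch below compares the diff-based expression with the shift-based one from the other answer on a small made-up series, then uses the resulting labels to count the rows in each block. The variable names are assumptions for illustration. One practical difference, as far as I know: diff subtracts neighbouring values, so it expects a numeric column, whereas the shift/ne comparison should also work for non-numeric labels.

```python
import pandas as pd

# Small illustrative series (not the asker's data).
sign = pd.Series([1, -1, -1, 1, 1, -1], name="sign")

# diff() is NaN on the first row and non-zero wherever the sign changes;
# NaN != 0 evaluates to True, so the first row opens the first block.
via_diff = (sign.diff() != 0).cumsum() - 1

# Same idea, expressed as "not equal to the previous row".
via_shift = sign.ne(sign.shift()).cumsum() - 1

# Both approaches should produce identical zero-based block labels.
same = via_diff.equals(via_shift)

# Example use of the labels: number of rows in each block.
block_sizes = via_diff.groupby(via_diff).size()

print(same)
print(block_sizes)
```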