基于列枚举数据框
Enumerating dataframe based on a column
我正在处理一个时间序列数据框,它看起来像这样,除了有超过数千行。我想创建一个新列来枚举具有相同值 'sign' 的行块。即第 0 行将是 0,第 1 行到第 23 行将是 1,第 24 行到第 30 行将是 2 等等......(时间顺序很重要)什么是最 pythonic 的方式来完成这个?提前谢谢你
Date sign
0 2011-01-27 1
1 2011-01-28 -1
2 2011-01-31 -1
3 2011-02-01 -1
4 2011-02-02 -1
5 2011-02-07 -1
6 2011-02-08 -1
7 2011-02-09 -1
8 2011-02-10 -1
9 2011-02-11 -1
10 2011-02-14 -1
11 2011-02-15 -1
12 2011-02-16 -1
13 2011-02-17 -1
14 2011-02-18 -1
15 2011-02-21 -1
16 2011-02-22 -1
17 2011-02-23 -1
18 2011-02-24 -1
19 2011-02-25 -1
20 2011-02-28 -1
21 2011-03-01 -1
22 2011-03-02 -1
23 2011-03-03 -1
24 2011-03-04 1
25 2011-03-07 1
26 2011-03-08 1
27 2011-03-09 1
28 2011-03-10 1
29 2011-03-11 1
30 2011-03-14 1
31 2011-03-15 -1
32 2011-03-16 -1
33 2011-03-17 -1
34 2011-03-18 -1
35 2011-03-21 -1
36 2011-03-22 -1
37 2011-03-23 -1
38 2011-03-24 -1
39 2011-03-25 -1
40 2011-03-28 -1
41 2011-03-29 1
42 2011-03-30 1
你可以这样做:
df['count'] = df.sign.ne(df.sign.shift(1)).cumsum()
Date sign count
0 2011-01-27 1 1
1 2011-01-28 -1 2
2 2011-01-31 -1 2
3 2011-02-01 -1 2
4 2011-02-02 -1 2
5 2011-02-07 -1 2
.
.
.
23 2011-03-03 -1 2
24 2011-03-04 1 3
25 2011-03-07 1 3
26 2011-03-08 1 3
27 2011-03-09 1 3
你可以得到符号变化的cumsum
,使用diff
获得:
df['new_column'] = (df.sign.diff()!=0).cumsum()-1
>>> df
Date sign new_column
0 2011-01-27 1 0
1 2011-01-28 -1 1
2 2011-01-31 -1 1
3 2011-02-01 -1 1
4 2011-02-02 -1 1
5 2011-02-07 -1 1
6 2011-02-08 -1 1
7 2011-02-09 -1 1
8 2011-02-10 -1 1
9 2011-02-11 -1 1
10 2011-02-14 -1 1
11 2011-02-15 -1 1
12 2011-02-16 -1 1
13 2011-02-17 -1 1
14 2011-02-18 -1 1
15 2011-02-21 -1 1
16 2011-02-22 -1 1
17 2011-02-23 -1 1
18 2011-02-24 -1 1
19 2011-02-25 -1 1
20 2011-02-28 -1 1
21 2011-03-01 -1 1
22 2011-03-02 -1 1
23 2011-03-03 -1 1
24 2011-03-04 1 2
25 2011-03-07 1 2
26 2011-03-08 1 2
27 2011-03-09 1 2
28 2011-03-10 1 2
29 2011-03-11 1 2
30 2011-03-14 1 2
31 2011-03-15 -1 3
32 2011-03-16 -1 3
33 2011-03-17 -1 3
34 2011-03-18 -1 3
35 2011-03-21 -1 3
36 2011-03-22 -1 3
37 2011-03-23 -1 3
38 2011-03-24 -1 3
39 2011-03-25 -1 3
40 2011-03-28 -1 3
41 2011-03-29 1 4
42 2011-03-30 1 4
我正在处理一个时间序列数据框,它看起来像这样,除了有超过数千行。我想创建一个新列来枚举具有相同值 'sign' 的行块。即第 0 行将是 0,第 1 行到第 23 行将是 1,第 24 行到第 30 行将是 2 等等......(时间顺序很重要)什么是最 pythonic 的方式来完成这个?提前谢谢你
Date sign
0 2011-01-27 1
1 2011-01-28 -1
2 2011-01-31 -1
3 2011-02-01 -1
4 2011-02-02 -1
5 2011-02-07 -1
6 2011-02-08 -1
7 2011-02-09 -1
8 2011-02-10 -1
9 2011-02-11 -1
10 2011-02-14 -1
11 2011-02-15 -1
12 2011-02-16 -1
13 2011-02-17 -1
14 2011-02-18 -1
15 2011-02-21 -1
16 2011-02-22 -1
17 2011-02-23 -1
18 2011-02-24 -1
19 2011-02-25 -1
20 2011-02-28 -1
21 2011-03-01 -1
22 2011-03-02 -1
23 2011-03-03 -1
24 2011-03-04 1
25 2011-03-07 1
26 2011-03-08 1
27 2011-03-09 1
28 2011-03-10 1
29 2011-03-11 1
30 2011-03-14 1
31 2011-03-15 -1
32 2011-03-16 -1
33 2011-03-17 -1
34 2011-03-18 -1
35 2011-03-21 -1
36 2011-03-22 -1
37 2011-03-23 -1
38 2011-03-24 -1
39 2011-03-25 -1
40 2011-03-28 -1
41 2011-03-29 1
42 2011-03-30 1
你可以这样做:
df['count'] = df.sign.ne(df.sign.shift(1)).cumsum()
Date sign count
0 2011-01-27 1 1
1 2011-01-28 -1 2
2 2011-01-31 -1 2
3 2011-02-01 -1 2
4 2011-02-02 -1 2
5 2011-02-07 -1 2
.
.
.
23 2011-03-03 -1 2
24 2011-03-04 1 3
25 2011-03-07 1 3
26 2011-03-08 1 3
27 2011-03-09 1 3
你可以得到符号变化的cumsum
,使用diff
获得:
df['new_column'] = (df.sign.diff()!=0).cumsum()-1
>>> df
Date sign new_column
0 2011-01-27 1 0
1 2011-01-28 -1 1
2 2011-01-31 -1 1
3 2011-02-01 -1 1
4 2011-02-02 -1 1
5 2011-02-07 -1 1
6 2011-02-08 -1 1
7 2011-02-09 -1 1
8 2011-02-10 -1 1
9 2011-02-11 -1 1
10 2011-02-14 -1 1
11 2011-02-15 -1 1
12 2011-02-16 -1 1
13 2011-02-17 -1 1
14 2011-02-18 -1 1
15 2011-02-21 -1 1
16 2011-02-22 -1 1
17 2011-02-23 -1 1
18 2011-02-24 -1 1
19 2011-02-25 -1 1
20 2011-02-28 -1 1
21 2011-03-01 -1 1
22 2011-03-02 -1 1
23 2011-03-03 -1 1
24 2011-03-04 1 2
25 2011-03-07 1 2
26 2011-03-08 1 2
27 2011-03-09 1 2
28 2011-03-10 1 2
29 2011-03-11 1 2
30 2011-03-14 1 2
31 2011-03-15 -1 3
32 2011-03-16 -1 3
33 2011-03-17 -1 3
34 2011-03-18 -1 3
35 2011-03-21 -1 3
36 2011-03-22 -1 3
37 2011-03-23 -1 3
38 2011-03-24 -1 3
39 2011-03-25 -1 3
40 2011-03-28 -1 3
41 2011-03-29 1 4
42 2011-03-30 1 4