Pandas vectorized way to tag the first occurring value (m elements) in a series of m*n elements
I have a pandas Series of m*n elements (n consecutive blocks of m elements each) of the following form, where m=5 and n=3:
A: [1 1 1 1 1 0 1 1 0 0 0 0 0 1 1]
I need a result series as follows :
B: [1 0 0 0 0 0 1 0 0 0 0 0 0 1 0]
m and n can be any values.
I also have supplemental data that might help.
One such piece of supplemental data is as follows:
HORIZON: [0 1 2 3 4 0 1 2 3 4 0 1 2 3 4]
The original series of 0 and 1 values can be derived from the real data, which is:
CUSIP: [CUSIP1 CUSIP1 CUSIP1 CUSIP1 CUSIP1 np.nan CUSIP2 CUSIP2 ... CUSIP3]
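What I have thought of so far:
Shift series A to the right and XOR it with A. But this idea does not seem to lead anywhere, because there are edge cases it cannot handle.
This is trivial with a standard for loop, but we have moved to vectorized operations, so I would really prefer a vectorized way to do it.
Thanks.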
EDIT:
The solution proposed works and the result is :
A: [1 1 1 1 1 0 1 1 0 0 0 0 0 1 1]
A': [nan 1 1 1 1 1 0 1 1 0 0 0 0 0 1] (A shifted)
A.where(A.ne(A.shift()) & A.eq(1),0)
B : [1 0 0 0 0 0 1 0 0 0 0 0 0 1 0]
FURTHER EDIT:
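There is an edge case for which the solution doesn't work: a run of 1s that continues across a block boundary (for example, a series of all 1s). The modified solution is:
import numpy
import pandas

a = pandas.Series([1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1])
x = a.shift()                # previous element at each position (NaN at position 0)
x.iloc[::5] = numpy.nan      # forget the previous element at the start of every block of m=5
b = a.where(a.ne(x) & a.eq(1), 0)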
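The block-reset idea in the modified solution can be wrapped in a small function so that m is not hard-coded. This is only a sketch: the name `tag_first_in_run` is hypothetical, and it assumes the series is laid out as consecutive blocks of exactly m elements.

```python
import numpy as np
import pandas as pd


def tag_first_in_run(a: pd.Series, m: int) -> pd.Series:
    """Keep the first 1 of each run of 1s, restarting runs at every block of m elements.

    Hypothetical helper; assumes `a` holds only 0/1 values in blocks of length m.
    """
    prev = a.shift()          # previous value at each position; NaN at position 0
    prev.iloc[::m] = np.nan   # drop the carried-over value at each block start
    # Keep a 1 only where it differs from the (block-local) previous value.
    return a.where(a.ne(prev) & a.eq(1), 0)


original = pd.Series([1, 1, 1, 1, 1, 0, 1, 1, 0, 0, 0, 0, 0, 1, 1])
all_ones = pd.Series([1] * 15)

print(tag_first_in_run(original, m=5).tolist())
# [1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0]
print(tag_first_in_run(all_ones, m=5).tolist())
# [1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0]
```

The second call is the edge case: without the reset, only position 0 would be tagged.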
You can use shift and eq:
a.where(a.ne(a.shift()) & a.eq(1),0)
Out[32]:
0 1
1 0
2 0
3 0
4 0
5 0
6 1
7 0
8 0
9 0
10 0
11 0
12 0
13 1
14 0
dtype: int64
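If the HORIZON data from the question is available, the block boundaries can be taken from it instead of being hard-coded. A minimal sketch, assuming HORIZON is 0 exactly at the first element of each block (the variable names `horizon`, `block_start` and `is_first` are illustrative, not from the question):

```python
import pandas as pd

a = pd.Series([1, 1, 1, 1, 1, 0, 1, 1, 0, 0, 0, 0, 0, 1, 1])
horizon = pd.Series([0, 1, 2, 3, 4] * 3)

# Assumption: a new block starts wherever HORIZON is 0.
block_start = horizon.eq(0)

# A 1 is "first" if the previous value is not 1 (NaN counts as not 1)
# or if it sits at the start of a block.
is_first = a.eq(1) & (a.shift().ne(1) | block_start)
b = is_first.astype(int)

print(b.tolist())
# [1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0]
```

With a series of all 1s, the same expression tags positions 0, 5 and 10, which is the case the plain shift comparison misses.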