Counting specific consecutive equal values in a vectorized way in pandas
Suppose we have the following pandas DataFrame:
In [1]:
import pandas as pd
import numpy as np
df = pd.DataFrame([0, 1, 0, 0, 1, 1, 0, 1, 1, 1], columns=['in'])
df
Out[1]:
in
0 0
1 1
2 0
3 0
4 1
5 1
6 0
7 1
8 1
9 1
How can I count consecutive 1s in pandas in a vectorized way? I would like a result like this:
in out
0 0 0
1 1 1
2 0 0
3 0 0
4 1 1
5 1 2
6 0 0
7 1 1
8 1 2
9 1 3
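In other words, something like a vectorized cumsum that resets whenever a given condition is met.
You can do something like this (credit: how to emulate itertools.groupby with a series/dataframe?):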
>>> df['in'].groupby((df['in'] != df['in'].shift()).cumsum()).cumsum()
0 0
1 1
2 0
3 0
4 1
5 2
6 0
7 1
8 2
9 3
dtype: int64
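To see why the one-liner works, it can help to split it into its intermediate steps. The following is a minimal sketch using the same `df` as above; the names `run_start` and `run_id` are illustrative and do not appear in the original answer.

```python
import pandas as pd

df = pd.DataFrame([0, 1, 0, 0, 1, 1, 0, 1, 1, 1], columns=['in'])

# True wherever the value differs from the previous row, i.e. a new run starts.
# The first row compares against NaN, so it is always True.
run_start = df['in'] != df['in'].shift()

# Cumulative sum of the flags gives every run of equal values its own integer id.
run_id = run_start.cumsum()

# Cumulative sum within each run: runs of 0 stay at 0, runs of 1 count 1, 2, 3, ...
df['out'] = df['in'].groupby(run_id).cumsum()

print(df)
```

Note that this relies on the column holding only 0s and 1s, so that summing inside a run is the same as counting. For other values, `groupby(run_id).cumcount() + 1` gives the position within each run instead.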