Tagging similar categories with a repeated sequence of numbers in a pandas DataFrame
Below is the reproducible code:
import pandas as pd

colo = ['red', 'red', 'red', 'cross', 'cross', 'red', 'red', 'red', 'cross', 'cross', 'cross',
        'cross', 'cross', 'red', 'red', 'cross', 'red', 'cross', 'cross']
dt = pd.DataFrame()
dt['seq'] = list(range(len(colo)))
dt['col'] = colo
Expected output: the columns seq and col are given, and the column Expected_col needs to be created.
Here is one way to do it: use eq + diff + ne + cumsum to build groups of consecutive 'red' rows, then use boolean indexing to fill in the values:
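# True on rows whose color is 'red'
cond = dt['col'].eq('red')
# Among the red rows, a new group starts wherever seq does not increase by exactly 1;
# cumsum turns those start flags into group ids 1, 2, 3, ...
s = dt.loc[cond, 'seq'].diff().ne(1).cumsum()
# Copy col, then overwrite the red rows with RED<n>, numbering the groups in reverse
dt['Expected_col'] = dt['col']
dt.loc[cond, 'Expected_col'] = 'RED' + (s.max() + 1 - s).astype(str)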
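To make the grouping step easier to follow, here is a small sketch that prints each intermediate Series for the red rows only. The names red_seq, gaps, starts and steps are illustrative and not part of the answer above; the logic is the same diff / ne / cumsum chain.

```python
import pandas as pd

colo = ['red', 'red', 'red', 'cross', 'cross', 'red', 'red', 'red', 'cross', 'cross', 'cross',
        'cross', 'cross', 'red', 'red', 'cross', 'red', 'cross', 'cross']
dt = pd.DataFrame({'seq': list(range(len(colo))), 'col': colo})

cond = dt['col'].eq('red')     # True on 'red' rows
red_seq = dt.loc[cond, 'seq']  # seq of red rows: 0, 1, 2, 5, 6, 7, 13, 14, 16
gaps = red_seq.diff()          # NaN, 1.0, 1.0, 3.0, 1.0, 1.0, 6.0, 1.0, 2.0
starts = gaps.ne(1)            # True where a new run begins (NaN != 1 is True)
s = starts.cumsum()            # run ids: 1, 1, 1, 2, 2, 2, 3, 3, 4

# Side-by-side view of every step for the red rows
steps = pd.DataFrame({'seq': red_seq, 'diff': gaps, 'new_run': starts, 'run_id': s})
print(steps)

# Reversing the ids gives the numbers used in RED<n>: 4, 4, 4, 3, 3, 3, 2, 2, 1
print((s.max() + 1 - s).tolist())
```

The first diff is NaN, and NaN compares as not equal to 1, so the very first red row always opens group 1.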
Output:
seq col Expected_col
0 0 red RED4
1 1 red RED4
2 2 red RED4
3 3 cross cross
4 4 cross cross
5 5 red RED3
6 6 red RED3
7 7 red RED3
8 8 cross cross
9 9 cross cross
10 10 cross cross
11 11 cross cross
12 12 cross cross
13 13 red RED2
14 14 red RED2
15 15 cross cross
16 16 red RED1
17 17 cross cross
18 18 cross cross
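The answer above relies on seq being consecutive integers that match the row order. If that assumption does not hold, a possible variant (not the answer's method) detects runs by comparing each col value with the previous row via shift, then numbers only the red runs with a dense rank. The names run_id, red_runs and label are hypothetical.

```python
import pandas as pd

colo = ['red', 'red', 'red', 'cross', 'cross', 'red', 'red', 'red', 'cross', 'cross', 'cross',
        'cross', 'cross', 'red', 'red', 'cross', 'red', 'cross', 'cross']
dt = pd.DataFrame({'seq': list(range(len(colo))), 'col': colo})

cond = dt['col'].eq('red')
# A new run starts wherever col differs from the previous row (first row vs NaN is True)
run_id = dt['col'].ne(dt['col'].shift()).cumsum()
# Number the red runs 1..k in order of appearance (rank returns floats, so cast to int)
red_runs = run_id[cond].rank(method='dense').astype(int)
# Reverse the numbering so the first red run gets the highest number
label = red_runs.max() + 1 - red_runs

dt['Expected_col'] = dt['col']
dt.loc[cond, 'Expected_col'] = 'RED' + label.astype(str)
print(dt)
```

Because the runs are found from the row order of col itself, this version works even if seq has gaps or is absent.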