How to compare two columns to find which class is greater, using a class hierarchy list
I have a list of classes, ordered from greatest to smallest:
classes = ['A','B','C','D']
And a DataFrame with two columns:
Segmentation 2019    Segmentation 2020
B                    A
B                    A
A                    B
C                    C
B                    D
How can I create a third column that holds the greater of the two classes in each row (if they are equal, keep that value)?
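You can build a dictionary from the classes list in which each key is a class and each value is its index (the index serves as a rank, because the list runs from greatest to smallest).
Then you can create two rank columns (0 to N, where 0 is the greatest). Finally, compare the ranks and take the class with the higher rank (that is, the smaller value).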
classes = ['A','B','C','D']
# Map each class to its position in the list: a smaller index means a greater class.
classes_dict = {val: index for index, val in enumerate(classes)}
df['Seg 2019 Rank'] = df['Seg 2019'].map(classes_dict)
df['Seg 2020 Rank'] = df['Seg 2020'].map(classes_dict)
# Pick the class with the smaller rank; write "equal" when both ranks match.
df['greater'] = df.apply(
    lambda x: x['Seg 2019'] if x['Seg 2019 Rank'] < x['Seg 2020 Rank']
    else x['Seg 2020'] if x['Seg 2020 Rank'] < x['Seg 2019 Rank']
    else "equal", axis=1)
Output:
Seg 2019  Seg 2020  Seg 2019 Rank  Seg 2020 Rank  greater
B         A         1              0              A
B         A         1              0              A
A         B         0              1              A
C         C         2              2              equal
B         D         1              3              B
If you add a new class (VIP), you only need to put it in the list before A, and it will be treated as the greater class.
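To illustrate the point about VIP, here is a minimal self-contained sketch of the same rank-dictionary idea. The sample data is made up to mirror the question (with one VIP value added), and the row-wise apply is swapped for numpy.where, which is my substitution rather than the answer's code. Note that this variant keeps the shared value on a tie, as the question asks, instead of writing "equal".

```python
import numpy as np
import pandas as pd

# Hierarchy from greatest to smallest; VIP is placed before A.
classes = ['VIP', 'A', 'B', 'C', 'D']
rank = {name: position for position, name in enumerate(classes)}

# Illustrative data only (assumed, modeled on the question's table).
df = pd.DataFrame({
    'Seg 2019': ['B', 'B', 'A', 'C', 'B'],
    'Seg 2020': ['VIP', 'A', 'B', 'C', 'D'],
})

rank_2019 = df['Seg 2019'].map(rank)
rank_2020 = df['Seg 2020'].map(rank)

# A smaller rank means a greater class; on a tie both columns hold the same value,
# so taking the 2019 column keeps it.
df['greater'] = np.where(rank_2019 <= rank_2020, df['Seg 2019'], df['Seg 2020'])
print(df['greater'].tolist())  # ['VIP', 'A', 'A', 'C', 'B']
```

Only the classes list changes when the hierarchy grows; the comparison itself stays the same.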
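A solution with ordered categoricals and numpy.where, which returns Equal or the minimum of the two columns (the minimum is the greater class, because the categories are ordered from greatest to smallest):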
print(df)
  Segmentation 2019 Segmentation 2020
0                 B               VIP
1                 B                 A
2                 A                 B
3                 C                 C
4                 B                 D
import numpy as np
import pandas as pd

classes = ['VIP','A','B','C','D']
df['Segmentation 2020'] = pd.Categorical(df['Segmentation 2020'],
                                         ordered=True,
                                         categories=classes)
df['Segmentation 2019'] = pd.Categorical(df['Segmentation 2019'],
                                         ordered=True,
                                         categories=classes)
# True where both columns hold the same class.
mask = df['Segmentation 2019'].eq(df['Segmentation 2020'])
# Row-wise minimum of the ordered categoricals; min(level=0) was removed in pandas 2.0.
s = df[['Segmentation 2019','Segmentation 2020']].stack().groupby(level=0).min()
df['new'] = np.where(mask, 'Equal', s)
print(df)
  Segmentation 2019 Segmentation 2020    new
0                 B               VIP    VIP
1                 B                 A      A
2                 A                 B      A
3                 C                 C  Equal
4                 B                 D      B
Or a solution with numpy.select:
classes = ['VIP','A','B','C','D']
df['Segmentation 2020'] = pd.Categorical(df['Segmentation 2020'],
                                         ordered=True,
                                         categories=classes)
df['Segmentation 2019'] = pd.Categorical(df['Segmentation 2019'],
                                         ordered=True,
                                         categories=classes)
# "Less than" means earlier in classes, i.e. the greater class.
mask1 = df['Segmentation 2019'].lt(df['Segmentation 2020'])
mask2 = df['Segmentation 2019'].gt(df['Segmentation 2020'])
# Take 2019 where it ranks higher, 2020 where it ranks higher, otherwise 'Equal'.
df['classes'] = np.select([mask1, mask2],
                          [df['Segmentation 2019'], df['Segmentation 2020']],
                          default='Equal')
print(df)
  Segmentation 2019 Segmentation 2020 classes
0                 B               VIP     VIP
1                 B                 A       A
2                 A                 B       A
3                 C                 C   Equal
4                 B                 D       B
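The snippets above assume that df already exists. For anyone who wants to run the numpy.select idea end to end, here is a rough self-contained sketch. The DataFrame is rebuilt from the printed table, and comparing the integer codes of pd.Categorical rather than the categorical columns themselves is my own simplification, so treat it as one possible variant and not as the answer's exact code.

```python
import numpy as np
import pandas as pd

classes = ['VIP', 'A', 'B', 'C', 'D']  # greatest to smallest

# Data rebuilt from the printed table above.
df = pd.DataFrame({
    'Segmentation 2019': ['B', 'B', 'A', 'C', 'B'],
    'Segmentation 2020': ['VIP', 'A', 'B', 'C', 'D'],
})

# Integer codes follow the position in `classes`, so a smaller code is a greater class.
code_2019 = pd.Categorical(df['Segmentation 2019'], categories=classes, ordered=True).codes
code_2020 = pd.Categorical(df['Segmentation 2020'], categories=classes, ordered=True).codes

df['classes'] = np.select(
    [code_2019 < code_2020, code_2019 > code_2020],
    [df['Segmentation 2019'], df['Segmentation 2020']],
    default='Equal',
)
print(df['classes'].tolist())  # ['VIP', 'A', 'A', 'Equal', 'B']
```

The original columns stay as plain strings here, which may be convenient if later code does not expect a categorical dtype.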