How to compare two columns to find which class is greater, using a list of class hierarchy

I have a list of classes, ordered from greatest to least:

classes = ['A','B','C','D']

I also have a data frame with two columns:

Segmentation 2019  Segmentation 2020
                B                  A
                B                  A
                A                  B
                C                  C
                B                  D

How can I compare the two columns and create a third column holding whichever class is greater (if they are equal, keep that value)?

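You can build a dictionary from the classes list, where each key is a class and each value is its index (the index serves as the rank, since the list is ordered from greatest to least).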

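Then you can create two rank columns (ranks run from 0 to N, and 0 is the greatest). Finally, compare the ranks and take the class with the higher rank (that is, the smaller value).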

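# df is assumed to already hold the columns 'Seg 2019' and 'Seg 2020'
classes = ['A','B','C','D']
# Rank each class by its position in the list: a lower index means a greater class
classes_dict = {val: index for index, val in enumerate(classes)}
df['Seg 2019 Rank'] = df['Seg 2019'].map(classes_dict)
df['Seg 2020 Rank'] = df['Seg 2020'].map(classes_dict)
# Pick the class with the lower rank value; ties are labelled "equal"
df['greater'] = df.apply(lambda x: x['Seg 2019'] if x['Seg 2019 Rank'] < x['Seg 2020 Rank'] else x['Seg 2020'] if x['Seg 2020 Rank'] < x['Seg 2019 Rank'] else "equal", axis=1)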

Output:

Seg 2019  Seg 2020  Seg 2019 Rank  Seg 2020 Rank  greater
       B         A              1              0        A
       B         A              1              0        A
       A         B              0              1        A
       C         C              2              2    equal
       B         D              1              3        B
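
The question asks to keep the shared value when both classes are equal, rather than writing "equal". A minimal sketch of that variant follows, assuming the same shortened column names ('Seg 2019', 'Seg 2020') and using Series.where in place of apply. The variable names rank, rank_2019 and rank_2020 are illustrative only.

```python
import pandas as pd

classes = ['A', 'B', 'C', 'D']  # ordered from greatest to least
rank = {val: index for index, val in enumerate(classes)}

# Sample data copied from the question, with shortened column names
df = pd.DataFrame({
    'Seg 2019': ['B', 'B', 'A', 'C', 'B'],
    'Seg 2020': ['A', 'A', 'B', 'C', 'D'],
})

rank_2019 = df['Seg 2019'].map(rank)
rank_2020 = df['Seg 2020'].map(rank)

# Keep the 2019 class where its rank value is lower or equal,
# otherwise fall back to the 2020 class. A tie keeps the shared class.
df['greater'] = df['Seg 2019'].where(rank_2019 <= rank_2020, df['Seg 2020'])
print(df)
```

This avoids the row-by-row apply and needs no helper rank columns in the data frame itself.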

If you add a new class (VIP), you only need to insert it into the list before A, and it will be treated as the greater class.
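
As a small illustration of that point, here is a sketch showing how the ranks shift once VIP is placed first. The helper greater is hypothetical and only mirrors the comparison logic of the dictionary approach.

```python
classes = ['VIP', 'A', 'B', 'C', 'D']  # VIP placed before A, so it outranks A
classes_dict = {val: index for index, val in enumerate(classes)}

def greater(a, b):
    """Hypothetical helper: return the class with the lower index, or 'equal' on a tie."""
    if classes_dict[a] == classes_dict[b]:
        return 'equal'
    return a if classes_dict[a] < classes_dict[b] else b

print(classes_dict)         # every existing class moves down by one rank
print(greater('B', 'VIP'))  # VIP now wins against any other class
```

Nothing else in the code has to change, because the ranks are derived from the list order.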

A solution with ordered categoricals and numpy.where, which returns Equal or the minimum of the two columns:

print (df)

  Segmentation 2019 Segmentation 2020
0                 B               VIP
1                 B                 A
2                 A                 B
3                 C                 C
4                 B                 D

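import numpy as np
import pandas as pd

classes = ['VIP','A','B','C','D']

# Ordered categoricals sort by position in classes, so the greatest class is the minimum
df['Segmentation 2020'] = pd.Categorical(df['Segmentation 2020'], 
                                         ordered=True,
                                         categories=classes)
df['Segmentation 2019'] = pd.Categorical(df['Segmentation 2019'], 
                                         ordered=True, 
                                         categories=classes)

mask = df['Segmentation 2019'].eq(df['Segmentation 2020'])
# Row-wise minimum of the two columns; Series.min(level=0) was removed in pandas 2.0,
# so group by the original row index instead
s = df[['Segmentation 2019','Segmentation 2020']].stack().groupby(level=0).min()
df['new'] = np.where(mask, 'Equal', s)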
print (df)
  Segmentation 2019 Segmentation 2020    new
0                 B               VIP    VIP
1                 B                 A      A
2                 A                 B      A
3                 C                 C  Equal
4                 B                 D      B
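
To see why the minimum picks the greater class, here is a short sketch of how an ordered categorical compares values. It assumes the same classes list; the Series s and the names smallest, codes and is_less are illustrative only.

```python
import pandas as pd

classes = ['VIP', 'A', 'B', 'C', 'D']
s = pd.Series(pd.Categorical(['B', 'VIP', 'C'], categories=classes, ordered=True))

# Comparisons follow the position in classes, not alphabetical order
smallest = s.min()             # the earliest category present in the data
codes = s.cat.codes.tolist()   # integer position of each value in classes
is_less = (s < 'B').tolist()   # True where the class comes before 'B' in the list

print(smallest, codes, is_less)
```

Because the list runs from greatest to least, "smaller" in categorical order means "greater" in the class hierarchy.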

A solution with numpy.select:

classes = ['VIP','A','B','C','D']

df['Segmentation 2020'] = pd.Categorical(df['Segmentation 2020'], 
                                         ordered=True,
                                         categories=classes)
df['Segmentation 2019'] = pd.Categorical(df['Segmentation 2019'], 
                                         ordered=True, 
                                         categories=classes)

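# lt/gt follow the category order, so "less than" means a greater class
mask1 = df['Segmentation 2019'].lt(df['Segmentation 2020'])
mask2 = df['Segmentation 2019'].gt(df['Segmentation 2020'])

# Take 2019 where it holds the greater class, 2020 where that one does, otherwise 'Equal'
df['classes'] = np.select([mask1, mask2], 
                          [df['Segmentation 2019'], df['Segmentation 2020']], 
                          default='Equal')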
print (df)
  Segmentation 2019 Segmentation 2020 classes
0                 B               VIP     VIP
1                 B                 A       A
2                 A                 B       A
3                 C                 C   Equal
4                 B                 D       B