Update categories in two Series / Columns for comparison

If I try to compare two Series that have different categories, I get an error:

import pandas as pd

a = pd.Categorical([1, 2, 3])
b = pd.Categorical([4, 5, 3])
# Build the frame from a dict so each Categorical becomes one column
df = pd.DataFrame({'a': a, 'b': b})

   a  b
0  1  4
1  2  5
2  3  3

df.a == df.b

# TypeError: Categoricals can only be compared if 'categories' are the same.

What is the best way to update the categories in both Series so they can be compared? Thanks!

My solution:

df['b'] = df.b.cat.add_categories(df.a.cat.categories.difference(df.b.cat.categories))
df['a'] = df.a.cat.add_categories(df.b.cat.categories.difference(df.a.cat.categories))
df.a == df.b

Output:

0    False
1    False
2     True
dtype: bool
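
The two add_categories calls above could also be expressed as a single shared dtype. This is only a sketch, assuming both columns are unordered categoricals and that a sorted union of the categories is acceptable; the name `shared` is made up for the illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'a': pd.Categorical([1, 2, 3]),
    'b': pd.Categorical([4, 5, 3]),
})

# Build one dtype whose categories are the union of both columns' categories.
# `shared` is a hypothetical name used only in this sketch.
shared = pd.CategoricalDtype(
    categories=df['a'].cat.categories.union(df['b'].cat.categories)
)

# Casting both columns to the same dtype gives them identical categories.
df['a'] = df['a'].astype(shared)
df['b'] = df['b'].astype(shared)

result = df['a'] == df['b']
print(result)
```

Because every existing value is already in the union, the cast should not turn any value into NaN.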

One idea is to use union_categoricals:

from pandas.api.types import union_categoricals

union = union_categoricals([df.a, df.b]).categories

df['a'] = df.a.cat.set_categories(union)
df['b'] = df.b.cat.set_categories(union)
print(df.a == df.b)

0    False
1    False
2     True
dtype: bool
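
If this has to be done for more than two columns, the same steps can be wrapped in a small helper. The function name `align_categories` and its signature are assumptions made for this sketch, not part of pandas; it relies only on union_categoricals and cat.set_categories as used above.

```python
import pandas as pd
from pandas.api.types import union_categoricals


def align_categories(df, columns):
    """Return a copy of df in which the listed categorical columns share one category set.

    Hypothetical helper for illustration; not a pandas API.
    """
    # Union of the categories across all listed columns.
    shared = union_categoricals([df[col] for col in columns]).categories
    out = df.copy()
    for col in columns:
        # Give every column the same categories so comparisons are allowed.
        out[col] = out[col].cat.set_categories(shared)
    return out


df = pd.DataFrame({
    'a': pd.Categorical([1, 2, 3]),
    'b': pd.Categorical([4, 5, 3]),
})
aligned = align_categories(df, ['a', 'b'])
result = aligned['a'] == aligned['b']
print(result)
```

Returning a copy leaves the original frame untouched, which may be useful when the original categories are still needed elsewhere.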