pandas MultiIndex from product - ignore same-row comparison
I have a pandas dataframe like the one shown below:
Company,year
T123 Inc Ltd,1990
T124 PVT ltd,1991
ABC Limited,1992
ABCDE Ltd,1994
import pandas as pd
tf = pd.read_clipboard(sep=',')  # reads the comma-separated sample above from the clipboard
tf['Company_copy'] = tf['Company']
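I would like to compare each value of tf['Company'] with each value of tf['Company_copy'], but exclude pairs that come from the same row number or index (that is, the same string).
For example, I want T123 Inc Ltd to be compared only with the remaining 3 items. Similarly, I want ABCDE Ltd to be compared only with the remaining 3 items.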
So, with the help of another post, I tried the following:
compare = pd.MultiIndex.from_product([tf['Company'].astype(str),tf['Company_copy'].astype(str)]).to_series()
But it also produces some incorrect comparisons, where a row is paired with itself. I want to avoid these duplicate comparisons.
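To illustrate the problem, here is a minimal sketch that rebuilds the sample data in memory (an assumption made so it runs without the clipboard) and counts how many of the generated pairs compare a company with itself. The names `csv_text`, `pairs`, and `self_pairs` are illustrative only.

```python
import io
import pandas as pd

# Rebuild the sample frame from the question without using the clipboard.
csv_text = """Company,year
T123 Inc Ltd,1990
T124 PVT ltd,1991
ABC Limited,1992
ABCDE Ltd,1994
"""
tf = pd.read_csv(io.StringIO(csv_text))
tf['Company_copy'] = tf['Company']

# Full Cartesian product: every Company paired with every Company_copy.
pairs = pd.MultiIndex.from_product(
    [tf['Company'].astype(str), tf['Company_copy'].astype(str)]
)

# Pairs whose two levels hold the same string are the unwanted self-comparisons.
self_pairs = [(a, b) for a, b in pairs if a == b]

print(len(pairs))       # 16 pairs for 4 companies
print(len(self_pairs))  # 4 of them pair a company with itself
```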
I expect my output to look like the one below. You can see that it has no duplicate/same-row comparisons:
Company       Company_copy
T123 Inc Ltd  T124 PVT ltd  (T123 Inc Ltd, T124 PVT ltd)
              ABC Limited   (T123 Inc Ltd, ABC Limited)
              ABCDE Ltd     (T123 Inc Ltd, ABCDE Ltd)
T124 PVT ltd  T123 Inc Ltd  (T124 PVT ltd, T123 Inc Ltd)
              ABC Limited   (T124 PVT ltd, ABC Limited)
              ABCDE Ltd     (T124 PVT ltd, ABCDE Ltd)
ABC Limited   T123 Inc Ltd  (ABC Limited, T123 Inc Ltd)
              T124 PVT ltd  (ABC Limited, T124 PVT ltd)
              ABCDE Ltd     (ABC Limited, ABCDE Ltd)
ABCDE Ltd     T123 Inc Ltd  (ABCDE Ltd, T123 Inc Ltd)
              T124 PVT ltd  (ABCDE Ltd, T124 PVT ltd)
              ABC Limited   (ABCDE Ltd, ABC Limited)
You can keep only the entries where the two levels of the MultiIndex are not equal, by comparing the first level with the second level:
compare = pd.MultiIndex.from_product([tf['Company'].astype(str),tf['Company_copy'].astype(str)]).to_series()
compare = compare[compare.index.get_level_values(0) != compare.index.get_level_values(1)]
print(compare)
Company       Company_copy
T123 Inc Ltd  T124 PVT ltd  (T123 Inc Ltd, T124 PVT ltd)
              ABC Limited   (T123 Inc Ltd, ABC Limited)
              ABCDE Ltd     (T123 Inc Ltd, ABCDE Ltd)
T124 PVT ltd  T123 Inc Ltd  (T124 PVT ltd, T123 Inc Ltd)
              ABC Limited   (T124 PVT ltd, ABC Limited)
              ABCDE Ltd     (T124 PVT ltd, ABCDE Ltd)
ABC Limited   T123 Inc Ltd  (ABC Limited, T123 Inc Ltd)
              T124 PVT ltd  (ABC Limited, T124 PVT ltd)
              ABCDE Ltd     (ABC Limited, ABCDE Ltd)
ABCDE Ltd     T123 Inc Ltd  (ABCDE Ltd, T123 Inc Ltd)
              T124 PVT ltd  (ABCDE Ltd, T124 PVT ltd)
              ABC Limited   (ABCDE Ltd, ABC Limited)
dtype: object
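Note that this filter compares the level values, so two different rows that happen to hold the same company name would also be dropped. If only same-row pairs should be excluded, as the question words it, one possible sketch is to filter on row positions instead. The in-memory CSV stands in for the clipboard data, and the names `left_pos`, `right_pos`, `keep`, and `pairs` are illustrative assumptions, not part of the original code.

```python
import io
import numpy as np
import pandas as pd

# Rebuild the sample frame from the question without using the clipboard.
csv_text = """Company,year
T123 Inc Ltd,1990
T124 PVT ltd,1991
ABC Limited,1992
ABCDE Ltd,1994
"""
tf = pd.read_csv(io.StringIO(csv_text))

names = tf['Company'].astype(str).to_numpy()
n = len(names)

# Row positions for every (left, right) combination,
# in the same order that MultiIndex.from_product would generate them.
left_pos = np.repeat(np.arange(n), n)
right_pos = np.tile(np.arange(n), n)

# Keep only the pairs that come from two different rows.
keep = left_pos != right_pos

pairs = pd.DataFrame({
    'Company': names[left_pos[keep]],
    'Company_copy': names[right_pos[keep]],
})
print(pairs)
```

For the sample data, where every name is unique, this yields the same 12 pairs as the value-based filter; the two approaches differ only when a name is repeated in different rows.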