Comparing two dataframes and showing only the unmatched column records

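I have a list of dictionaries converted to a Pandas DataFrame. I can print the mismatched records, but I don't want the whole record, only the columns within the record that don't match.

List 1 is: [{'A':5,'B':6,'C': 7}] and list 2 is: [{'A':5,'B':8,'C': 7}]. I only want B, the mismatched key, in the output. Assume each list will contain multiple dictionaries. I have two dataframes that I am comparing to find the mismatched records.

I need to understand how to do this.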

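Possible solutions tried:

Finding the common records and dropping them from the dataframe, but that gives me the entire row.

However, I only need the columns that have mismatched values.

Note that there are about 50 columns.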

For df1

         Date   Fruit   Num   Color
0  2013-11-24  Banana  22.1  Yellow
1  2013-11-24  Orange   8.6  Orange
2  2013-11-24   Apple   7.6   Green
3  2013-11-24  Celery  10.2   Green

For df2

         Date   Fruit   Num   Color
0  2013-11-24  Banana  22.1  Orange
1  2013-11-24  Orange   8.6  Orange
2  2013-11-24   Apple   7.6   Green
3  2013-11-24  Celery  10.2   Green

For df_diff

Color
1 Orange

So, given the following dictionaries:

dict_1 = {
    "Date": {0: "2013-11-24", 1: "2013-11-24", 2: "2013-11-24", 3: "2013-11-24"},
    "Fruit": {0: "Banana", 1: "Orange", 2: "Apple", 3: "Celery"},
    "Num": {0: 22.1, 1: 8.6, 2: 7.6, 3: 10.2},
    "Color": {0: "Yellow", 1: "Orange", 2: "Green", 3: "Green"},
}

dict_2 = {
    "Date": {0: "2013-11-03", 1: "2013-11-24", 2: "2013-11-24", 3: "2013-11-24"},
    "Fruit": {0: "Banana", 1: "Orange", 2: "Citrus", 3: "Celery"},
    "Num": {0: 22.1, 1: 2.2, 2: 7.6, 3: 0.2},
    "Color": {0: "Orange", 1: "Orange", 2: "Green", 3: "Green"},
}

You can find the differences like this:

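diff_dict = {}
for outer_key, inner_dict in dict_1.items():  # outer_key is the column name
    diff_dict[outer_key] = {}
    for inner_key, inner_value in inner_dict.items():  # inner_key is the row index
        if (other_value := dict_2[outer_key][inner_key]) != inner_value:
            # Values differ: keep the value from dict_2
            diff_dict[outer_key][inner_key] = other_value
        else:
            # Values match: mark the cell with a placeholder
            diff_dict[outer_key][inner_key] = "-"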

Then visualize them with Pandas:

import pandas as pd

print(pd.DataFrame(diff_dict))
# Output

         Date   Fruit  Num   Color
0  2013-11-03       -    -  Orange
1           -       -  2.2       -
2           -  Citrus    -       -
3           -       -  0.2       -
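
The same table can probably be built without the explicit loop. The following is a minimal sketch, assuming both dictionaries share the same keys and row labels; the names df_a, df_b, mask and diff are illustrative and not part of the answer above. It relies on DataFrame.where, which keeps a value where the condition is True and substitutes the replacement elsewhere.

```python
import pandas as pd

dict_1 = {
    "Date": {0: "2013-11-24", 1: "2013-11-24", 2: "2013-11-24", 3: "2013-11-24"},
    "Fruit": {0: "Banana", 1: "Orange", 2: "Apple", 3: "Celery"},
    "Num": {0: 22.1, 1: 8.6, 2: 7.6, 3: 10.2},
    "Color": {0: "Yellow", 1: "Orange", 2: "Green", 3: "Green"},
}
dict_2 = {
    "Date": {0: "2013-11-03", 1: "2013-11-24", 2: "2013-11-24", 3: "2013-11-24"},
    "Fruit": {0: "Banana", 1: "Orange", 2: "Citrus", 3: "Celery"},
    "Num": {0: 22.1, 1: 2.2, 2: 7.6, 3: 0.2},
    "Color": {0: "Orange", 1: "Orange", 2: "Green", 3: "Green"},
}

df_a = pd.DataFrame(dict_1)
df_b = pd.DataFrame(dict_2)

# True wherever the two frames hold different values
mask = df_a != df_b

# Keep df_b's value where the frames differ and "-" elsewhere.
# Casting to object first lets numeric columns hold the "-" placeholder.
diff = df_b.astype(object).where(mask, "-")
print(diff)
```

Note that the element-wise comparison only works when both frames carry identical row and column labels.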

import pandas as pd

df1 = pd.DataFrame({"Fruit":['Banana','Orange','Apple','Celery'],
                  'Num':[22.1,8.6,7.6,10.2], 'Color':['Yellow','Orange','Green','Green']})

df2 = pd.DataFrame({"Fruit":['Banana','Orange','Mango','Celery'],
                  'Num':[22.1,8.6,7.6,15], 'Color':['Orange','Orange','Green','Green']})

You can use boolean indexing to get the mismatches

df_diff = df1[df1!=df2].fillna('')
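
To make the boolean indexing step more concrete: df1 != df2 yields a frame of True/False values, and indexing with it keeps only the cells marked True. The sketch below goes one step further and also drops the rows and columns that contain no mismatch at all, which may be closer to what the question asks for. It uses the sample frames from the question and assumes both frames have identical labels; mask, rows, cols and trimmed are illustrative names.

```python
import pandas as pd

# Sample frames from the question: only the Color of row 0 differs
df1 = pd.DataFrame({
    "Date": ["2013-11-24"] * 4,
    "Fruit": ["Banana", "Orange", "Apple", "Celery"],
    "Num": [22.1, 8.6, 7.6, 10.2],
    "Color": ["Yellow", "Orange", "Green", "Green"],
})
df2 = df1.assign(Color=["Orange", "Orange", "Green", "Green"])

mask = df1 != df2          # True where a cell differs
rows = mask.any(axis=1)    # rows with at least one mismatch
cols = mask.any(axis=0)    # columns with at least one mismatch

# Restrict df2 to the mismatched rows and columns, then blank out
# (as NaN) any remaining cells that actually match.
trimmed = df2.loc[rows, cols].where(mask.loc[rows, cols])
print(trimmed)
```

One caveat: NaN != NaN evaluates to True, so cells that are missing in both frames would be reported as mismatches by this approach.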

If you want both the mismatched columns and their mismatched values

# Collect every mismatched value per column; skip columns with no mismatch
{col: vals for col in df_diff.columns if (vals := [i for i in df_diff[col] if len(str(i)) > 0])}

If you only want the mismatched columns

# List each mismatched column once, however many of its rows differ
[col for col in df_diff.columns if any(len(str(i)) > 0 for i in df_diff[col])]
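
As an alternative, pandas 1.1 and later provide DataFrame.compare, which returns only the rows and columns that differ and shows both sides under the sub-columns "self" and "other". The following is a sketch under the assumption that both frames have identical row and column labels; changes and mismatched_columns are illustrative names.

```python
import pandas as pd

df1 = pd.DataFrame({
    "Fruit": ["Banana", "Orange", "Apple", "Celery"],
    "Num": [22.1, 8.6, 7.6, 10.2],
    "Color": ["Yellow", "Orange", "Green", "Green"],
})
df2 = pd.DataFrame({
    "Fruit": ["Banana", "Orange", "Mango", "Celery"],
    "Num": [22.1, 8.6, 7.6, 15],
    "Color": ["Orange", "Orange", "Green", "Green"],
})

# Only differing rows and columns are kept; matching cells become NaN.
# Columns form a MultiIndex of (column name, "self" or "other").
changes = df1.compare(df2)
print(changes)

# The first level of the column index names the mismatched columns
mismatched_columns = list(changes.columns.get_level_values(0).unique())
print(mismatched_columns)
```

With around 50 columns this keeps the output limited to the columns that actually changed, without any manual filtering.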