Comparing two dataframes and only showing the unmatched column records
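I have a list of dictionaries that I converted to a Pandas DataFrame.
I am able to print the unmatched records, but I don't want the whole record, only the columns in the record that don't match.
List 1 is: [{'A':5,'B':6,'C': 7}]
List 2 is: [{'A':5,'B':8,'C': 7}]
I only want the output for B, which does not match. Assume each list of dictionaries will contain multiple dictionaries. I have two dataframes that I am comparing to find the unmatched records.
I need to understand how to do this.
Possible solution I tried:
Finding the common records and dropping them from the dataframe, but that gives me the whole row.
However, I only need the columns with mismatched values.
Note that there are about 50 columns.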
For df1
         Date   Fruit   Num   Color
0  2013-11-24  Banana  22.1  Yellow
1  2013-11-24  Orange   8.6  Orange
2  2013-11-24   Apple   7.6   Green
3  2013-11-24  Celery  10.2   Green
For df2
         Date   Fruit   Num   Color
0  2013-11-24  Banana  22.1  Orange
1  2013-11-24  Orange   8.6  Orange
2  2013-11-24   Apple   7.6   Green
3  2013-11-24  Celery  10.2   Green
For df_diff
    Color
1  Orange
So, given the following dictionaries:
dict_1 = {
    "Date": {0: "2013-11-24", 1: "2013-11-24", 2: "2013-11-24", 3: "2013-11-24"},
    "Fruit": {0: "Banana", 1: "Orange", 2: "Apple", 3: "Celery"},
    "Num": {0: 22.1, 1: 8.6, 2: 7.6, 3: 10.2},
    "Color": {0: "Yellow", 1: "Orange", 2: "Green", 3: "Green"},
}
dict_2 = {
    "Date": {0: "2013-11-03", 1: "2013-11-24", 2: "2013-11-24", 3: "2013-11-24"},
    "Fruit": {0: "Banana", 1: "Orange", 2: "Citrus", 3: "Celery"},
    "Num": {0: 22.1, 1: 2.2, 2: 7.6, 3: 0.2},
    "Color": {0: "Orange", 1: "Orange", 2: "Green", 3: "Green"},
}
You can find the differences like this (the := operator requires Python 3.8 or later):
diff_dict = {}
for outer_key, inner_dict in dict_1.items():
    diff_dict[outer_key] = {}
    for inner_key, inner_value in inner_dict.items():
        # Keep dict_2's value where it differs from dict_1, otherwise a "-" placeholder
        if (other_value := dict_2[outer_key][inner_key]) != inner_value:
            diff_dict[outer_key][inner_key] = other_value
        else:
            diff_dict[outer_key][inner_key] = "-"
Then visualize them with Pandas:
import pandas as pd

print(pd.DataFrame(diff_dict))
# Output
#          Date   Fruit  Num   Color
# 0  2013-11-03       -    -  Orange
# 1           -       -  2.2       -
# 2           -  Citrus    -       -
# 3           -       -  0.2       -
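If the data is already in DataFrames, the same "-" table can probably be built without the explicit loops. The following is only a sketch, assuming both frames have identical row and column labels; the names `mask`, `diff` and `diff_only` are made up for illustration and are not part of the answer above.

```python
import pandas as pd

dict_1 = {
    "Date": {0: "2013-11-24", 1: "2013-11-24", 2: "2013-11-24", 3: "2013-11-24"},
    "Fruit": {0: "Banana", 1: "Orange", 2: "Apple", 3: "Celery"},
    "Num": {0: 22.1, 1: 8.6, 2: 7.6, 3: 10.2},
    "Color": {0: "Yellow", 1: "Orange", 2: "Green", 3: "Green"},
}
dict_2 = {
    "Date": {0: "2013-11-03", 1: "2013-11-24", 2: "2013-11-24", 3: "2013-11-24"},
    "Fruit": {0: "Banana", 1: "Orange", 2: "Citrus", 3: "Celery"},
    "Num": {0: 22.1, 1: 2.2, 2: 7.6, 3: 0.2},
    "Color": {0: "Orange", 1: "Orange", 2: "Green", 3: "Green"},
}

df1 = pd.DataFrame(dict_1)
df2 = pd.DataFrame(dict_2)

# True wherever the two frames disagree (identical labels are assumed).
mask = df1.ne(df2)

# Cast to object first so the "-" placeholder can sit next to numbers
# without changing the dtype of the numeric column on the fly.
diff = df2.astype(object).where(mask, "-")

# Optionally keep only the rows and columns that contain a mismatch.
diff_only = diff.loc[mask.any(axis=1), mask.any(axis=0)]
print(diff_only)
```

With roughly 50 columns, the last step is the part that helps most, since columns with no differences at all are dropped from the view.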
import pandas as pd

df1 = pd.DataFrame({"Fruit": ['Banana', 'Orange', 'Apple', 'Celery'],
                    'Num': [22.1, 8.6, 7.6, 10.2],
                    'Color': ['Yellow', 'Orange', 'Green', 'Green']})
df2 = pd.DataFrame({"Fruit": ['Banana', 'Orange', 'Mango', 'Celery'],
                    'Num': [22.1, 8.6, 7.6, 15],
                    'Color': ['Orage', 'Orange', 'Green', 'Green']})
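You can use boolean indexing to get the mismatches:
# Keep df1's values where the frames differ; matching cells become NaN and then ''
df_diff = df1[df1 != df2].fillna('')
If you want both the mismatched columns and the mismatched values:
# Note: a dict holds one value per key, so only the last mismatch in each column survives
{col: i for col in df_diff.columns for i in df_diff[col] if len(str(i)) > 0}
If you only want the mismatched columns:
# Note: a column is listed once per mismatched cell, so duplicates are possible
[col for col in df_diff.columns for i in df_diff[col] if len(str(i)) > 0]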
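When a column can contain more than one mismatch, it may be easier to work from the boolean mask directly. The sketch below is an illustration, not the method from the answer above: it assumes pandas 1.1 or later (for `DataFrame.compare`) and identically labeled frames, and the names `side_by_side`, `mismatched_columns` and `per_row` are hypothetical.

```python
import pandas as pd

df1 = pd.DataFrame({"Fruit": ["Banana", "Orange", "Apple", "Celery"],
                    "Num": [22.1, 8.6, 7.6, 10.2],
                    "Color": ["Yellow", "Orange", "Green", "Green"]})
df2 = pd.DataFrame({"Fruit": ["Banana", "Orange", "Mango", "Celery"],
                    "Num": [22.1, 8.6, 7.6, 15],
                    "Color": ["Orage", "Orange", "Green", "Green"]})

# DataFrame.compare keeps only the rows and columns that differ and shows
# df1's value under "self" and df2's value under "other".
side_by_side = df1.compare(df2)
print(side_by_side)

# Boolean mask of differing cells.
mask = df1.ne(df2)

# Each mismatched column listed once, in the original column order.
mismatched_columns = mask.columns[mask.any().to_numpy()].tolist()

# For every row with at least one mismatch: column -> (df1 value, df2 value).
per_row = {
    idx: {
        col: (df1.at[idx, col], df2.at[idx, col])
        for col in mask.columns[mask.loc[idx].to_numpy()]
    }
    for idx in mask.index[mask.any(axis=1).to_numpy()]
}
print(per_row)
```

The `per_row` dictionary is closest to what the question asks for: one entry per record, containing only the keys whose values differ.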