pandas: comparing lists of non-identical pandas DataFrames based on the values in a certain column

I have two lists of pandas DataFrames, as follows:

import pandas as pd
import numpy as np
list_one = [pd.DataFrame({'sent_a.1': [0, 3, 2, 1], 'sent_a.2': [0, 1, 4, 0], 'sent_b.3': [0, 6, 0, 8],'sent_b.4': [1, 1, 8, 6],'ID':['id_1','id_1','id_1','id_1']}),
        pd.DataFrame({'sent_a.1': [0, 3], 'sent_a.2': [0, 2], 'sent_b.3': [0, 6],'sent_b.4': [1, 1],'ID':['id_2','id_2']})]

list_two = [pd.DataFrame({'sent_a.1': [0, 5], 'sent_a.2': [0, 1], 'sent_b.3': [0, 6],'sent_b.4': [1, 1],'ID':['id_2','id_2']}),
            pd.DataFrame({'sent_a.1': [0, 5, 3, 1], 'sent_a.2': [0, 2, 3, 1], 'sent_b.3': [0, 6, 6, 8],'sent_b.4': [1, 5, 8, 5],'ID':['id_1','id_1','id_1','id_1']})]

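I want to compare the DataFrames in these two lists. Where the values are the same I want to replace them with 'True', and where they differ I want to set them to 'False', saving the result in a separate list of pandas DataFrames. I tried the following: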

for dfs in list_one:
    for dfs2 in list_two:
        g = np.where(dfs == dfs2, 'True', 'False')
        print(g)

But I get the error:

ValueError: Can only compare identically-labeled DataFrame objects

How can I sort the values in these two lists, based on the values from column 'ID'?

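Edit: I would like the dataframes that have the same value for column 'ID' to be compared. That means the DataFrames with 'ID' == 'id_1' are compared with each other, and the DataFrames with 'ID' == 'id_2' are compared with each other (no cross-comparison).

So the desired output is: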

output = [   sent_a.1  sent_a.2  sent_b.3  sent_b.4    ID
        0    True      True      True      True      id_1
        1    False     False     True      False     id_1
        2    False     False     False     True      id_1
        3    False     False     True      True      id_1,
             sent_a.1  sent_a.2  sent_b.3  sent_b.4    ID
        0    True      True      True      True      id_2
        1    True      True      False     False     id_2]

Based on your current example:

First question:

how can I sort values in these two lists, based on the values from column 'ID'?

# Sort both lists by the numeric suffix of the first 'ID' value ('id_2' -> 2)
list_one = sorted(list_one, key=lambda x: int(x['ID'].unique()[0][3:]))
list_two = sorted(list_two, key=lambda x: int(x['ID'].unique()[0][3:]))
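
Once both lists are sorted by the same key, frames at the same position share an ID, so they can be paired with zip. The following is only a minimal sketch: it assumes each DataFrame holds a single ID and that both lists contain the same set of IDs. The helper name `id_key` and the small `val` frames are made up for illustration.

```python
import pandas as pd

def id_key(df):
    # Numeric suffix of the first 'ID' value, e.g. 'id_2' -> 2 (hypothetical helper)
    return int(df['ID'].iloc[0].split('_')[1])

# Made-up toy data, deliberately listed in a different ID order
list_one = [pd.DataFrame({'val': [1, 2], 'ID': ['id_2', 'id_2']}),
            pd.DataFrame({'val': [3], 'ID': ['id_1']})]
list_two = [pd.DataFrame({'val': [3], 'ID': ['id_1']}),
            pd.DataFrame({'val': [1, 5], 'ID': ['id_2', 'id_2']})]

sorted_one = sorted(list_one, key=id_key)
sorted_two = sorted(list_two, key=id_key)

# After sorting, frames at the same position carry the same ID
pairs = list(zip(sorted_one, sorted_two))
paired_ids = [(a['ID'].iloc[0], b['ID'].iloc[0]) for a, b in pairs]
print(paired_ids)
```

If an ID is missing from one of the lists, the positions no longer line up, so pairing by position is only safe under the assumption stated above.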

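ValueError: Can only compare identically-labeled DataFrame objects

  • This error is raised when the index values of the two DataFrames are in a different order or when the DataFrames have different shapes. More generally, `==` requires the row and column labels of both DataFrames to match exactly.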
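
To illustrate the two causes named above, here is a small sketch with made-up frames `a`, `b`, and `c`; the helper `raises_value_error` is hypothetical and exists only for this demonstration.

```python
import pandas as pd

a = pd.DataFrame({'x': [1, 2]}, index=[0, 1])
b = pd.DataFrame({'x': [1, 2, 3]}, index=[0, 1, 2])  # different shape
c = pd.DataFrame({'x': [2, 1]}, index=[1, 0])        # same labels, different order

def raises_value_error(left, right):
    # True if `left == right` raises a ValueError
    try:
        left == right
    except ValueError:
        return True
    return False

shape_mismatch = raises_value_error(a, b)
order_mismatch = raises_value_error(a, c)

# Sorting the index makes the labels identical, so the comparison succeeds
aligned = (a == c.sort_index())
print(shape_mismatch, order_mismatch)
print(aligned)
```

In the question's loop, every frame in list_one is compared with every frame in list_two, so the 4-row and 2-row frames meet and trigger this error.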

First way to compare:

for dfs in list_one:
    for dfs2 in list_two:
        if dfs.shape == dfs2.shape:
            g = np.where(dfs == dfs2, 'True', 'False')
            print (g)

Second way:

I would like the dataframes that have the same value for column 'ID' to be compared

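for dfs in list_one:
    for dfs2 in list_two:
        # Compare only frames with the same ID and the same shape
        # (assumes each frame contains exactly one distinct ID)
        if (dfs['ID'].unique() == dfs2['ID'].unique()) and (dfs.shape == dfs2.shape):
            g = np.where(dfs == dfs2, 'True', 'False')
            print(g)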
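
The loops above print NumPy arrays in which the 'ID' column is also replaced by 'True'. To collect the results in a list of DataFrames that keep the 'ID' column, as in the layout the question asks for, one possible sketch is shown below. It assumes each DataFrame holds a single ID; the names `by_id`, `value_cols`, and `result` are illustrative and not part of the original answer. It stores booleans rather than the strings 'True' and 'False'.

```python
import pandas as pd

list_one = [pd.DataFrame({'sent_a.1': [0, 3, 2, 1], 'sent_a.2': [0, 1, 4, 0],
                          'sent_b.3': [0, 6, 0, 8], 'sent_b.4': [1, 1, 8, 6],
                          'ID': ['id_1'] * 4}),
            pd.DataFrame({'sent_a.1': [0, 3], 'sent_a.2': [0, 2],
                          'sent_b.3': [0, 6], 'sent_b.4': [1, 1],
                          'ID': ['id_2'] * 2})]
list_two = [pd.DataFrame({'sent_a.1': [0, 5], 'sent_a.2': [0, 1],
                          'sent_b.3': [0, 6], 'sent_b.4': [1, 1],
                          'ID': ['id_2'] * 2}),
            pd.DataFrame({'sent_a.1': [0, 5, 3, 1], 'sent_a.2': [0, 2, 3, 1],
                          'sent_b.3': [0, 6, 6, 8], 'sent_b.4': [1, 5, 8, 5],
                          'ID': ['id_1'] * 4})]

# Index the second list by its (single) ID so each frame can be looked up directly
by_id = {df['ID'].iloc[0]: df for df in list_two}

output = []
for df in list_one:
    other = by_id.get(df['ID'].iloc[0])
    # Skip IDs with no counterpart and frames whose labels differ
    if other is None or not (df.index.equals(other.index)
                             and df.columns.equals(other.columns)):
        continue
    value_cols = df.columns.drop('ID')
    result = df[value_cols].eq(other[value_cols])  # element-wise booleans
    result['ID'] = df['ID']                        # keep the original ID column
    output.append(result)

for result in output:
    print(result)
```

If string values are required, the value columns can be converted afterwards, for example with `result[value_cols].astype(str)`.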