pandas: comparing non-identical lists of pandas DataFrames based on values from a certain column
I have two lists of pandas DataFrames, as follows:
import pandas as pd
import numpy as np
list_one = [pd.DataFrame({'sent_a.1': [0, 3, 2, 1], 'sent_a.2': [0, 1, 4, 0], 'sent_b.3': [0, 6, 0, 8], 'sent_b.4': [1, 1, 8, 6], 'ID': ['id_1', 'id_1', 'id_1', 'id_1']}),
            pd.DataFrame({'sent_a.1': [0, 3], 'sent_a.2': [0, 2], 'sent_b.3': [0, 6], 'sent_b.4': [1, 1], 'ID': ['id_2', 'id_2']})]
list_two = [pd.DataFrame({'sent_a.1': [0, 5], 'sent_a.2': [0, 1], 'sent_b.3': [0, 6], 'sent_b.4': [1, 1], 'ID': ['id_2', 'id_2']}),
            pd.DataFrame({'sent_a.1': [0, 5, 3, 1], 'sent_a.2': [0, 2, 3, 1], 'sent_b.3': [0, 6, 6, 8], 'sent_b.4': [1, 5, 8, 5], 'ID': ['id_1', 'id_1', 'id_1', 'id_1']})]
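I want to compare the DataFrames in these two lists: where the values are equal I want to replace them with 'True', where they differ I want to set them to 'False', and I want to save the results in a separate list of pandas DataFrames. I tried the following: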
for dfs in list_one:
    for dfs2 in list_two:
        g = np.where(dfs == dfs2, 'True', 'False')
        print(g)
But I get this error:
ValueError: Can only compare identically-labeled DataFrame objects
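How can I sort values in these two lists, based on the values from column 'ID'?
Edit
I would like the dataframes that have the same value for column 'ID' to be compared. That is, the DataFrames with 'ID' == 'id_1' are compared with each other, and the DataFrames with 'ID' == 'id_2' are compared with each other (no cross comparison).
So the desired output is: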
output = [ sent_a.1 sent_a.2 sent_b.3 sent_b.4 ID
0 True True True True id_1
1 False False True False id_1
2 False False False True id_1
3 False False True True id_1,
sent_a.1 sent_a.2 sent_b.3 sent_b.4 ID
0 True True True True id_2
1 True True False False id_2]
Based on your current example
First question:
how can I sort values in these two lists, based on the values from column 'ID'?
list_one = sorted(list_one,key=lambda x: x['ID'].unique()[0][3:], reverse=False)
list_two =sorted(list_two,key=lambda x: x['ID'].unique()[0][3:], reverse=False)
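Note that the key above slices off the text after 'id_' and compares it as a string, so 'id_10' would sort before 'id_2'. If the IDs can reach two digits, a numeric key may be safer. A minimal sketch, assuming every DataFrame holds a single ID of the form 'id_<number>' (the helper name id_number is hypothetical):

```python
import pandas as pd

def id_number(df, id_col='ID'):
    # Hypothetical helper: take the first ID in the frame and
    # return the integer after the underscore, e.g. 'id_10' -> 10.
    return int(df[id_col].iloc[0].split('_')[1])

frames = [pd.DataFrame({'ID': ['id_10']}),
          pd.DataFrame({'ID': ['id_2']}),
          pd.DataFrame({'ID': ['id_1']})]

# Sort numerically instead of lexicographically.
ordered = sorted(frames, key=id_number)
order = [df['ID'].iloc[0] for df in ordered]
print(order)
```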
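ValueError: Can only compare identically-labeled DataFrame objects
- This error is raised when the index labels (or their order) differ between the two DataFrames, or when the DataFrames have different shapes.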
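The cause can be reproduced on a small scale. The sketch below uses made-up frames a, b and c (not the data from the question) to show that == raises when the row labels differ and works when both frames carry the same labels:

```python
import pandas as pd

a = pd.DataFrame({'x': [0, 3, 2, 1]})   # index 0..3
b = pd.DataFrame({'x': [0, 5]})         # index 0..1, different labels
c = pd.DataFrame({'x': [0, 5, 2, 7]})   # index 0..3, same labels as a

try:
    a == b
    raised = False
    message = ''
except ValueError as exc:
    # pandas refuses to compare frames whose labels do not match
    raised = True
    message = str(exc)

# Identical labels: the element-wise comparison succeeds
same = a == c
print(raised, message)
print(same)
```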
First way to compare:
for dfs in list_one:
    for dfs2 in list_two:
        if dfs.shape == dfs2.shape:
            g = np.where(dfs == dfs2, 'True', 'False')
            print(g)
Second way:
I would like the dataframes that have the same value for column 'ID' to be compared
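for dfs in list_one:
    for dfs2 in list_two:
        # compare only frames that share the same ID and the same shape;
        # taking the first ID as a scalar avoids using an array in a boolean test
        if (dfs['ID'].iloc[0] == dfs2['ID'].iloc[0]) and (dfs.shape == dfs2.shape):
            g = np.where(dfs == dfs2, 'True', 'False')
            print(g)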
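The loops above print NumPy arrays, and the 'ID' column itself is turned into 'True'. To get a list of DataFrames that keeps the 'ID' column, something like the following may work. It is only a sketch: the function name compare_by_id is hypothetical, and it assumes each DataFrame holds exactly one ID value and that paired frames have the same columns. It returns booleans rather than the strings 'True' and 'False'.

```python
import pandas as pd

def compare_by_id(frames_a, frames_b, id_col='ID'):
    # Hypothetical helper: index the second list by its (single) ID value.
    by_id = {df[id_col].iloc[0]: df for df in frames_b}
    results = []
    for df in sorted(frames_a, key=lambda d: d[id_col].iloc[0]):
        other = by_id.get(df[id_col].iloc[0])
        if other is None or other.shape != df.shape:
            continue  # no comparable partner for this ID
        # Drop the ID column and reset the index so the labels match.
        left = df.drop(columns=id_col).reset_index(drop=True)
        right = other.drop(columns=id_col).reset_index(drop=True)
        result = left == right[left.columns]
        result[id_col] = df[id_col].to_numpy()  # put the ID column back
        results.append(result)
    return results

list_one = [pd.DataFrame({'sent_a.1': [0, 3, 2, 1], 'sent_a.2': [0, 1, 4, 0],
                          'sent_b.3': [0, 6, 0, 8], 'sent_b.4': [1, 1, 8, 6],
                          'ID': ['id_1'] * 4}),
            pd.DataFrame({'sent_a.1': [0, 3], 'sent_a.2': [0, 2],
                          'sent_b.3': [0, 6], 'sent_b.4': [1, 1],
                          'ID': ['id_2'] * 2})]
list_two = [pd.DataFrame({'sent_a.1': [0, 5], 'sent_a.2': [0, 1],
                          'sent_b.3': [0, 6], 'sent_b.4': [1, 1],
                          'ID': ['id_2'] * 2}),
            pd.DataFrame({'sent_a.1': [0, 5, 3, 1], 'sent_a.2': [0, 2, 3, 1],
                          'sent_b.3': [0, 6, 6, 8], 'sent_b.4': [1, 5, 8, 5],
                          'ID': ['id_1'] * 4})]

output = compare_by_id(list_one, list_two)
for frame in output:
    print(frame)
```

If the string form is required, each result can be converted afterwards, for example with astype(str) on the comparison columns.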