Merging two different dataframes

I'm having trouble merging two different dataframes based on column names.

Code:

import os, json, xlsxwriter
import pandas as pd

left = pd.DataFrame({
    'CompID': ['Computer-8', 'Computer-D', 'Computer-4', 'Computer-Z'],
    'WindowsOsVersion': ['7', '11', 'XP', ''],
    'MacOsVersion': ['', '', '', 'Zebra'],
})
print("left df:")
print(left)
right = pd.DataFrame({
    'OsName': ['XP', '7', '11', 'Zebra'],
    'Upgrade': ['7', '8', 'none', 'Lion'],
})
print("right df:")
print(right)

new_df = pd.merge(left, right, how='inner', left_on=['WindowsOsVersion'], right_on=['OsName'])
new_df2 = pd.merge(left, right, how='inner', left_on=['MacOsVersion'], right_on=['OsName'])
print("WindowsOsVersion df:")
print(new_df)
print("MacOsVersion df:")
print(new_df2)
tester = pd.merge(new_df, new_df2, on="CompID")
print("Merge: ")
print(tester)
# print("new df: ", left.merge(right, left_on=['WindowsOsVersion', 'MacOsVersion'], right_on='OsName'))

Current result:

left df:

CompID      WindowsOsVersion  MacOsVersion
Computer-8  7
Computer-D  11
Computer-4  XP
Computer-Z                    Zebra

right df:

OsName  Upgrade  Cost
XP      7        £5
7       8        £10
11      none     £0
Zebra   Lion     £10

Desired result:

CompID      WindowsOsVersion  MacOsVersion  OsName  Upgrade  Cost
Computer-8  7                               7       8        £10
Computer-D  11                              11      none     £0
Computer-4  XP                              XP      7        £5
Computer-Z                    Zebra         Zebra   Lion     £10

Any help would be greatly appreciated.

Updated code:

import os, json, xlsxwriter
import pandas as pd

left = pd.DataFrame({
    'CompID': ['Computer-8', 'Computer-D', 'Computer-4', 'Computer-Z'],
    'WindowsOsVersion': ['7', '11', 'XP', ''],
    'MacOsVersion': ['', '', '', 'Zebra'],
})
print("left df:")
print(left)
right = pd.DataFrame({
    'OsName': ['XP', '7', '11', 'Zebra'],
    'Upgrade': ['7', '8', 'none', 'Lion'],
})
print("right df:")
print(right)

new_df = pd.merge(left, right, how='left', left_on=['WindowsOsVersion'], right_on=['OsName'])
new_df2 = pd.merge(left, right, how='left', left_on=['MacOsVersion'], right_on=['OsName'])
print("WindowsOsVersion df:")
print(new_df)
print("MacOsVersion df:")
print(new_df2)
tester = pd.merge(new_df, new_df2, on="CompID", how='outer', suffixes=('', '_y'))
for col in tester:
    if col.endswith('_x'):
        tester.rename(columns=lambda col: col.rstrip('_x'), inplace=True)
    elif col.endswith('_y'):
        to_drop = [col for col in tester if col.endswith('_y')]
        tester.drop(to_drop, axis=1, inplace=True)
    else:
        pass
print("Merge: ")
print(tester)
# print("new df: ", left.merge(right, left_on=['WindowsOsVersion', 'MacOsVersion'], right_on='OsName'))

Current table:

CompID      WindowsOsVersion  MacOsVersion  OsName  Upgrade
Computer-8  7                               7       8
Computer-D  11                              11      none
Computer-4  XP                              XP      7
Computer-Z                    Zebra         NaN     NaN

I'm not sure why the last row shows incorrect information.

Its OsName and Upgrade should be Zebra and Lion.

Put simply, you can do the following.

First, create the merged dataframe:

new_df = pd.merge(left, right,  how='left', left_on=['WindowsOsVersion'], right_on = ['OsName'])
new_df = pd.merge(new_df, right,  how='left', left_on=['MacOsVersion'], right_on = ['OsName'])

At this point the dataframe looks like this:

       CompID WindowsOsVersion MacOsVersion OsName_x Upgrade_x OsName_y Upgrade_y
0  Computer-8                7                     7         8      NaN       NaN
1  Computer-D               11                    11      none      NaN       NaN
2  Computer-4               XP                    XP         7      NaN       NaN
3  Computer-Z                         Zebra      NaN       NaN    Zebra      Lion

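You can now use fillna() to combine the column data. This can also be achieved with combine_first(). The result is assigned back to the column, because calling fillna() with inplace=True on a selected column is deprecated in recent pandas versions and may not update the dataframe.

new_df['OsName_x'] = new_df['OsName_x'].fillna(new_df['OsName_y'])
new_df['Upgrade_x'] = new_df['Upgrade_x'].fillna(new_df['Upgrade_y'])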

The resulting dataframe now looks like this:

       CompID WindowsOsVersion MacOsVersion OsName_x Upgrade_x OsName_y Upgrade_y
0  Computer-8                7                     7         8      NaN       NaN
1  Computer-D               11                    11      none      NaN       NaN
2  Computer-4               XP                    XP         7      NaN       NaN
3  Computer-Z                         Zebra    Zebra      Lion    Zebra      Lion
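
For comparison, here is a minimal sketch of the combine_first() variant mentioned above. It assumes the same left and right dataframes as in the question's code (without the Cost column) and is meant as an illustration rather than the only way to do it.

```python
import pandas as pd

left = pd.DataFrame({
    'CompID': ['Computer-8', 'Computer-D', 'Computer-4', 'Computer-Z'],
    'WindowsOsVersion': ['7', '11', 'XP', ''],
    'MacOsVersion': ['', '', '', 'Zebra'],
})
right = pd.DataFrame({
    'OsName': ['XP', '7', '11', 'Zebra'],
    'Upgrade': ['7', '8', 'none', 'Lion'],
})

# Two left merges: first on the Windows column, then on the Mac column.
new_df = pd.merge(left, right, how='left', left_on='WindowsOsVersion', right_on='OsName')
new_df = pd.merge(new_df, right, how='left', left_on='MacOsVersion', right_on='OsName')

# combine_first keeps the caller's values and fills its missing entries
# from the argument, so the "_x" columns end up complete.
new_df['OsName_x'] = new_df['OsName_x'].combine_first(new_df['OsName_y'])
new_df['Upgrade_x'] = new_df['Upgrade_x'].combine_first(new_df['Upgrade_y'])

print(new_df[['CompID', 'OsName_x', 'Upgrade_x']])
```

For this data, fillna() and combine_first() should give the same filled columns; the choice between them is mostly a matter of taste.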

You can now drop and rename the columns as you do in your existing code.
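
Putting the steps together, one possible end-to-end sketch is shown below. The helper name merge_os_info is hypothetical, and the sketch assumes that none of the column names in right already exist in left. It builds the final columns directly instead of renaming with rstrip('_x'), since str.rstrip removes trailing characters rather than a suffix.

```python
import pandas as pd


def merge_os_info(left, right):
    """Hypothetical helper: look up each computer's OS in `right`
    through either the Windows or the Mac column of `left`."""
    cols = list(right.columns)

    # Two left merges; the second one produces "_x"/"_y" pairs for
    # every column that came from `right`.
    merged = left.merge(right, how='left', left_on='WindowsOsVersion', right_on='OsName')
    merged = merged.merge(right, how='left', left_on='MacOsVersion', right_on='OsName')

    # Coalesce each pair into a column carrying the original name.
    for col in cols:
        merged[col] = merged[col + '_x'].fillna(merged[col + '_y'])

    # Drop the suffixed helper columns.
    return merged.drop(columns=[c + s for c in cols for s in ('_x', '_y')])


left = pd.DataFrame({
    'CompID': ['Computer-8', 'Computer-D', 'Computer-4', 'Computer-Z'],
    'WindowsOsVersion': ['7', '11', 'XP', ''],
    'MacOsVersion': ['', '', '', 'Zebra'],
})
right = pd.DataFrame({
    'OsName': ['XP', '7', '11', 'Zebra'],
    'Upgrade': ['7', '8', 'none', 'Lion'],
})

result = merge_os_info(left, right)
print(result)
```

Because the helper loops over right.columns, an extra column such as Cost should be carried through in the same way.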

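There are two reasons your code did not produce the expected result. When creating the 'tester' dataframe, the suffixes specified were '' and '_y' rather than '_x' and '_y'. The subsequent code then tries to rename columns with the suffix '_x' (there are none!) and to drop columns with the suffix '_y' (the last 4 columns!). Before the rename and drop operations, the dataframe 'tester' looks like this: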

       CompID WindowsOsVersion MacOsVersion OsName Upgrade WindowsOsVersion_y MacOsVersion_y OsName_y Upgrade_y
0  Computer-8                7                   7       8                  7                     NaN       NaN
1  Computer-D               11                  11    none                 11                     NaN       NaN
2  Computer-4               XP                  XP       7                 XP                     NaN       NaN
3  Computer-Z                         Zebra    NaN     NaN                             Zebra    Zebra      Lion
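
The effect of the suffixes argument can be checked on a tiny example. The sketch below uses two made-up one-row dataframes (a and b are illustrative names, not part of the original code) and compares the column names produced by the default suffixes with those produced by suffixes=('', '_y').

```python
import pandas as pd

# Two small frames sharing the key "CompID" and the overlapping
# columns "OsName" and "Upgrade".
a = pd.DataFrame({'CompID': ['Computer-8'], 'OsName': ['7'], 'Upgrade': ['8']})
b = pd.DataFrame({'CompID': ['Computer-8'], 'OsName': [None], 'Upgrade': [None]})

# Default suffixes: overlapping columns become "<name>_x" and "<name>_y".
default_cols = list(pd.merge(a, b, on='CompID', how='outer').columns)

# suffixes=('', '_y'): the left-hand columns keep their names, so no
# column ends with "_x" and only the right-hand copies get "_y".
custom_cols = list(pd.merge(a, b, on='CompID', how='outer', suffixes=('', '_y')).columns)

print(default_cols)  # ['CompID', 'OsName_x', 'Upgrade_x', 'OsName_y', 'Upgrade_y']
print(custom_cols)   # ['CompID', 'OsName', 'Upgrade', 'OsName_y', 'Upgrade_y']
```

With the second form, a loop that looks for '_x' columns finds nothing to rename, and dropping every '_y' column discards the only copies that hold the Zebra and Lion values.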