Merging two different dataframes
I'm having trouble merging two different dataframes based on column names.
Code:
import os, json, xlsxwriter
import pandas as pd
left = pd.DataFrame({'CompID': ['Computer-8', 'Computer-D', 'Computer-4', 'Computer-Z'], 'WindowsOsVersion': ['7', '11', 'XP', ''],'MacOsVersion': ['', '', '', 'Zebra']})
print ("left df:")
print (left)
right = pd.DataFrame({'OsName': ['XP', '7', '11', 'Zebra'], 'Upgrade': ['7', '8', 'none', 'Lion']})
print ("right df:")
print (right)
new_df = pd.merge(left, right, how='inner', left_on=['WindowsOsVersion'], right_on = ['OsName'])
new_df2 = pd.merge(left, right, how='inner', left_on=['MacOsVersion'], right_on = ['OsName'])
print ("WindowsOsVersion df:")
print (new_df)
print ("MacOsVersion df:")
print (new_df2)
tester = pd.merge(new_df, new_df2, on="CompID")
print ("Merge: ")
print (tester)
#print ("new df: ",left.merge(right, left_on=['WindowsOsVersion','MacOsVersion'], right_on='OsName'))
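Current result:
left df:
| CompId | WindowsOsVersion | MacOsVersion |
|---|---|---|
| Computer-8 | 7 | |
| Computer-D | 11 | |
| Computer-4 | XP | |
| Computer-Z | | Zebra |
right df:
| OsName | Upgrade | Cost |
|---|---|---|
| XP | 7 | £5 |
| 7 | 8 | £10 |
| 11 | none | £0 |
| Zebra | Lion | £10 |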
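The result I want:
| CompId | WindowsOsVersion | MacOsVersion | OsName | Upgrade | Cost |
|---|---|---|---|---|---|
| Computer-8 | 7 | | 7 | 8 | £10 |
| Computer-D | 11 | | 11 | none | £0 |
| Computer-4 | XP | | XP | 7 | £5 |
| Computer-Z | | Zebra | Zebra | Lion | £10 |
Any help would be appreciated.
Updated code: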
import os, json, xlsxwriter
import pandas as pd
left = pd.DataFrame({'CompID': ['Computer-8', 'Computer-D', 'Computer-4', 'Computer-Z'], 'WindowsOsVersion': ['7', '11', 'XP', ''],'MacOsVersion': ['', '', '', 'Zebra']})
print ("left df:")
print (left)
right = pd.DataFrame({'OsName': ['XP', '7', '11', 'Zebra'], 'Upgrade': ['7', '8', 'none', 'Lion']})
print ("right df:")
print (right)
new_df = pd.merge(left, right, how='left', left_on=['WindowsOsVersion'], right_on = ['OsName'])
new_df2 = pd.merge(left, right, how='left', left_on=['MacOsVersion'], right_on = ['OsName'])
print ("WindowsOsVersion df:")
print (new_df)
print ("MacOsVersion df:")
print (new_df2)
tester = pd.merge(new_df, new_df2, on="CompID", how='outer',suffixes=('', '_y'))
for col in tester:
    if col.endswith('_x'):
        tester.rename(columns = lambda col:col.rstrip('_x'),inplace=True)
    elif col.endswith('_y'):
        to_drop = [col for col in tester if col.endswith('_y')]
        tester.drop(to_drop,axis=1,inplace=True)
    else:
        pass
print ("Merge: ")
print (tester)
#print ("new df: ",left.merge(right, left_on=['WindowsOsVersion','MacOsVersion'], right_on='OsName'))
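Current table:
| CompId | WindowsOsVersion | MacOsVersion | OsName | Upgrade |
|---|---|---|---|---|
| Computer-8 | 7 | | 7 | 8 |
| Computer-D | 11 | | 11 | none |
| Computer-4 | XP | | XP | 7 |
| Computer-Z | | Zebra | NaN | NaN |
I'm not sure why the last row shows incorrect information in the OsName and Upgrade columns.
It should be Zebra, Lion.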
Put simply, you can do the following:
First, create the merged dataframe.
new_df = pd.merge(left, right, how='left', left_on=['WindowsOsVersion'], right_on = ['OsName'])
new_df = pd.merge(new_df, right, how='left', left_on=['MacOsVersion'], right_on = ['OsName'])
At this point the dataframe looks like this:
       CompID  WindowsOsVersion  MacOsVersion  OsName_x  Upgrade_x  OsName_y  Upgrade_y
0  Computer-8                 7                       7          8       NaN        NaN
1  Computer-D                11                      11       none       NaN        NaN
2  Computer-4                XP                      XP          7       NaN        NaN
3  Computer-Z                           Zebra       NaN        NaN     Zebra       Lion
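Now you can use fillna() to combine the column data. This can also be achieved with combine_first().
# Assign the result back instead of calling fillna(..., inplace=True) on a column selection,
# which recent pandas versions warn about and which may not update new_df.
new_df['OsName_x'] = new_df['OsName_x'].fillna(new_df['OsName_y'])
new_df['Upgrade_x'] = new_df['Upgrade_x'].fillna(new_df['Upgrade_y'])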
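For comparison, here is a minimal sketch of the combine_first() variant mentioned above. It rebuilds `left` and `right` from the question (without the `Cost` column, as in the posted code) so that it runs on its own. It is an illustration of the idea rather than the only way to write it.

```python
import pandas as pd

left = pd.DataFrame({
    'CompID': ['Computer-8', 'Computer-D', 'Computer-4', 'Computer-Z'],
    'WindowsOsVersion': ['7', '11', 'XP', ''],
    'MacOsVersion': ['', '', '', 'Zebra'],
})
right = pd.DataFrame({
    'OsName': ['XP', '7', '11', 'Zebra'],
    'Upgrade': ['7', '8', 'none', 'Lion'],
})

# Look up each OS column against the same reference table.
new_df = pd.merge(left, right, how='left', left_on='WindowsOsVersion', right_on='OsName')
new_df = pd.merge(new_df, right, how='left', left_on='MacOsVersion', right_on='OsName')

# combine_first keeps the caller's value and falls back to the argument
# wherever the caller is NaN, so the Mac match fills the gaps left by
# the Windows match.
new_df['OsName_x'] = new_df['OsName_x'].combine_first(new_df['OsName_y'])
new_df['Upgrade_x'] = new_df['Upgrade_x'].combine_first(new_df['Upgrade_y'])

print(new_df[['CompID', 'OsName_x', 'Upgrade_x']])
```

Both fillna() and combine_first() give the same values here, because each row matches on at most one of the two OS columns.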
The resulting dataframe now looks like this:
       CompID  WindowsOsVersion  MacOsVersion  OsName_x  Upgrade_x  OsName_y  Upgrade_y
0  Computer-8                 7                       7          8       NaN        NaN
1  Computer-D                11                      11       none       NaN        NaN
2  Computer-4                XP                      XP          7       NaN        NaN
3  Computer-Z                           Zebra     Zebra       Lion     Zebra       Lion
You can now drop and rename the columns as you did in your existing code.
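Putting the steps together, one possible end-to-end sketch is shown below. The function name `merge_os_info` is hypothetical and used only for illustration. The column names `OsName` and `Upgrade` are taken from the question's `right` dataframe, and the drop and rename are written with explicit column names rather than suffix matching.

```python
import pandas as pd

def merge_os_info(left, right):
    """Attach OsName/Upgrade to each computer, matching on either OS column."""
    merged = left.merge(right, how='left', left_on='WindowsOsVersion', right_on='OsName')
    merged = merged.merge(right, how='left', left_on='MacOsVersion', right_on='OsName')
    # Fill the gaps in the Windows match (_x) with the Mac match (_y).
    for name in ('OsName', 'Upgrade'):
        merged[name + '_x'] = merged[name + '_x'].fillna(merged[name + '_y'])
    # Drop the helper columns and restore the original names explicitly.
    merged = merged.drop(columns=['OsName_y', 'Upgrade_y'])
    return merged.rename(columns={'OsName_x': 'OsName', 'Upgrade_x': 'Upgrade'})

left = pd.DataFrame({
    'CompID': ['Computer-8', 'Computer-D', 'Computer-4', 'Computer-Z'],
    'WindowsOsVersion': ['7', '11', 'XP', ''],
    'MacOsVersion': ['', '', '', 'Zebra'],
})
right = pd.DataFrame({
    'OsName': ['XP', '7', '11', 'Zebra'],
    'Upgrade': ['7', '8', 'none', 'Lion'],
})

result = merge_os_info(left, right)
print(result)
```

Naming the columns explicitly in drop() and rename() avoids relying on suffix matching, which is where the original attempt went wrong.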
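Your code did not produce the expected result for two reasons. When creating the 'tester' dataframe, the suffixes specified were '' and '_y' rather than '_x' and '_y'. The subsequent code then tries to rename columns with the suffix '_x' (there are none!) and drops the columns with the suffix '_y' (the last 4 columns!). Before the rename and drop operations, the 'tester' dataframe looks like this:
       CompID  WindowsOsVersion  MacOsVersion  OsName  Upgrade  WindowsOsVersion_y  MacOsVersion_y  OsName_y  Upgrade_y
0  Computer-8                 7                     7        8                   7                       NaN        NaN
1  Computer-D                11                    11     none                  11                       NaN        NaN
2  Computer-4                XP                    XP        7                  XP                       NaN        NaN
3  Computer-Z                           Zebra     NaN      NaN                               Zebra     Zebra       Lion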
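To see the suffix behavior in isolation, here is a small sketch with two one-row frames (made up for illustration). It also shows a related pitfall in the rename step: `str.rstrip('_x')` strips any trailing `_` or `x` characters rather than the literal suffix. The column name `Linux_x` and the helper `strip_suffix` are hypothetical.

```python
import pandas as pd

a = pd.DataFrame({'CompID': ['Computer-8'], 'OsName': ['7']})
b = pd.DataFrame({'CompID': ['Computer-8'], 'OsName': ['Zebra']})

# Default suffixes are ('_x', '_y'); both overlapping columns get renamed.
default = pd.merge(a, b, on='CompID')
# With suffixes=('', '_y') the left-hand column keeps its plain name,
# so no column ends in '_x' afterwards.
custom = pd.merge(a, b, on='CompID', suffixes=('', '_y'))

print(list(default.columns))
print(list(custom.columns))

def strip_suffix(name, suffix):
    """Remove a literal suffix from name, if present."""
    return name[:-len(suffix)] if suffix and name.endswith(suffix) else name

# rstrip treats its argument as a set of characters to remove.
print('Linux_x'.rstrip('_x'))
print(strip_suffix('Linux_x', '_x'))
```

With the question's column names the rstrip call happens to do no harm, but an explicit rename mapping or a literal suffix removal is safer.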