Pandas: Round integers before joining dataframes
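I have two dataframes that both contain coordinates. One of them, df1, has coordinates at a finer resolution (with decimals), and I would like to join it to df2, which has a coarser resolution: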
import pandas as pd
df1 = pd.DataFrame({'x': [1.1, 2.2, 3.3],
                    'y': [2.3, 3.3, 4.1],
                    'val': [10, 11, 12]})
df2 = pd.DataFrame({'x': [1, 2, 3, 5.5],
                    'y': [2, 3, 4, 5.6]})
# keep the original coordinates, then round the join keys in place
df1['x_org'] = df1['x']
df1['y_org'] = df1['y']
df1[['x', 'y']] = df1[['x', 'y']].round()
df1 = pd.merge(df1, df2, how='left', on=['x', 'y'])
# drop the rounded keys (the result has to be assigned back)
df1 = df1.drop(columns=['x', 'y'])
# rename...
The code above does exactly what I want, but it is a bit cumbersome. Is there a simpler way to achieve this?
IIUC, you can pass the rounded values as the join keys:
pd.merge(df1.rename(columns={'x': 'x_org', 'y': 'y_org'}),
         df2,
         how='left',
         left_on=[df1['x'].round(), df1['y'].round()],
         right_on=['x', 'y'])  # .drop(columns=['x', 'y'])  # if x/y are unwanted
Output:
   x_org  y_org  val    x    y
0    1.1    2.3   10  1.0  2.0
1    2.2    3.3   11  2.0  3.0
2    3.3    4.1   12  3.0  4.0
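As a self-contained check of the rounded-key join, here is a minimal sketch that rounds `x` for the first key and `y` for the second. The `indicator=True` argument is my addition rather than part of the answer: it adds a `_merge` column showing whether each row of df1 found a partner in df2.

```python
import pandas as pd

df1 = pd.DataFrame({'x': [1.1, 2.2, 3.3],
                    'y': [2.3, 3.3, 4.1],
                    'val': [10, 11, 12]})
df2 = pd.DataFrame({'x': [1, 2, 3, 5.5],
                    'y': [2, 3, 4, 5.6]})

# Keep the original coordinates under new names; the rounded Series act as
# the left-hand join keys, so no temporary columns are needed on df1.
out = pd.merge(df1.rename(columns={'x': 'x_org', 'y': 'y_org'}),
               df2,
               how='left',
               left_on=[df1['x'].round(), df1['y'].round()],
               right_on=['x', 'y'],
               indicator=True)

# 'both' means the rounded coordinates were found in df2,
# 'left_only' means there was no matching row.
print(out[['x_org', 'y_org', 'val', '_merge']])
```

If a row shows `left_only`, the rounded coordinates simply do not exist in df2, which is usually the first thing to rule out when the joined columns come back empty.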
Use suffixes to mark the key columns coming from df2, then drop them:
df1.merge(df2,
          how='left',
          left_on=[df1['x'].round(), df1['y'].round()],
          right_on=['x', 'y'],
          suffixes=('', '_')).drop(['x_', 'y_'], axis=1)
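In the question's data df2 has no columns besides the keys, so the effect of the join is hard to see. The sketch below wraps the same pattern in a hypothetical helper, `merge_on_rounded`, and assumes df2 carries an extra `label` column; both the helper name and that column are illustrative, not from the original post.

```python
import pandas as pd


def merge_on_rounded(left, right, keys, how='left'):
    """Hypothetical helper: join `left` to `right` on the rounded `keys`."""
    # Rounded copies of the key columns; `left` itself is not modified.
    rounded = [left[k].round() for k in keys]
    merged = left.merge(right,
                        how=how,
                        left_on=rounded,
                        right_on=keys,
                        suffixes=('', '_'))
    # The suffixed columns are the key columns that came from `right`.
    return merged.drop(columns=[f'{k}_' for k in keys])


df1 = pd.DataFrame({'x': [1.1, 2.2, 3.3],
                    'y': [2.3, 3.3, 4.1],
                    'val': [10, 11, 12]})
# Assumed for illustration: df2 has an extra 'label' column to bring over.
df2 = pd.DataFrame({'x': [1, 2, 3, 5.5],
                    'y': [2, 3, 4, 5.6],
                    'label': ['a', 'b', 'c', 'd']})

result = merge_on_rounded(df1, df2, keys=['x', 'y'])
print(result)
```

The result keeps the unrounded `x` and `y` from df1 and gains `label` from the matching df2 row.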
The columns ending in _ can also be dropped dynamically:
df = df1.merge(df2,
               how='left',
               left_on=[df1['x'].round(), df1['y'].round()],
               right_on=['x', 'y'],
               suffixes=('', '_')).filter(regex='.*[^_]$')
print(df)
     x    y  val
0  1.1  2.3   10
1  2.2  3.3   11
2  3.3  4.1   12
The same works with a longer suffix and a negative lookbehind:
df = df1.merge(df2,
               how='left',
               left_on=[df1['x'].round(), df1['y'].round()],
               right_on=['x', 'y'],
               suffixes=('', '_end')).filter(regex='.*(?<!_end)$')
print(df)
     x    y  val
0  1.1  2.3   10
1  2.2  3.3   11
2  3.3  4.1   12
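To make the two regular expressions easier to follow, here is a small sketch that applies them to a hand-built frame. The frame `merged` is a stand-in I made up to mimic the column names a suffixed merge produces; it is not output from the code above.

```python
import pandas as pd

# Stand-in for a merged frame: 'x_' and 'y_' mimic the suffixed key columns.
merged = pd.DataFrame({'x': [1.1], 'y': [2.3], 'val': [10],
                       'x_': [1.0], 'y_': [2.0]})

# Keep only names whose last character is not an underscore.
kept_short = merged.filter(regex='.*[^_]$')

# Same idea with a longer suffix: the negative lookbehind rejects names
# that end in '_end' and keeps everything else.
merged_end = merged.rename(columns={'x_': 'x_end', 'y_': 'y_end'})
kept_long = merged_end.filter(regex='.*(?<!_end)$')

print(list(kept_short.columns))  # ['x', 'y', 'val']
print(list(kept_long.columns))   # ['x', 'y', 'val']
```

A single-underscore suffix would also drop any original column whose name happens to end in `_`, so a longer suffix such as `_end` is less likely to collide with real column names.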
Or round the index and join on it:
df = (df1.set_index(['x', 'y'], drop=False).rename(lambda x: round(x))
         .merge(df2.set_index(['x', 'y']),
                left_index=True,
                right_index=True,
                how='left').reset_index(drop=True))
print(df)
     x    y  val
0  1.1  2.3   10
1  2.2  3.3   11
2  3.3  4.1   12
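The index-based variant relies on `rename` applying the function to every label of the two-level index. As a rough illustration, assuming the same df1 and df2 as in the question, the sketch below lists the rounded index keys next to df2's index keys so the overlap is visible.

```python
import pandas as pd

df1 = pd.DataFrame({'x': [1.1, 2.2, 3.3],
                    'y': [2.3, 3.3, 4.1],
                    'val': [10, 11, 12]})
df2 = pd.DataFrame({'x': [1, 2, 3, 5.5],
                    'y': [2, 3, 4, 5.6]})

# drop=False keeps 'x' and 'y' as columns, so the unrounded values survive
# while the index copies are rounded label by label.
left_idx = (df1.set_index(['x', 'y'], drop=False)
               .rename(lambda v: round(v))
               .index)
right_idx = df2.set_index(['x', 'y']).index

left_keys = left_idx.tolist()
right_keys = right_idx.tolist()

print(left_keys)
print(right_keys)
# True where a rounded (x, y) pair from df1 also exists in df2.
print([key in right_keys for key in left_keys])
```

Because only the index copies are rounded, the final `reset_index(drop=True)` in the answer can discard them and leave the original coordinates in place.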