Use the values from one dataframe to look up values in another dataframe

Question

I'm trying to use the values in a column of one dataframe (df1) as the index for a lookup in another dataframe (df2).

I found a solution using apply and a lambda function:

max_edad = int(df2.iloc[:, 0].max() - 1)  # highest age in df2 (116) minus 1, i.e. 115
# linear interpolation between ages int(x) and int(x) + 1; 0 from max_edad onward
df1['Vivos(t)'] = df1['fecha_ord'].apply(lambda x: df2.loc[int(x), 'lx_1970'] * (1 - (x % 1))
                                         + df2.loc[int(x) + 1, 'lx_1970'] * (x % 1) if x < max_edad else 0)
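For reference, the lambda computes a linear interpolation between the two whole ages around x = fecha_ord. Writing ℓ(a) for df2.loc[a, 'lx_1970'] (notation introduced here only for illustration), it evaluates:

```latex
V(x) =
\begin{cases}
\ell(\lfloor x \rfloor)\,\bigl(1 - \{x\}\bigr) + \ell(\lfloor x \rfloor + 1)\,\{x\}, & x < \text{max\_edad} \\
0, & \text{otherwise}
\end{cases}
\qquad \text{where } \{x\} = x - \lfloor x \rfloor
```

For the first row of df1, x = 45.325120, this gives 9.522749 * (1 - 0.325120) + 9.507039 * 0.325120 ≈ 9.517641, which agrees with the first Vivos(t) value in df3 (9.517642) up to the rounding of the lx values shown in the sample.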

But I'm using it on a huge database and, although it works, it is far too slow.

Is there a different way to get the same result faster?

Here are some samples of my dataframes:

df1

t	fecha	Factor_desc	fecha_ord
0	2016-04-01	1.000000	45.325120
1	2016-05-01	0.996339	45.407255
2	2016-06-01	0.992691	45.492129
3	2016-07-01	0.989056	45.574264
4	2016-08-01	0.985435	45.659138
5	2016-09-01	0.981827	45.744011
6	2016-10-01	0.978232	45.826146
7	2016-11-01	0.974650	45.911020
8	2016-12-01	0.971082	45.993155
9	2017-01-01	0.967526	46.078029
10	2017-02-01	0.963984	46.162902
11	2017-03-01	0.960454	46.239562
12	2017-04-01	0.956938	46.324435
13	2017-05-01	0.953434	46.406571
14	2017-06-01	0.949943	46.491444
...	...	...	...
1390	2132-02-01	0.057234	161.158111
1391	2132-03-01	0.057163	161.237509
1392	2132-04-01	0.057093	161.322382

df2

edad	lx_1970
0.0	1.000000
1.0	9.909948
2.0	9.901297
3.0	9.896776
4.0	9.892829
5.0	9.889542
6.0	9.886405
...	...
41.0	9.577991
42.0	9.565103
43.0	9.551536
44.0	9.537515
45.0	9.522749
46.0	9.507039
...	...
116.0	0

I'd like to get the following df:

df3

t	fecha	Factor_desc	fecha_ord	Vivos(t)
0	2016-04-01	1.000000	45.325120	9.517642
1	2016-05-01	0.996339	45.407255	9.516351
2	2016-06-01	0.992691	45.492129	9.515018
3	2016-07-01	0.989056	45.574264	9.513728
4	2016-08-01	0.985435	45.659138	9.512394
5	2016-09-01	0.981827	45.744011	9.511061
6	2016-10-01	0.978232	45.826146	9.509770
7	2016-11-01	0.974650	45.911020	9.508437
8	2016-12-01	0.971082	45.993155	9.507147
9	2017-01-01	0.967526	46.078029	9.505715
10	2017-02-01	0.963984	46.162902	9.504274
11	2017-03-01	0.960454	46.239562	9.502972
12	2017-04-01	0.956938	46.324435	9.501532
13	2017-05-01	0.953434	46.406571	9.500137
14	2017-06-01	0.949943	46.491444	9.498696
...	...	...	...	...
1390	2132-02-01	0.057234	161.158111	0.0
1391	2132-03-01	0.057163	161.237509	0.0
1392	2132-04-01	0.057093	161.322382	0.0

Thanks a lot!

Answer 1

I think it's better to split the calculation into several steps:

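# split fecha_ord into the whole age (lookup key) and the fraction of the year
df1['fecha_ord_int'] = df1['fecha_ord'].astype(int)
df1['fecha_ord_dec'] = df1['fecha_ord'] % 1
# put lx at the next age on the same row as lx at the current age
df2['lx_1970_next'] = df2['lx_1970'].shift(-1)

# how='left' keeps every row of df1; ages with no match in df2 get NaN
df1 = df1.merge(df2, how='left', left_on='fecha_ord_int', right_on='edad')

# now do the calculation you want: interpolate between the two ages, 0 where there is no data
df1['Vivos(t)'] = (df1['lx_1970'] * (1 - df1['fecha_ord_dec'])
                   + df1['lx_1970_next'] * df1['fecha_ord_dec']).fillna(0)
# you can drop the helper columns you don't want later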
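A minimal sketch of the same calculation without the merge, assuming the lx_1970 values sit in a Series indexed by integer age (as the df2.loc[int(x), 'lx_1970'] lookup in the question implies). vivos_vectorized is a hypothetical helper name; it evaluates the question's formula on whole NumPy arrays instead of calling a Python function once per row:

```python
import numpy as np
import pandas as pd


def vivos_vectorized(fecha_ord: pd.Series, lx: pd.Series, max_edad: int) -> pd.Series:
    """Interpolate lx between ages int(x) and int(x) + 1 for a whole column at once.

    Hypothetical helper; assumes lx is indexed by integer age.
    """
    x = fecha_ord.to_numpy(dtype=float)
    age = x.astype(int)  # whole age, like int(x) in the lambda
    frac = x % 1         # fraction of the year, like x % 1 in the lambda
    # label-based lookups for all rows at once; ages missing from lx become NaN
    lo = lx.reindex(age).to_numpy(dtype=float)
    hi = lx.reindex(age + 1).to_numpy(dtype=float)
    interp = lo * (1 - frac) + hi * frac
    # same cut-off as the lambda: 0 wherever x >= max_edad
    return pd.Series(np.where(x < max_edad, interp, 0.0), index=fecha_ord.index)


# excerpt of df2 (ages 44 to 46) and three fecha_ord values from df1
lx = pd.Series([9.537515, 9.522749, 9.507039], index=[44, 45, 46], name='lx_1970')
fecha_ord = pd.Series([45.325120, 45.407255, 161.158111])
print(vivos_vectorized(fecha_ord, lx, max_edad=115))
```

On the full data this would be called as df1['Vivos(t)'] = vivos_vectorized(df1['fecha_ord'], df2['lx_1970'], max_edad). With the sample excerpt, the first two results match the df3 values up to rounding in the last digit, and the third row is 0 because it lies beyond max_edad.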

Hope this helps.


Tags: python, dataframe, python-3.x, pandas, numpy-ndarray