Use the value from a dataframe to look up in another dataframe
I am trying to use the values in a column of one dataframe (df1) as the index for a lookup in another dataframe (df2).
I found a solution using apply and a lambda function:
max_edad = int(df2.iloc[:,0].max() - 1) #The value will be 116
df1['Vivos(t)'] = df1['fecha_ord'].apply(lambda x: df2.loc[int(x), 'lx_1970'] * (1 - (x % 1)) + df2.loc[int(x) + 1,'lx_1970'] * (x % 1) if x < max_edad else 0)
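But when I run it on a huge dataset it is too slow (although it works).
Do you know how I could get the same result faster in a different way?
Here are some samples of my dataframes: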
df1
t | fecha | Factor_desc | fecha_ord |
---|---|---|---|
0 | 2016-04-01 | 1.000000 | 45.325120 |
1 | 2016-05-01 | 0.996339 | 45.407255 |
2 | 2016-06-01 | 0.992691 | 45.492129 |
3 | 2016-07-01 | 0.989056 | 45.574264 |
4 | 2016-08-01 | 0.985435 | 45.659138 |
5 | 2016-09-01 | 0.981827 | 45.744011 |
6 | 2016-10-01 | 0.978232 | 45.826146 |
7 | 2016-11-01 | 0.974650 | 45.911020 |
8 | 2016-12-01 | 0.971082 | 45.993155 |
9 | 2017-01-01 | 0.967526 | 46.078029 |
10 | 2017-02-01 | 0.963984 | 46.162902 |
11 | 2017-03-01 | 0.960454 | 46.239562 |
12 | 2017-04-01 | 0.956938 | 46.324435 |
13 | 2017-05-01 | 0.953434 | 46.406571 |
14 | 2017-06-01 | 0.949943 | 46.491444 |
... | ... | ... | ... |
1390 | 2132-02-01 | 0.057234 | 161.158111 |
1391 | 2132-03-01 | 0.057163 | 161.237509 |
1392 | 2132-04-01 | 0.057093 | 161.322382 |
df2
edad | lx_1970 |
---|---|
0.0 | 1.000000 |
1.0 | 9.909948 |
2.0 | 9.901297 |
3.0 | 9.896776 |
4.0 | 9.892829 |
5.0 | 9.889542 |
6.0 | 9.886405 |
... | ... |
41.0 | 9.577991 |
42.0 | 9.565103 |
43.0 | 9.551536 |
44.0 | 9.537515 |
45.0 | 9.522749 |
46.0 | 9.507039 |
... | ... |
116.0 | 0 |
I would like to obtain the following df:
df3
t | fecha | Factor_desc | fecha_ord | Vivos(t) |
---|---|---|---|---|
0 | 2016-04-01 | 1.000000 | 45.325120 | 9.517642 |
1 | 2016-05-01 | 0.996339 | 45.407255 | 9.516351 |
2 | 2016-06-01 | 0.992691 | 45.492129 | 9.515018 |
3 | 2016-07-01 | 0.989056 | 45.574264 | 9.513728 |
4 | 2016-08-01 | 0.985435 | 45.659138 | 9.512394 |
5 | 2016-09-01 | 0.981827 | 45.744011 | 9.511061 |
6 | 2016-10-01 | 0.978232 | 45.826146 | 9.509770 |
7 | 2016-11-01 | 0.974650 | 45.911020 | 9.508437 |
8 | 2016-12-01 | 0.971082 | 45.993155 | 9.507147 |
9 | 2017-01-01 | 0.967526 | 46.078029 | 9.505715 |
10 | 2017-02-01 | 0.963984 | 46.162902 | 9.504274 |
11 | 2017-03-01 | 0.960454 | 46.239562 | 9.502972 |
12 | 2017-04-01 | 0.956938 | 46.324435 | 9.501532 |
13 | 2017-05-01 | 0.953434 | 46.406571 | 9.500137 |
14 | 2017-06-01 | 0.949943 | 46.491444 | 9.498696 |
... | ... | ... | ... | ... |
1390 | 2132-02-01 | 0.057234 | 161.158111 | 0.0 |
1391 | 2132-03-01 | 0.057163 | 161.237509 | 0.0 |
1392 | 2132-04-01 | 0.057093 | 161.322382 | 0.0 |
Thank you very much!
I think it is better to split the calculation into several steps:
df1['fecha_ord_int'] = df1['fecha_ord'].astype(int)
df1['fecha_ord_dec'] = df1['fecha_ord'] % 1
df2['lx_1970_next'] = df2['lx_1970'].shift(-1)
df1 = df1.merge(df2, how='left', left_on='fecha_ord_int', right_on='edad')  # 'left' keeps rows whose age is not in df2
# now do the calculation you want
# you can drop the columns you don't want later
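The remaining calculation could look something like the sketch below. It is only an illustration under a few assumptions: `edad` is an ordinary column of df2 holding consecutive, sorted integer ages, the function name `add_vivos` and the small sample frames are made up for the example, and `max_edad` is passed in explicitly (116 here, the value quoted in the question's comment). A left join is used so that rows whose age is not in df2 are kept and can be set to 0.

```python
import numpy as np
import pandas as pd


def add_vivos(df1, df2, max_edad):
    """Add 'Vivos(t)' by linear interpolation of lx_1970 (hypothetical helper)."""
    out = df1.copy()
    # Integer and fractional parts of the age, as in the steps above.
    out["fecha_ord_int"] = np.floor(out["fecha_ord"]).astype(int)
    out["fecha_ord_dec"] = out["fecha_ord"] % 1

    table = df2[["edad", "lx_1970"]].copy()
    table["edad"] = table["edad"].astype(int)  # same dtype as the left key
    # Value at the next age; assumes rows are sorted by consecutive ages.
    table["lx_1970_next"] = table["lx_1970"].shift(-1)

    # Left join keeps every row of df1; ages missing from df2 become NaN.
    out = out.merge(table, how="left", left_on="fecha_ord_int",
                    right_on="edad", validate="many_to_one")
    out.index = df1.index  # merge resets the index; restore the original one

    interpolated = (out["lx_1970"] * (1 - out["fecha_ord_dec"])
                    + out["lx_1970_next"] * out["fecha_ord_dec"])
    # Same cutoff rule as the question: 0 once the age reaches max_edad.
    out["Vivos(t)"] = np.where(out["fecha_ord"] < max_edad, interpolated, 0.0)

    helpers = ["fecha_ord_int", "fecha_ord_dec", "edad", "lx_1970", "lx_1970_next"]
    return out.drop(columns=helpers)


# Small sample built from values shown in the question.
df1 = pd.DataFrame({"fecha_ord": [45.325120, 45.993155, 161.158111]})
df2 = pd.DataFrame({"edad": [44.0, 45.0, 46.0],
                    "lx_1970": [9.537515, 9.522749, 9.507039]})
result = add_vivos(df1, df2, max_edad=116)
print(result)
```

Because everything is done with whole-column operations, no Python-level function is called per row, which is usually where `apply` with a lambda loses time.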
Hope this helps.
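A different technique from the merge above, offered only as a possible alternative: since the lambda in the question is a linear interpolation between consecutive ages, `numpy.interp` may give the same values in a single call. The sketch assumes `edad` is sorted in increasing order; the function name `vivos_interp` and the sample frames are invented for illustration, and `max_edad` is again taken as 116 from the question's comment.

```python
import numpy as np
import pandas as pd


def vivos_interp(fecha_ord, edad, lx, max_edad):
    """Piecewise-linear lookup of lx at fecha_ord (hypothetical helper)."""
    x = np.asarray(fecha_ord, dtype=float)
    # np.interp requires the x-coordinates of the table (edad) to be increasing.
    values = np.interp(x, np.asarray(edad, dtype=float), np.asarray(lx, dtype=float))
    # Keep the question's rule: 0 once the age reaches max_edad.
    return np.where(x < max_edad, values, 0.0)


# Small sample built from values shown in the question.
df1 = pd.DataFrame({"fecha_ord": [45.325120, 45.993155, 161.158111]})
df2 = pd.DataFrame({"edad": [44.0, 45.0, 46.0],
                    "lx_1970": [9.537515, 9.522749, 9.507039]})

df1["Vivos(t)"] = vivos_interp(df1["fecha_ord"], df2["edad"], df2["lx_1970"], max_edad=116)
print(df1)
```

One difference to keep in mind: for ages outside the range of `edad`, `np.interp` returns the nearest end value rather than raising or producing NaN, so the cutoff with `np.where` matters if the table does not cover every age below `max_edad`.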