For loop over two data frames to extract an item from one df and store it in the other

I have two data frames, df and df_station.

df

LATITUDE  LONGITUDE
51.2161   -122.3111
52.0780   -122.1795

df_station

station  lon        lat
CBBC     -128.1567  52.1850
CWAE     -122.9547  50.1290
CWCL     -121.5047  51.1447
CWEB     -126.5431  49.3833

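I have to create a new column in df, df['STATION'], that contains a station name. For each row, the station is the one in df_station at the minimum distance from that row's coordinates.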

My code

station = []
for i, r in df.iterrows():
    lon1 = r['LONGITUDE']
    lat1 = r['LATITUDE']
    dist = []

    for i, v in df_station.iterrows():
        lon2 = v['lon']
        lat2 = v['lat']
        dist.append(haversine((lat1, lon1 ), ( lat2 , lon2), unit='km')
                    
    station.append(df_station['station'][dist.index(min(dist))])
       
# store station name
df['STATION'] = station

Error on the line station.append(df_station['station'][dist.index(min(dist))]): SyntaxError: invalid syntax

I am not sure how to use df_station inside the first for loop.

The desired result is for the data frame df to look like this:

df

LATITUDE  LONGITUDE  STATION
51.2161   -122.3111  CWDL
52.0780   -122.1795  CPXL

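My suggestion is to drop the data frames for this operation and loop over plain tuples instead. (The SyntaxError itself comes from the unclosed parenthesis on the dist.append(...) line; Python only notices it on the following statement.)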

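from haversine import haversine

lat_lon = list(zip(df.LATITUDE, df.LONGITUDE))
# Materialize as a list: a bare zip() is a one-shot iterator and would be
# exhausted after the first pass of the outer loop.
lat_lon_station = list(zip(df_station.lat, df_station.lon, df_station.station))

results = []
for lat, lon in lat_lon:
    best_station, best_dist = None, float('inf')
    for station_lat, station_lon, station in lat_lon_station:
        dist = haversine((lat, lon), (station_lat, station_lon), unit='km')
        # Keep this station only if it is closer than the best one seen so far.
        if dist < best_dist:
            best_station, best_dist = station, dist
    results.append(best_station)

# Store the nearest station name for each row of df.
df['STATION'] = results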

Does this achieve your goal?

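Zipping the columns once at the start should noticeably improve run time on larger data frames, because iterrows() builds a new Series object for every row.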
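
If the frames get large enough that even the tuple loop is slow, the pairwise distances can also be computed in one shot with NumPy broadcasting. The following is only a sketch under some assumptions: the column names are taken from the code in the question ('LATITUDE', 'LONGITUDE', 'lon', 'lat', 'station'), the haversine formula is written out by hand rather than calling the haversine package, and nearest_station and EARTH_RADIUS_KM are names made up for this example. The sample frames copy the excerpt from the question, with the first station code assumed to be CBBC.

```python
import numpy as np
import pandas as pd

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius in km (assumed constant)


def nearest_station(df, df_station):
    """Return, for each row of df, the name of the closest station."""
    # Points become a column vector (n, 1), stations a row vector (1, m),
    # so every arithmetic step below broadcasts to an (n, m) matrix.
    lat1 = np.radians(df['LATITUDE'].to_numpy())[:, None]
    lon1 = np.radians(df['LONGITUDE'].to_numpy())[:, None]
    lat2 = np.radians(df_station['lat'].to_numpy())[None, :]
    lon2 = np.radians(df_station['lon'].to_numpy())[None, :]

    # Haversine formula: great-circle distance in km for every pair.
    a = (np.sin((lat2 - lat1) / 2) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2)
    dist_km = 2 * EARTH_RADIUS_KM * np.arcsin(np.sqrt(a))

    # Position of the closest station per point, mapped back to its name.
    return df_station['station'].to_numpy()[dist_km.argmin(axis=1)]


# Sample data copied from the question (first station code assumed).
df = pd.DataFrame({
    'LATITUDE': [51.2161, 52.0780],
    'LONGITUDE': [-122.3111, -122.1795],
})
df_station = pd.DataFrame({
    'station': ['CBBC', 'CWAE', 'CWCL', 'CWEB'],
    'lon': [-128.1567, -122.9547, -121.5047, -126.5431],
    'lat': [52.1850, 50.1290, 51.1447, 49.3833],
})

df['STATION'] = nearest_station(df, df_station)
```

With only the four stations shown in the question, both points end up nearest to CWCL; the CWDL and CPXL values in the desired output presumably come from stations in the full table. The distance matrix holds n × m floats, so for very large inputs it may need to be built in chunks.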