For loop over two data frames to extract an item from one df and store it in the other
I have two data frames, df and df_station.
df
LATITUDE LONGITUDE
51.2161 -122.3111
52.0780 -122.1795
df_station
station lon lat
CBBC -128.1567 52.1850
CWAE -122.9547 50.1290
CWCL -121.5047 51.1447
CWEB -126.5431 49.3833
I need to create a new column, df['STATION'], containing the station name. The station is the one at the minimum distance from each row's coordinates.
My code:
station = []
for i, r in df.iterrows():
    lon1 = r['LONGITUDE']
    lat1 = r['LATITUDE']
    dist = []
    for i, v in df_station.iterrows():
        lon2 = v['lon']
        lat2 = v['lat']
        dist.append(haversine((lat1, lon1 ), ( lat2 , lon2), unit='km')
    station.append(df_station['station'][dist.index(min(dist))])
# store station name
df['STATION'] = station
Error: station.append(df_station['station'][dist.index(min(dist))])
SyntaxError: invalid syntax
I'm not sure how to use df_station inside the first for loop.
The desired result is for df to look like this:
df
LATITUDE LONGITUDE STATION
51.2161 -122.3111 CWDL
52.0780 -122.1795 CPXL
My suggestion would be to drop the data frames for this operation and work on plain tuples instead.
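from haversine import haversine

# Column names follow the question's code (df_station: 'lat', 'lon', 'station').
lat_lon = list(zip(df['LATITUDE'], df['LONGITUDE']))
# Materialize as a list: a bare zip() iterator is exhausted after the first pass.
lat_lon_station = list(zip(df_station['lat'], df_station['lon'], df_station['station']))
results = {}  # (lat, lon) -> (distance in km, nearest station found so far)
for lat, lon in lat_lon:
    for station_lat, station_lon, station in lat_lon_station:
        dist = haversine((lat, lon), (station_lat, station_lon), unit='km')
        # Keep only the closest station seen so far for this point.
        if (lat, lon) not in results or dist < results[(lat, lon)][0]:
            results[(lat, lon)] = (dist, station)
df['STATION'] = [results[point][1] for point in lat_lon]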
Does this achieve what you're after?
Zipping up front should significantly improve the runtime for larger data frames.
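A note on the error itself: the dist.append(haversine(...) line in the question is missing a closing parenthesis, so Python only notices the problem on the following line and reports the SyntaxError there. The inner loop also reuses the name i from the outer loop. Below is a minimal, self-contained sketch of the nearest-station lookup on plain tuples. It assumes a hand-written haversine_km helper in place of the haversine package (so it runs with the standard library only), and nearest_stations is a hypothetical helper name, not something from the question.

```python
import math

def haversine_km(p1, p2):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*p1, *p2))
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    # 6371.0088 km is a commonly used mean Earth radius (an assumption here).
    return 2 * 6371.0088 * math.asin(math.sqrt(a))

def nearest_stations(points, stations):
    """For each (lat, lon) in points, return the name of the closest station.

    stations is a list of (name, lat, lon) tuples.
    """
    nearest = []
    for point in points:
        # min() with a key replaces the manual dist list + dist.index(min(dist)).
        name, _, _ = min(stations, key=lambda s: haversine_km(point, (s[1], s[2])))
        nearest.append(name)
    return nearest

# Sample values from the question. df_station lists lon before lat there,
# so the tuples below are reordered to (name, lat, lon).
points = [(51.2161, -122.3111), (52.0780, -122.1795)]
stations = [
    ("CBBC", 52.1850, -128.1567),
    ("CWAE", 50.1290, -122.9547),
    ("CWCL", 51.1447, -121.5047),
    ("CWEB", 49.3833, -126.5431),
]
result = nearest_stations(points, stations)
```

With the data frames from the question, the inputs could be built as list(zip(df['LATITUDE'], df['LONGITUDE'])) and list(zip(df_station['station'], df_station['lat'], df_station['lon'])), and the returned list assigned to df['STATION']. With only the four sample stations shown, both points resolve to CWCL; the CWDL and CPXL values in the desired output presumably come from a fuller station table.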