Python Pandas: Applying a Reverse Geocoding Function to All Columns Takes Too Long?
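I am trying to reverse geocode 4 columns into location names using this library:
https://github.com/thampiman/reverse-geocoder
The code works, but even 20 rows take about 30 seconds, and I have more than 100,000 rows, so it would take forever. Why is this happening?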
Sample data:
pickup_longitude pickup_latitude dropoff_longitude dropoff_latitude
-73.982155 40.767937 -73.964630 40.765602
-73.981049 40.744339 -73.973000 40.789989
Result:
pickup_longitude pickup_latitude dropoff_longitude dropoff_latitude pickup_district dropoff_district
-73.982155 40.767937 -73.964630 40.765602 Manhattan Manhattan
-73.981049 40.744339 -73.973000 40.789989 Long Island City Manhattan
Code:
ds['pickup_district'] = ds.apply(lambda row: rg.search((row['pickup_latitude'],row['pickup_longitude']))[0]['name'],axis=1)
ds['dropoff_district'] = ds.apply(lambda row: rg.search((row['dropoff_latitude'],row['dropoff_longitude']))[0]['name'],axis=1)
Don't leave without hitting the plus button, squirrels ;)
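Your current code calls the rg.search method once for every row of the DataFrame.
It is more efficient to build a list of tuples first and then call rg.search only twice: once for the pickups and once for the drop-offs. For example: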
pickup_coords = ds[['pickup_latitude', 'pickup_longitude']].apply(tuple, axis=1).tolist()
dropoff_coords = ds[['dropoff_latitude', 'dropoff_longitude']].apply(tuple, axis=1).tolist()
pickup_results = rg.search(pickup_coords, mode=2)
ds['pickup_district'] = [x['name'] for x in pickup_results]
dropoff_results = rg.search(dropoff_coords, mode=2)
ds['dropoff_district'] = [x['name'] for x in dropoff_results]
You can call the library once with all of the locations. For example:
pickups = list(zip(ds.pickup_latitude, ds.pickup_longitude))
dropoffs = list(zip(ds.dropoff_latitude, ds.dropoff_longitude))
pickup_locations = rg.search(pickups)
dropoff_locations = rg.search(dropoffs)
ds['pickup_district'] = [p["name"] for p in pickup_locations]
ds['dropoff_district'] = [d["name"] for d in dropoff_locations]
This is much faster than making one call per row (as apply does).
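To illustrate why one batched call tends to beat one call per row, here is a minimal sketch. It does not measure reverse-geocoder itself: `fake_search` is a hypothetical stand-in, and `CALL_OVERHEAD_S` is an assumed fixed cost paid on every call, chosen only for illustration. If a real search function has any per-call setup cost, that cost is paid once per row in the per-row pattern and only once in the batched pattern.

```python
import time

# Assumed fixed cost per call (hypothetical value, for illustration only).
CALL_OVERHEAD_S = 0.01


def fake_search(coords):
    """Hypothetical stand-in for a batch geocoder.

    Accepts a single (lat, lon) tuple or a list of such tuples and
    returns one result dict per coordinate.
    """
    if isinstance(coords, tuple):
        coords = [coords]
    time.sleep(CALL_OVERHEAD_S)  # simulate the per-call setup cost
    return [{"name": f"place@{lat:.3f},{lon:.3f}"} for lat, lon in coords]


# 50 made-up coordinate pairs.
coords = [(40.0 + i * 0.001, -73.0 - i * 0.001) for i in range(50)]

# Per-row pattern: one call per coordinate, as DataFrame.apply(axis=1) would do.
start = time.perf_counter()
per_row = [fake_search(c)[0]["name"] for c in coords]
per_row_s = time.perf_counter() - start

# Batched pattern: a single call with the whole list.
start = time.perf_counter()
batched = [r["name"] for r in fake_search(coords)]
batched_s = time.perf_counter() - start

print(f"per-row: {per_row_s:.3f}s, batched: {batched_s:.3f}s")
```

Both patterns produce the same names; only the number of calls differs, so the per-row time grows with the number of rows while the batched overhead stays constant.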