使用 geopy 返回 longitude/latitude 的邮政编码 - 避免 GeocoderTimedOut: ('Service timed out', 'occurred at index ...')

Question

这个问题并不新鲜，已经讨论过多次，但我是 Python 的新手。

Geopy too slow - timeout all the time

Timeout error in Python geopy geocoder

我有一个包含 11000 个地理位置的数据集，我想知道它们的邮政编码。

我的数据如下：

   Longitude   Latitude
0 -87.548627  41.728184
1 -87.737227  41.749111
2 -87.743974  41.924143
3 -87.659294  41.869314
4 -87.727808  41.877007

使用这个 question，我编写了一个函数，它适用于前 10-20 行，但会出现超时错误。

# Create a function for zip codes extraction
def get_zipcode(df, geolocator, lat_field, lon_field):
   location = geolocator.reverse((df[lat_field], df[lon_field]))
   return location.raw['address']['postcode']

geolocator = geopy.Nominatim(user_agent = 'my-application')

# Test a sample with 20 rows
test = bus_stops_geo.head(20)

# Extract zip codes for the sample
zipcodes = test.apply(get_zipcode, axis = 1, geolocator = geolocator, 
                           lat_field = 'Latitude', lon_field = 'Longitude')

print(zipcodes)

0     60617
1     60652
2     60639
3     60607
4     60644
5     60659
6     60620
7     60626
8     60610
9     60660
10    60625
11    60645
12    60628
13    60620
14    60629
15    60628
16    60644
17    60638
18    60657
19    60631
dtype: object

我尝试更改超时时间，但到目前为止都失败了。

我的问题：

如何实现 11000 行？
如何重写此函数，return 不仅压缩，而且初始经度和纬度也如此？
是否有使用 R 等编程语言或使用专有软件的简单替代解决方案（付费选项适合我）？

非常感谢任何帮助！

Answer 1

这不是 Python 解决方案和 shapefile 的不同方法，但为了 SO 社区的利益，我将 post 用 R 回答（@Jaap 的提示和这个 ).

首先，您需要US zipcodes shapefile（美国位置）。

# Install packages
library(sp)
library(rgdal)
library(tidyverse)

# Import zips shapefile and transform CRS 
zips <- readOGR("cb_2015_us_zcta510_500k.shp")
zips <- spTransform(zips, CRS("+proj=longlat +datum=WGS84 +ellps=WGS84 +towgs84=0,0,0"))

# Read bus stops data
bus_stops <- read_csv("CTA_BusStops.csv")

# A tibble: 6 x 14
SYSTEMSTOP OBJECTID the_geom                            STREET     CROSS_ST          
DIR   POS   ROUTESSTPG OWLROUTES CITY   STATUS PUBLIC_NAM          POINT_X POINT_Y
   <dbl>    <dbl> <chr>                               <chr>      <chr>             
<chr> <chr> <chr>      <chr>     <chr>   <dbl> <chr>                 <dbl>   <dbl>
1      11953      193 POINT (-87.54862703700002 41.72818~ 92ND STRE~ BALTIMORE         
EB    NS    95         NA        CHICA~      1 92nd Street & Balt~   -87.5    41.7
2       2723      194 POINT (-87.737227163 41.7491110710~ 79TH STRE~ KILPATRICK (east~ 
EB    NS    79         NA        CHICA~      1 79th Street & Kilp~   -87.7    41.7
3       1307      195 POINT (-87.74397362600001 41.92414~ FULLERTON  KILPATRICK        
EB    NS    74         NA        CHICA~      1 Fullerton & Kilpat~   -87.7    41.9
4       6696      196 POINT (-87.65929365400001 41.86931~ TAYLOR     THROOP            
EB    NS    157        NA        CHICA~      1 Taylor & Throop       -87.7    41.9
5         22      197 POINT (-87.72780787099998 41.87700~ JACKSON    KARLOV            
EB    FS    126        NA        CHICA~      1 Jackson & Karlov      -87.7    41.9
6       4767      198 POINT (-87.71978000000001 41.97552~ FOSTER     MONTICELLO        
EB    NS    92         NA        CHICA~      1 Foster & Monticello   -87.7    42.0

# Rename two columns for long and lat
bus_stops <- rename(bus_stops, longitude = POINT_X, latitude = POINT_Y)

# Extract long and lat columns only
test_bus_stops <- bus_stops[ , c(13, 14)]

# Transform coordinates into a SpatialPointsDataFrame
bus_stops_spatial <- SpatialPointsDataFrame(coords = test_bus_stops, data = bus_stops, 
proj4string = CRS("+proj=longlat +datum=WGS84 +ellps=WGS84 +towgs84=0,0,0"))
class(bus_stops_spatial) # [1] "SpatialPointsDataFrame"

# Subset only the zipcodes in which points are found
zips_subset <- zips[bus_stops_spatial, ]

# NOTE: the column in zips_subset containing zipcodes is ZCTA5CE10
# Use over() to overlay points in polygons and then add that to the original dataframe
# Create a separate column
bus_stops_zip <- over(bus_stops_spatial, zips_subset[, "ZCTA5CE10"]) 
bus_stops_zip

我通过 this link 随机检查了经度和纬度的结果。它匹配。

# Merge bus stops with bus stop zips
bus_stops <- cbind(bus_stops, bus_stops_zip)

# Rename zip_code column
bus_stops <- rename(bus_stops, zip_code = ZCTA5CE10)
head(bus_stops)

# Display zip
head(bus_stops[, 13:15])
longitude latitude zip_code
1 -87.54863 41.72818    60617
2 -87.73723 41.74911    60652
3 -87.74397 41.92414    60639
4 -87.65929 41.86931    60607
5 -87.72781 41.87701    60624
6 -87.71978 41.97553    60625

# Extract as csv
write.csv(bus_stops,'bus_stops_zips.csv')

Answer 2

geopy 与 pandas 的用法在文档中有描述：https://geopy.readthedocs.io/en/1.22.0/#usage-with-pandas

geopy 解决方案：

In [1]: import pandas as pd
   ...:
   ...: df = pd.DataFrame([
   ...:     [-87.548627, 41.728184],
   ...:     [-87.737227, 41.749111],
   ...:     [-87.743974, 41.924143],
   ...:     [-87.659294, 41.869314],
   ...:     [-87.727808, 41.877007],
   ...: ], columns=["Longitude", "Latitude"])

In [2]: from tqdm import tqdm
   ...: tqdm.pandas()
   ...:
   ...: from geopy.geocoders import Nominatim
   ...: geolocator = Nominatim(user_agent="specify_your_app_name_here")
   ...:
   ...: from geopy.extra.rate_limiter import RateLimiter
   ...: reverse = RateLimiter(geolocator.reverse, min_delay_seconds=1)

In [3]: df['Location'] = df.progress_apply(
   ...:     lambda row: reverse((row['Latitude'], row['Longitude'])),
   ...:     axis=1
   ...: )
100%|█████████████████████████████| 5/5 [00:06<00:00,  1.24s/it]

In [4]: def parse_zipcode(location):
   ...:     if location and location.raw.get('address') and location.raw['address'].get('postcode'):
   ...:         return location.raw['address']['postcode']
   ...:     else:
   ...:         return None
   ...: df['Zipcode'] = df['Location'].apply(parse_zipcode)

In [5]: df
Out[5]:
   Longitude   Latitude                                           Location Zipcode
0 -87.548627  41.728184  (Olive Harvey College South Chicago Learning C...   60617
1 -87.737227  41.749111  (7900, South Kilpatrick Avenue, Chicago, Cook ...   60652
2 -87.743974  41.924143  (4701, West Fullerton Avenue, Beat 2522, Belmo...   60639
3 -87.659294  41.869314  (1301-1307, West Taylor Street, Near West Side...   60607
4 -87.727808  41.877007  (4053, West Jackson Boulevard, West Garfield P...   60644

如果付费选项适合您，请考虑使用免费 Nominatim 以外的任何其他地理编码服务，例如 MapQuest 或 PickPoint。

使用 geopy 返回 longitude/latitude 的邮政编码 - 避免 GeocoderTimedOut: ('Service timed out', 'occurred at index ...')

Returning zipcodes for longitude/latitude with geopy - avoiding GeocoderTimedOut: ('Service timed out', 'occurred at index ...')

python

geopy

pandas