使用 geopy 返回 longitude/latitude 的邮政编码 - 避免 GeocoderTimedOut: ('Service timed out', 'occurred at index ...')
Returning zipcodes for longitude/latitude with geopy - avoiding GeocoderTimedOut: ('Service timed out', 'occurred at index ...')
这个问题并不新鲜,已经讨论过多次,但我是 Python 的新手。
Geopy too slow - timeout all the time
Timeout error in Python geopy geocoder
我有一个包含 11000 个地理位置的数据集,我想知道它们的邮政编码。
我的数据如下:
Longitude Latitude
0 -87.548627 41.728184
1 -87.737227 41.749111
2 -87.743974 41.924143
3 -87.659294 41.869314
4 -87.727808 41.877007
使用这个 question,我编写了一个函数,它适用于前 10-20 行,但会出现超时错误。
# Create a function for zip codes extraction
def get_zipcode(df, geolocator, lat_field, lon_field):
location = geolocator.reverse((df[lat_field], df[lon_field]))
return location.raw['address']['postcode']
geolocator = geopy.Nominatim(user_agent = 'my-application')
# Test a sample with 20 rows
test = bus_stops_geo.head(20)
# Extract zip codes for the sample
zipcodes = test.apply(get_zipcode, axis = 1, geolocator = geolocator,
lat_field = 'Latitude', lon_field = 'Longitude')
print(zipcodes)
0 60617
1 60652
2 60639
3 60607
4 60644
5 60659
6 60620
7 60626
8 60610
9 60660
10 60625
11 60645
12 60628
13 60620
14 60629
15 60628
16 60644
17 60638
18 60657
19 60631
dtype: object
我尝试更改超时时间,但到目前为止都失败了。
我的问题:
- 如何实现 11000 行?
- 如何重写此函数,return 不仅压缩,而且初始经度和纬度也如此?
- 是否有使用 R 等编程语言或使用专有软件的简单替代解决方案(付费选项适合我)?
非常感谢任何帮助!
这不是 Python 解决方案和 shapefile 的不同方法,但为了 SO 社区的利益,我将 post 用 R 回答(@Jaap 的提示和这个 ).
首先,您需要US zipcodes shapefile(美国位置)。
# Install packages
library(sp)
library(rgdal)
library(tidyverse)
# Import zips shapefile and transform CRS
zips <- readOGR("cb_2015_us_zcta510_500k.shp")
zips <- spTransform(zips, CRS("+proj=longlat +datum=WGS84 +ellps=WGS84 +towgs84=0,0,0"))
# Read bus stops data
bus_stops <- read_csv("CTA_BusStops.csv")
# A tibble: 6 x 14
SYSTEMSTOP OBJECTID the_geom STREET CROSS_ST
DIR POS ROUTESSTPG OWLROUTES CITY STATUS PUBLIC_NAM POINT_X POINT_Y
<dbl> <dbl> <chr> <chr> <chr>
<chr> <chr> <chr> <chr> <chr> <dbl> <chr> <dbl> <dbl>
1 11953 193 POINT (-87.54862703700002 41.72818~ 92ND STRE~ BALTIMORE
EB NS 95 NA CHICA~ 1 92nd Street & Balt~ -87.5 41.7
2 2723 194 POINT (-87.737227163 41.7491110710~ 79TH STRE~ KILPATRICK (east~
EB NS 79 NA CHICA~ 1 79th Street & Kilp~ -87.7 41.7
3 1307 195 POINT (-87.74397362600001 41.92414~ FULLERTON KILPATRICK
EB NS 74 NA CHICA~ 1 Fullerton & Kilpat~ -87.7 41.9
4 6696 196 POINT (-87.65929365400001 41.86931~ TAYLOR THROOP
EB NS 157 NA CHICA~ 1 Taylor & Throop -87.7 41.9
5 22 197 POINT (-87.72780787099998 41.87700~ JACKSON KARLOV
EB FS 126 NA CHICA~ 1 Jackson & Karlov -87.7 41.9
6 4767 198 POINT (-87.71978000000001 41.97552~ FOSTER MONTICELLO
EB NS 92 NA CHICA~ 1 Foster & Monticello -87.7 42.0
# Rename two columns for long and lat
bus_stops <- rename(bus_stops, longitude = POINT_X, latitude = POINT_Y)
# Extract long and lat columns only
test_bus_stops <- bus_stops[ , c(13, 14)]
# Transform coordinates into a SpatialPointsDataFrame
bus_stops_spatial <- SpatialPointsDataFrame(coords = test_bus_stops, data = bus_stops,
proj4string = CRS("+proj=longlat +datum=WGS84 +ellps=WGS84 +towgs84=0,0,0"))
class(bus_stops_spatial) # [1] "SpatialPointsDataFrame"
# Subset only the zipcodes in which points are found
zips_subset <- zips[bus_stops_spatial, ]
# NOTE: the column in zips_subset containing zipcodes is ZCTA5CE10
# Use over() to overlay points in polygons and then add that to the original dataframe
# Create a separate column
bus_stops_zip <- over(bus_stops_spatial, zips_subset[, "ZCTA5CE10"])
bus_stops_zip
我通过 this link 随机检查了经度和纬度的结果。它匹配。
# Merge bus stops with bus stop zips
bus_stops <- cbind(bus_stops, bus_stops_zip)
# Rename zip_code column
bus_stops <- rename(bus_stops, zip_code = ZCTA5CE10)
head(bus_stops)
# Display zip
head(bus_stops[, 13:15])
longitude latitude zip_code
1 -87.54863 41.72818 60617
2 -87.73723 41.74911 60652
3 -87.74397 41.92414 60639
4 -87.65929 41.86931 60607
5 -87.72781 41.87701 60624
6 -87.71978 41.97553 60625
# Extract as csv
write.csv(bus_stops,'bus_stops_zips.csv')
geopy 与 pandas 的用法在文档中有描述:https://geopy.readthedocs.io/en/1.22.0/#usage-with-pandas
geopy 解决方案:
In [1]: import pandas as pd
...:
...: df = pd.DataFrame([
...: [-87.548627, 41.728184],
...: [-87.737227, 41.749111],
...: [-87.743974, 41.924143],
...: [-87.659294, 41.869314],
...: [-87.727808, 41.877007],
...: ], columns=["Longitude", "Latitude"])
In [2]: from tqdm import tqdm
...: tqdm.pandas()
...:
...: from geopy.geocoders import Nominatim
...: geolocator = Nominatim(user_agent="specify_your_app_name_here")
...:
...: from geopy.extra.rate_limiter import RateLimiter
...: reverse = RateLimiter(geolocator.reverse, min_delay_seconds=1)
In [3]: df['Location'] = df.progress_apply(
...: lambda row: reverse((row['Latitude'], row['Longitude'])),
...: axis=1
...: )
100%|█████████████████████████████| 5/5 [00:06<00:00, 1.24s/it]
In [4]: def parse_zipcode(location):
...: if location and location.raw.get('address') and location.raw['address'].get('postcode'):
...: return location.raw['address']['postcode']
...: else:
...: return None
...: df['Zipcode'] = df['Location'].apply(parse_zipcode)
In [5]: df
Out[5]:
Longitude Latitude Location Zipcode
0 -87.548627 41.728184 (Olive Harvey College South Chicago Learning C... 60617
1 -87.737227 41.749111 (7900, South Kilpatrick Avenue, Chicago, Cook ... 60652
2 -87.743974 41.924143 (4701, West Fullerton Avenue, Beat 2522, Belmo... 60639
3 -87.659294 41.869314 (1301-1307, West Taylor Street, Near West Side... 60607
4 -87.727808 41.877007 (4053, West Jackson Boulevard, West Garfield P... 60644
如果付费选项适合您,请考虑使用免费 Nominatim 以外的任何其他地理编码服务,例如 MapQuest 或 PickPoint。
这个问题并不新鲜,已经讨论过多次,但我是 Python 的新手。
Geopy too slow - timeout all the time
Timeout error in Python geopy geocoder
我有一个包含 11000 个地理位置的数据集,我想知道它们的邮政编码。
我的数据如下:
Longitude Latitude
0 -87.548627 41.728184
1 -87.737227 41.749111
2 -87.743974 41.924143
3 -87.659294 41.869314
4 -87.727808 41.877007
使用这个 question,我编写了一个函数,它适用于前 10-20 行,但会出现超时错误。
# Create a function for zip codes extraction
def get_zipcode(df, geolocator, lat_field, lon_field):
location = geolocator.reverse((df[lat_field], df[lon_field]))
return location.raw['address']['postcode']
geolocator = geopy.Nominatim(user_agent = 'my-application')
# Test a sample with 20 rows
test = bus_stops_geo.head(20)
# Extract zip codes for the sample
zipcodes = test.apply(get_zipcode, axis = 1, geolocator = geolocator,
lat_field = 'Latitude', lon_field = 'Longitude')
print(zipcodes)
0 60617
1 60652
2 60639
3 60607
4 60644
5 60659
6 60620
7 60626
8 60610
9 60660
10 60625
11 60645
12 60628
13 60620
14 60629
15 60628
16 60644
17 60638
18 60657
19 60631
dtype: object
我尝试更改超时时间,但到目前为止都失败了。
我的问题:
- 如何实现 11000 行?
- 如何重写此函数,return 不仅压缩,而且初始经度和纬度也如此?
- 是否有使用 R 等编程语言或使用专有软件的简单替代解决方案(付费选项适合我)?
非常感谢任何帮助!
这不是 Python 解决方案和 shapefile 的不同方法,但为了 SO 社区的利益,我将 post 用 R 回答(@Jaap 的提示和这个
首先,您需要US zipcodes shapefile(美国位置)。
# Install packages
library(sp)
library(rgdal)
library(tidyverse)
# Import zips shapefile and transform CRS
zips <- readOGR("cb_2015_us_zcta510_500k.shp")
zips <- spTransform(zips, CRS("+proj=longlat +datum=WGS84 +ellps=WGS84 +towgs84=0,0,0"))
# Read bus stops data
bus_stops <- read_csv("CTA_BusStops.csv")
# A tibble: 6 x 14
SYSTEMSTOP OBJECTID the_geom STREET CROSS_ST
DIR POS ROUTESSTPG OWLROUTES CITY STATUS PUBLIC_NAM POINT_X POINT_Y
<dbl> <dbl> <chr> <chr> <chr>
<chr> <chr> <chr> <chr> <chr> <dbl> <chr> <dbl> <dbl>
1 11953 193 POINT (-87.54862703700002 41.72818~ 92ND STRE~ BALTIMORE
EB NS 95 NA CHICA~ 1 92nd Street & Balt~ -87.5 41.7
2 2723 194 POINT (-87.737227163 41.7491110710~ 79TH STRE~ KILPATRICK (east~
EB NS 79 NA CHICA~ 1 79th Street & Kilp~ -87.7 41.7
3 1307 195 POINT (-87.74397362600001 41.92414~ FULLERTON KILPATRICK
EB NS 74 NA CHICA~ 1 Fullerton & Kilpat~ -87.7 41.9
4 6696 196 POINT (-87.65929365400001 41.86931~ TAYLOR THROOP
EB NS 157 NA CHICA~ 1 Taylor & Throop -87.7 41.9
5 22 197 POINT (-87.72780787099998 41.87700~ JACKSON KARLOV
EB FS 126 NA CHICA~ 1 Jackson & Karlov -87.7 41.9
6 4767 198 POINT (-87.71978000000001 41.97552~ FOSTER MONTICELLO
EB NS 92 NA CHICA~ 1 Foster & Monticello -87.7 42.0
# Rename two columns for long and lat
bus_stops <- rename(bus_stops, longitude = POINT_X, latitude = POINT_Y)
# Extract long and lat columns only
test_bus_stops <- bus_stops[ , c(13, 14)]
# Transform coordinates into a SpatialPointsDataFrame
bus_stops_spatial <- SpatialPointsDataFrame(coords = test_bus_stops, data = bus_stops,
proj4string = CRS("+proj=longlat +datum=WGS84 +ellps=WGS84 +towgs84=0,0,0"))
class(bus_stops_spatial) # [1] "SpatialPointsDataFrame"
# Subset only the zipcodes in which points are found
zips_subset <- zips[bus_stops_spatial, ]
# NOTE: the column in zips_subset containing zipcodes is ZCTA5CE10
# Use over() to overlay points in polygons and then add that to the original dataframe
# Create a separate column
bus_stops_zip <- over(bus_stops_spatial, zips_subset[, "ZCTA5CE10"])
bus_stops_zip
我通过 this link 随机检查了经度和纬度的结果。它匹配。
# Merge bus stops with bus stop zips
bus_stops <- cbind(bus_stops, bus_stops_zip)
# Rename zip_code column
bus_stops <- rename(bus_stops, zip_code = ZCTA5CE10)
head(bus_stops)
# Display zip
head(bus_stops[, 13:15])
longitude latitude zip_code
1 -87.54863 41.72818 60617
2 -87.73723 41.74911 60652
3 -87.74397 41.92414 60639
4 -87.65929 41.86931 60607
5 -87.72781 41.87701 60624
6 -87.71978 41.97553 60625
# Extract as csv
write.csv(bus_stops,'bus_stops_zips.csv')
geopy 与 pandas 的用法在文档中有描述:https://geopy.readthedocs.io/en/1.22.0/#usage-with-pandas
geopy 解决方案:
In [1]: import pandas as pd
...:
...: df = pd.DataFrame([
...: [-87.548627, 41.728184],
...: [-87.737227, 41.749111],
...: [-87.743974, 41.924143],
...: [-87.659294, 41.869314],
...: [-87.727808, 41.877007],
...: ], columns=["Longitude", "Latitude"])
In [2]: from tqdm import tqdm
...: tqdm.pandas()
...:
...: from geopy.geocoders import Nominatim
...: geolocator = Nominatim(user_agent="specify_your_app_name_here")
...:
...: from geopy.extra.rate_limiter import RateLimiter
...: reverse = RateLimiter(geolocator.reverse, min_delay_seconds=1)
In [3]: df['Location'] = df.progress_apply(
...: lambda row: reverse((row['Latitude'], row['Longitude'])),
...: axis=1
...: )
100%|█████████████████████████████| 5/5 [00:06<00:00, 1.24s/it]
In [4]: def parse_zipcode(location):
...: if location and location.raw.get('address') and location.raw['address'].get('postcode'):
...: return location.raw['address']['postcode']
...: else:
...: return None
...: df['Zipcode'] = df['Location'].apply(parse_zipcode)
In [5]: df
Out[5]:
Longitude Latitude Location Zipcode
0 -87.548627 41.728184 (Olive Harvey College South Chicago Learning C... 60617
1 -87.737227 41.749111 (7900, South Kilpatrick Avenue, Chicago, Cook ... 60652
2 -87.743974 41.924143 (4701, West Fullerton Avenue, Beat 2522, Belmo... 60639
3 -87.659294 41.869314 (1301-1307, West Taylor Street, Near West Side... 60607
4 -87.727808 41.877007 (4053, West Jackson Boulevard, West Garfield P... 60644
如果付费选项适合您,请考虑使用免费 Nominatim 以外的任何其他地理编码服务,例如 MapQuest 或 PickPoint。