Increasing Requests/Second with the Python Google Maps Geocoder
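I'm having trouble increasing the number of requests I can make per second with the Google Maps Geocoder. I'm using a paid account (at $0.50 per 1,000 requests), so according to the Google Geocoder API I should be able to make up to 50 requests per second.
I have a list of 15k addresses that I'm trying to get GPS coordinates for. I store them in a Pandas DataFrame and loop over them. To make sure the loop itself wasn't the bottleneck, I timed it over all 15k addresses and it took only 1.5 seconds. Yet I can make fewer than 1 request per second. I thought this might be down to my slow internet connection, so I spun up a Windows Google Cloud VM with a noticeably faster connection. That brought me up to about 1.5 requests/second, which is still far below what should theoretically be possible.
I thought the Python library geocoder might be the cause, so I tried making the requests directly with Python requests, but that didn't speed things up either.
Does this have to do with the fact that I'm not using a server? I'd assume it doesn't matter, since I'm on a Google Cloud VM. Also, I believe it has nothing to do with multithreading, since a single core already iterates through the loop extremely quickly. Thanks in advance for any ideas.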
import geocoder
import pandas as pd
import time
import requests

startTime = time.time()

# Read the file with all transactions up to October 4th
input_filename = "C:/Users/username/Downloads/transaction-export 10-04-2017.csv"
# Note: pandas 1.3+ replaces error_bad_lines=False with on_bad_lines='skip'
df = pd.read_csv(input_filename, header=0, error_bad_lines=False)
# Only look at customer addresses
df = df['Customer Address']
# Drop duplicates and NAs
df = df.drop_duplicates(keep='first')
df = df.dropna()
# Convert the Series to a list of strings
addresses = df.tolist()
# Google API key
api_key = 'my_api_key'
# Create empty list for the results
address_gps = []
# Google API address
url = 'https://maps.googleapis.com/maps/api/geocode/json'

# For each address return its geocoded latlng coordinates
for i, val in enumerate(addresses):
    # Direct way to make the call without geocoder:
    # params = {'sensor': 'false', 'address': val, 'key': api_key}
    # r = requests.get(url, params=params)
    # results = r.json()['results']
    # location = results[0]['geometry']['location']
    # print(location['lat'], location['lng'])
    # num_address = num_address + 1
    endTime = time.time()
    g = geocoder.google(val, key=api_key, exactly_one=True)
    print("Address,", val, "Number,", i, "Total,", len(addresses), "Time,", endTime - startTime)
    if g.ok:
        address_gps.append(g.latlng)
        print(g.latlng)
    else:
        address_gps.append(0)
        print("Error")
    # Save every 100 iterations
    if i % 100 == 0:
        # Save as csv
        df1 = pd.DataFrame({'Address GPS': address_gps})
        df1.to_csv('C:/Users/username/Downloads/AllCustomerAddressAsGPS.csv')

# Save as csv
df1 = pd.DataFrame({'Address GPS': address_gps})
df1.to_csv('C:/Users/username/Downloads/AllCustomerAddressAsGPS.csv')
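One way to speed this up is to maintain a single requests session with Google instead of creating a new one for every request. A shared session keeps the underlying connection open, so each call avoids setting up a fresh connection. This is suggested in the geocoder documentation.
Your modified code would be: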
import geocoder
import pandas as pd
import requests
import time

# `addresses` and `startTime` are defined as in the question's code
# Google API key
api_key = 'my_api_key'
# Create empty list for the results
address_gps = []
# Google API address
url = 'https://maps.googleapis.com/maps/api/geocode/json'

# For each address return its geocoded latlng coordinates,
# reusing one session for every request
with requests.Session() as session:
    for i, val in enumerate(addresses):
        # Direct way to make the call without geocoder:
        # params = {'sensor': 'false', 'address': val, 'key': api_key}
        # r = requests.get(url, params=params)
        # results = r.json()['results']
        # location = results[0]['geometry']['location']
        # print(location['lat'], location['lng'])
        # num_address = num_address + 1
        endTime = time.time()
        g = geocoder.google(val, key=api_key, exactly_one=True, session=session)
        print("Address,", val, "Number,", i, "Total,", len(addresses), "Time,", endTime - startTime)
        if g.ok:
            address_gps.append(g.latlng)
            print(g.latlng)
        else:
            address_gps.append(0)
            print("Error")
        # Save every 100 iterations
        if i % 100 == 0:
            # Save as csv
            df1 = pd.DataFrame({'Address GPS': address_gps})
            df1.to_csv('C:/Users/username/Downloads/AllCustomerAddressAsGPS.csv')

# Save as csv
df1 = pd.DataFrame({'Address GPS': address_gps})
df1.to_csv('C:/Users/username/Downloads/AllCustomerAddressAsGPS.csv')
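The same idea should carry over to the direct call that is commented out in the loop. Here is a minimal sketch of that, assuming the session object behaves like requests.Session and that the response has the usual Geocoding JSON layout (status, results, geometry.location). The helper names geocode_address and geocode_all are made up for illustration and are not part of geocoder or requests, and I have not benchmarked this against your data.

```python
from typing import List, Optional, Tuple

GEOCODE_URL = "https://maps.googleapis.com/maps/api/geocode/json"

LatLng = Tuple[float, float]


def geocode_address(session, address: str, api_key: str) -> Optional[LatLng]:
    """Return (lat, lng) for one address, or None if nothing was found.

    `session` is assumed to behave like requests.Session, so every call
    goes through the same pooled connection instead of opening a new one.
    """
    params = {"address": address, "key": api_key}
    response = session.get(GEOCODE_URL, params=params, timeout=10)
    response.raise_for_status()  # surface HTTP-level failures early
    payload = response.json()
    if payload.get("status") != "OK" or not payload.get("results"):
        return None
    location = payload["results"][0]["geometry"]["location"]
    return location["lat"], location["lng"]


def geocode_all(session, addresses: List[str], api_key: str) -> List[Optional[LatLng]]:
    """Geocode every address over the same session, preserving order."""
    return [geocode_address(session, address, api_key) for address in addresses]


# Usage (needs network access and a valid key):
#   import requests
#   with requests.Session() as session:
#       address_gps = geocode_all(session, addresses, api_key)
```

Passing the session in as an argument keeps the helper easy to exercise with a stand-in object, and it leaves the lifetime of the connection under the caller's control.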