Lambda Python Pool.map and urllib2.urlopen : Retry only failing processes, log only errors
I have an AWS Lambda function that uses pool.map to call a set of URLs. The problem is that if one of the URLs returns something other than 200, the Lambda function fails and immediately retries — and it retries the entire Lambda function. I want it to retry only the failed URLs, and if (after a second attempt) they still fail, call a fixed URL to log the error.
Here is the current code (some details removed), which only works when all the URLs return 200:
from __future__ import print_function
import urllib2
from multiprocessing.dummy import Pool as ThreadPool
import hashlib
import datetime
import json

print('Loading function')

def lambda_handler(event, context):
    f = urllib2.urlopen("https://example.com/geturls/?action=something")
    data = json.loads(f.read())
    urls = []
    for d in data:
        urls.append("https://" + d + ".example.com/path/to/action")

    # Make the Pool of workers
    pool = ThreadPool(4)
    # Open the urls in their own threads
    # and return the results
    results = pool.map(urllib2.urlopen, urls)
    # close the pool and wait for the work to finish
    pool.close()
    return pool.join()
I tried reading the official documentation, but it seems to lack an explanation of the map function, and in particular of its return value.
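For what it's worth, Pool.map behaves like the builtin map: it blocks until all workers finish and returns a list of the worker function's return values, in the same order as the inputs. A minimal sketch (using a trivial stand-in worker, `square`, instead of urlopen):

```python
from multiprocessing.dummy import Pool as ThreadPool  # thread-based Pool

def square(x):
    return x * x

pool = ThreadPool(4)
results = pool.map(square, [1, 2, 3, 4])  # blocks until every worker returns
pool.close()
pool.join()
print(results)  # results come back in input order: [1, 4, 9, 16]
```

So in the code above, `results` would be a list of the file-like objects urlopen returns, one per URL, in order.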
Using the urlopen documentation, I tried modifying my code to the following:
from __future__ import print_function
import urllib2
from multiprocessing.dummy import Pool as ThreadPool
import hashlib
import datetime
import json

print('Loading function')

def lambda_handler(event, context):
    f = urllib2.urlopen("https://example.com/geturls/?action=something")
    data = json.loads(f.read())
    urls = []
    for d in data:
        urls.append("https://" + d + ".example.com/path/to/action")

    # Make the Pool of workers
    pool = ThreadPool(4)
    # Open the urls in their own threads
    # and return the results
    try:
        results = pool.map(urllib2.urlopen, urls)
    except URLError:
        try:  # try once more before logging error
            urllib2.urlopen(URLError.url)  # TODO: figure out which URL errored
        except URLError:  # log error
            urllib2.urlopen("https://example.com/error/?url=" + URLError.url)
    # close the pool and wait for the work to finish
    pool.close()
    return true  # always return true so we never duplicate successful calls
I'm not sure whether I'm handling the exception correctly here, or whether I've written the Python exception syntax correctly. Again, my goal is: retry only the failed URLs, and if (after a second attempt) they still fail, call a fixed URL to log an error.
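To make the problem concrete: when any worker raises, pool.map re-raises that exception in the caller, the results of the URLs that succeeded are discarded, and (as the TODO above notes) the exception object carries no record of which input failed. A sketch with a stand-in worker, `fetch`, simulating one bad URL:

```python
from multiprocessing.dummy import Pool as ThreadPool

def fetch(url):
    # stand-in for urllib2.urlopen: fail on one particular input
    if url == "bad":
        raise ValueError("simulated HTTP failure")
    return "ok: " + url

pool = ThreadPool(2)
try:
    results = pool.map(fetch, ["good", "bad", "also-good"])
except ValueError as e:
    # the whole map fails: the successful results are lost, and
    # nothing on `e` says which URL caused the failure
    print("whole map failed: %s" % e)
finally:
    pool.close()
    pool.join()
```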
Thanks.
I found the answer.
The answer is to create my own custom wrapper around the urllib2.urlopen function, because each thread needs its own try/except rather than one around the whole pool. The function looks like this:
def my_urlopen(url):
    try:
        return urllib2.urlopen(url)
    except urllib2.URLError:  # URLError lives in the urllib2 module
        urllib2.urlopen("https://example.com/log_error/?url=" + url)
        return None
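The wrapper above logs on the first failure; the original goal was one retry before logging. That folds into the same per-URL pattern. A hedged sketch, with a generic `attempt` callable standing in for urllib2.urlopen and `log_error` for the error-logging request (both names are placeholders, chosen so the pattern can be exercised without network I/O):

```python
def open_with_retry(url, attempt, log_error, retries=1):
    """Per-URL try/except: try up to `retries` extra times, then log.

    `attempt` stands in for urllib2.urlopen and `log_error` for the
    fixed error-logging URL call.
    """
    for _ in range(retries + 1):
        try:
            return attempt(url)
        except Exception:
            continue  # this attempt failed; fall through to the next
    log_error(url)  # every attempt failed: report this one URL
    return None
```

Because multiprocessing.dummy uses threads (nothing is pickled), it can be mapped with a closure, e.g. `pool.map(lambda u: open_with_retry(u, urllib2.urlopen, log_fn), urls)`.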
I put it just above the def lambda_handler declaration, and then inside the handler I could replace the entire try/except block, going from this:
try:
    results = pool.map(urllib2.urlopen, urls)
except URLError:
    try:  # try once more before logging error
        urllib2.urlopen(URLError.url)
    except URLError:  # log error
        urllib2.urlopen("https://example.com/error/?url=" + URLError.url)
to this:
results = pool.map(my_urlopen, urls)
Q.E.D.