Lambda Python Pool.map and urllib2.urlopen : Retry only failing processes, log only errors

I have an AWS Lambda function that calls a set of URLs using pool.map. The problem is that if one of the URLs returns anything other than a 200, the Lambda function fails and is immediately retried. The trouble is that it retries the entire Lambda function. I'd like it to retry only the failing URLs, and if (after a second attempt) one still fails, to call a fixed URL to log the error.

Here is the current code (with some details removed), which only works when all of the URLs return a 200:
from __future__ import print_function
import urllib2 
from multiprocessing.dummy import Pool as ThreadPool 

import hashlib
import datetime
import json

print('Loading function')

def lambda_handler(event, context):

  f = urllib2.urlopen("https://example.com/geturls/?action=something")
  data = json.loads(f.read())

  urls = []
  for d in data:
      urls.append("https://"+d+".example.com/path/to/action")

  # Make the Pool of workers
  pool = ThreadPool(4)

  # Open the urls in their own threads
  # and return the results
  results = pool.map(urllib2.urlopen, urls)

  # close the pool and wait for the work to finish
  pool.close()
  return pool.join()

I tried reading the official documentation, but it seems to lack an explanation of the map function, and in particular of its return value.
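For what it's worth, Pool.map returns a list of the workers' return values in input order, but if any worker raises, the exception propagates to the caller and the other results are lost, which is why one bad URL fails the whole handler. A minimal sketch (Python 3, with a dummy flaky function standing in for urlopen):

```python
from multiprocessing.dummy import Pool as ThreadPool

def flaky(x):
    # stand-in for urllib2.urlopen: fails for one particular input
    if x == 2:
        raise ValueError("boom")
    return x * 10

pool = ThreadPool(4)
try:
    results = pool.map(flaky, [1, 2, 3])
    outcome = "no error"
except ValueError:
    # the single failing item aborts the whole map() call
    outcome = "ValueError propagated to caller"
pool.close()
pool.join()
print(outcome)
```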

Using the urlopen documentation, I tried modifying my code to the following:

from __future__ import print_function
import urllib2 
from multiprocessing.dummy import Pool as ThreadPool 

import hashlib
import datetime
import json

print('Loading function')

def lambda_handler(event, context):

  f = urllib2.urlopen("https://example.com/geturls/?action=something")
  data = json.loads(f.read())

  urls = []
  for d in data:
      urls.append("https://"+d+".example.com/path/to/action")

  # Make the Pool of workers
  pool = ThreadPool(4)

  # Open the urls in their own threads
  # and return the results
  try:
     results = pool.map(urllib2.urlopen, urls)
  except URLError:
     try:                              # try once more before logging error
        urllib2.urlopen(URLError.url)  # TODO: figure out which URL errored
     except URLError:                  # log error
        urllib2.urlopen("https://example.com/error/?url="+URLError.url)

  # close the pool and wait for the work to finish
  pool.close()
  return true  # always return true so we never duplicate successful calls

I'm not sure whether I'm handling the exception correctly there, or whether my Python exception syntax is even right. Again, my goal is: I want it to retry only the failing URLs, and if (after a second attempt) one still fails, to call a fixed URL to log the error.

Thanks, I found the answer.

The answer was to create my own custom wrapper around the urllib2.urlopen function, because each thread needs its own try/except rather than one try/except around the whole pool. The function looks like this:

def my_urlopen(url):
    try:
        return urllib2.urlopen(url)
    except urllib2.URLError:
        try:                              # try once more before logging the error
            return urllib2.urlopen(url)
        except urllib2.URLError:          # log the error
            urllib2.urlopen("https://example.com/log_error/?url="+url)
            return None
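To see the per-thread pattern in isolation, here is a self-contained sketch (Python 3; fetch and the logged list are hypothetical stand-ins for urllib2.urlopen and the error-logging request):

```python
from multiprocessing.dummy import Pool as ThreadPool

logged = []

def fetch(url):
    # hypothetical stand-in for urllib2.urlopen
    if "bad" in url:
        raise IOError("connection failed")
    return "content of " + url

def safe_fetch(url):
    try:
        return fetch(url)
    except IOError:
        try:                      # retry once before logging
            return fetch(url)
        except IOError:
            logged.append(url)    # stand-in for the error-logging request
            return None

pool = ThreadPool(4)
results = pool.map(safe_fetch, ["https://a.example.com", "https://bad.example.com"])
pool.close()
pool.join()
# results == ["content of https://a.example.com", None]
# logged  == ["https://bad.example.com"]
```

Because the try/except lives inside the worker function, one failing item no longer aborts the whole map() call; it just yields None in its slot.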

I put it above the def lambda_handler function declaration, and then I could replace the whole try/except inside the handler, from this:

try:
   results = pool.map(urllib2.urlopen, urls)
except URLError:
   try:                              # try once more before logging error
      urllib2.urlopen(URLError.url)
   except URLError:                  # log error
      urllib2.urlopen("https://example.com/error/?url="+URLError.url)

to this:

results = pool.map(my_urlopen, urls)
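Note that with this wrapper, URLs that ultimately failed come back as None in the results list, so the successful responses can be filtered out afterwards, e.g.:

```python
# hypothetical output of pool.map(my_urlopen, urls): one URL failed
results = ["response-a", None, "response-c"]
successes = [r for r in results if r is not None]
print(successes)  # ['response-a', 'response-c']
```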

Q.E.D.