Lambda Python Pool.map and urllib2.urlopen : Retry only failing processes, log only errors

I have an AWS Lambda function that calls a set of URLs using pool.map. The problem is that if one of the URLs returns anything other than a 200, the Lambda function fails and is immediately retried. The trouble is that it retries the entire Lambda function. I'd like it to retry only the failing URLs, and if (after a second attempt) one still fails, to call a fixed URL to log the error.

Here is the current code (with some details removed), which only works when all of the URLs return a 200:
from __future__ import print_function
import urllib2 
from multiprocessing.dummy import Pool as ThreadPool 

import hashlib
import datetime
import json

print('Loading function')

def lambda_handler(event, context):

  f = urllib2.urlopen("https://example.com/geturls/?action=something")
  data = json.loads(f.read())

  urls = []
  for d in data:
      urls.append("https://"+d+".example.com/path/to/action")

  # Make the Pool of workers
  pool = ThreadPool(4)

  # Open the urls in their own threads
  # and return the results
  results = pool.map(urllib2.urlopen, urls)

  # close the pool and wait for the work to finish
  pool.close()
  return pool.join()

I tried reading the official documentation, but it seems to lack an explanation of the map function, and in particular of its return value.
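For what it's worth, Pool.map returns a list of the workers' return values in input order, but if any worker raises, the exception propagates to the caller and the other results are lost, which is why one bad URL fails the whole handler. A minimal sketch (Python 3, with a dummy flaky function standing in for urlopen):

```python
from multiprocessing.dummy import Pool as ThreadPool

def flaky(x):
    # stand-in for urllib2.urlopen: fails for one particular input
    if x == 2:
        raise ValueError("boom")
    return x * 10

pool = ThreadPool(4)
try:
    results = pool.map(flaky, [1, 2, 3])
    outcome = "no error"
except ValueError:
    # the single failing item aborts the whole map() call
    outcome = "ValueError propagated to caller"
pool.close()
pool.join()
print(outcome)
```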

Using the urlopen documentation, I tried modifying my code to the following:

from __future__ import print_function
import urllib2 
from multiprocessing.dummy import Pool as ThreadPool 

import hashlib
import datetime
import json

print('Loading function')

def lambda_handler(event, context):

  f = urllib2.urlopen("https://example.com/geturls/?action=something")
  data = json.loads(f.read())

  urls = []
  for d in data:
      urls.append("https://"+d+".example.com/path/to/action")

  # Make the Pool of workers
  pool = ThreadPool(4)

  # Open the urls in their own threads
  # and return the results
  try:
     results = pool.map(urllib2.urlopen, urls)
  except URLError:
     try:                              # try once more before logging error
        urllib2.urlopen(URLError.url)  # TODO: figure out which URL errored
     except URLError:                  # log error
        urllib2.urlopen("https://example.com/error/?url="+URLError.url)

  # close the pool and wait for the work to finish
  pool.close()
  return true  # always return true so we never duplicate successful calls

I'm not sure whether I'm handling the exception correctly there, or whether my Python exception syntax is even right. Again, my goal is: I want it to retry only the failing URLs, and if (after a second attempt) one still fails, to call a fixed URL to log the error.

Thanks, I found the answer.

The answer was to create my own custom wrapper around the urllib2.urlopen function, because each thread needs its own try/except rather than one try/except around the whole pool. The function looks like this:

def my_urlopen(url):
    try:
        return urllib2.urlopen(url)
    except urllib2.URLError:
        try:                              # try once more before logging the error
            return urllib2.urlopen(url)
        except urllib2.URLError:          # log the error
            urllib2.urlopen("https://example.com/log_error/?url="+url)
            return None
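To see the per-thread pattern in isolation, here is a self-contained sketch (Python 3; fetch and the logged list are hypothetical stand-ins for urllib2.urlopen and the error-logging request):

```python
from multiprocessing.dummy import Pool as ThreadPool

logged = []

def fetch(url):
    # hypothetical stand-in for urllib2.urlopen
    if "bad" in url:
        raise IOError("connection failed")
    return "content of " + url

def safe_fetch(url):
    try:
        return fetch(url)
    except IOError:
        try:                      # retry once before logging
            return fetch(url)
        except IOError:
            logged.append(url)    # stand-in for the error-logging request
            return None

pool = ThreadPool(4)
results = pool.map(safe_fetch, ["https://a.example.com", "https://bad.example.com"])
pool.close()
pool.join()
# results == ["content of https://a.example.com", None]
# logged  == ["https://bad.example.com"]
```

Because the try/except lives inside the worker function, one failing item no longer aborts the whole map() call; it just yields None in its slot.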

I put it above the def lambda_handler function declaration, and then I could replace the whole try/except inside the handler, from this:

try:
   results = pool.map(urllib2.urlopen, urls)
except URLError:
   try:                              # try once more before logging error
      urllib2.urlopen(URLError.url)
   except URLError:                  # log error
      urllib2.urlopen("https://example.com/error/?url="+URLError.url)

to this:

results = pool.map(my_urlopen, urls)
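Note that with this wrapper, URLs that ultimately failed come back as None in the results list, so the successful responses can be filtered out afterwards, e.g.:

```python
# hypothetical output of pool.map(my_urlopen, urls): one URL failed
results = ["response-a", None, "response-c"]
successes = [r for r in results if r is not None]
print(successes)  # ['response-a', 'response-c']
```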

Q.E.D.