Getting index at point of failure in try block to print in except block

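I have a scraper that iterates through a large list of postal codes and feeds them into a for loop that runs the scraper. The for loop is inside a try block. Sometimes the connection drops, in which case I want the scraper to try again, for a total of 3 attempts. The problem is that I write data to a database while the code runs, so on a retry I don't want it to start from the beginning of the postal code list, but from the point in the list where the connection dropped. I assume this means pulling the for loop's postal code index out of the try block and using it in the except block, but I haven't had any success with that. Part of the problem is that I can't figure out how to break the try block on purpose so that the postal code position makes it out of the try and into the except: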

Here is my test so far:

import time
postal_code = ['1','2','3','4','5']
territories_to_scrape = {'territory_name': ['unknown', 'texas', 'louisiana'],
                         'postal_code':['1','2','3']}
def test(postal_code, territories_to_scrape):
    tic = time.perf_counter()
    loop = True
    while loop:
        trycnt = 3
        while (trycnt > 0):
            try:
                for p in postal_code:
                    if p:
                        raise ValueError(f'arg is True {p}')
                    return p
                        
                trycnt = 0
                toc = time.perf_counter()
                print(f"Took {toc - tic:0.4f} seconds or {(toc - tic)/60} minutes, for run to complete for {len(postal_code)} postal_codes in try")
                return p
            except Exception as e:
                if trycnt <= 0: 
                    print("Failed to retrieve: in if of except") # done retrying
                    loop = False
                else: trycnt -= 1  # retry
                time.sleep(0.5)  # wait 1/2 second then retry
                toc = time.perf_counter()
                print(f"Took {toc - tic:0.4f} seconds or {(toc - tic)/60} minutes, for run to complete for {len(postal_code)} postal_codes in else")
                last_postal_code = p
                ind = territories_to_scrape[territories_to_scrape['postal_code'] == last_postal_code].index.values[0]
                territories_to_scrape = territories_to_scrape.loc[ind:]
                print(territories_to_scrape)
test(postal_code, territories_to_scrape)

Thanks!

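I think you almost have it. You just need to start the for loop at the index of p (the postal code that failed) on each new attempt. See the simplified skeleton code below: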

postal_codes_to_parse = ['a', 'b', 'c', 'd', 'e', 'f']
try_count = 0
max_tries = 3
for_loop_starting = 0
while try_count < max_tries:
    try:
        for index in range(for_loop_starting, len(postal_codes_to_parse)):
            p = postal_codes_to_parse[index]
            # do something that might throw an error
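    except Exception:
        # resume from the postal code that failed
        for_loop_starting = index
        try_count += 1
    else:
        # every postal code was processed without an error; stop retrying
        break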
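
To check that the skeleton really resumes at the failing postal code, you can break the try block on purpose by raising an exception yourself. The following is only a sketch: scrape_with_resume and fake_scrape are hypothetical names, and catching the built-in ConnectionError is an assumption. Catch whatever exception your scraping library actually raises.

```python
def scrape_with_resume(postal_codes, scrape, max_tries=3):
    """Call scrape(code) for each code, resuming at the index that failed."""
    start = 0
    failures = 0
    while failures < max_tries:
        try:
            for index in range(start, len(postal_codes)):
                scrape(postal_codes[index])
        except ConnectionError:
            # index still holds the position of the code that failed
            start = index
            failures += 1
        else:
            # the for loop finished without raising
            return True
    return False


# Hypothetical stand-in for the real scraper: fails once on 'c', then works.
calls = []
already_failed = set()


def fake_scrape(code):
    if code == 'c' and code not in already_failed:
        already_failed.add(code)
        raise ConnectionError(f"simulated drop at {code}")
    calls.append(code)


finished = scrape_with_resume(['a', 'b', 'c', 'd', 'e'], fake_scrape)
print(finished, calls)
```

Because 'a' and 'b' appear only once in calls, the retry did not go back to the start of the list.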

Alternatively, you could put the try/except block inside a single while loop, so you don't need a separate for loop:

postal_codes_to_parse = ['a', 'b', 'c', 'd', 'e', 'f']
try_count = 0
max_tries = 3
index = 0
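while (try_count < max_tries) and (index < len(postal_codes_to_parse)):
    p = postal_codes_to_parse[index]
    try:
        # do something that might cause an error
        index += 1  # only advance once the work above has succeeded
    except Exception:
        # index is unchanged, so the same postal code is retried
        try_count += 1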
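
Since the question also asks about printing the index inside the except block, here is a minimal sketch of the single-loop version that does that and waits between attempts. It differs from the skeleton above in one respect, which is my own assumption about what you want: the try counter is reset after each success, so every postal code gets its own 3 attempts. scrape_all and flaky are hypothetical names.

```python
import time


def scrape_all(postal_codes, scrape, max_tries=3, delay=0.5):
    """Return None if every code succeeded, else the index where it gave up."""
    index = 0
    tries = 0
    while index < len(postal_codes):
        try:
            scrape(postal_codes[index])
        except ConnectionError as exc:
            tries += 1
            # index is an ordinary local variable, so it is visible here
            print(f"attempt {tries} failed at index {index} "
                  f"({postal_codes[index]!r}): {exc}")
            if tries >= max_tries:
                return index
            time.sleep(delay)  # wait before retrying the same code
        else:
            index += 1
            tries = 0  # assumption: each postal code gets a fresh set of tries
    return None


# Hypothetical scraper that always fails on 'b'.
def flaky(code):
    if code == 'b':
        raise ConnectionError("simulated drop")


gave_up_at = scrape_all(['a', 'b', 'c'], flaky, delay=0)
print(gave_up_at)
```

The returned index tells you where to restart a later run, for example with postal_codes[gave_up_at:].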