Getting index at point of failure in try block to print in except block
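I have a scraper that iterates over a long list of postal codes, passing each one into the `for` loop that runs the scraper. The `for` loop is inside a `try` block. Sometimes the connection drops, and when that happens I want the scraper to try again, for 3 attempts in total. The problem is that I write data to a database while the code runs. On a retry I therefore don't want it to start from the beginning of the postal code list, but from the point in the list where the connection broke. I think this requires getting the index of the `for` loop's current postal code out of the `try` block and into the `except` block, but I haven't managed it. Part of the problem is that I can't work out how to deliberately break the `try` block so that the postal code position leaves the `try` and reaches the `except`:
Here is my test so far: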
```python
import time

postal_code = ['1','2','3','4','5']
territories_to_scrape = {'territory_name': ['unknown', 'texas', 'louisiana'],
                         'postal_code':['1','2','3']}

def test(postal_code, territories_to_scrape):
    tic = time.perf_counter()
    loop = True
    while loop:
        trycnt = 3
        while (trycnt > 0):
            try:
                for p in postal_code:
                    if p:
                        raise ValueError(f'arg is True {p}')
                    return p
                trycnt = 0
                toc = time.perf_counter()
                print(f"Took {toc - tic:0.4f} seconds or {(toc - tic)/60} minutes, for run to complete for {len(postal_code)} postal_codes in try")
                return p
            except Exception as e:
                if trycnt <= 0:
                    print("Failed to retrieve: in if of except")  # done retrying
                    loop = False
                else: trycnt -= 1  # retry
                time.sleep(0.5)  # wait 1/2 second then retry
                toc = time.perf_counter()
                print(f"Took {toc - tic:0.4f} seconds or {(toc - tic)/60} minutes, for run to complete for {len(postal_code)} postal_codes in else")
                last_postal_code = p
                ind = territories_to_scrape[territories_to_scrape['postal_code'] == last_postal_code].index.values[0]
                territories_to_scrape = territories_to_scrape.loc[ind:]
                print(territories_to_scrape)

test(postal_code, territories_to_scrape)
```
Thanks!
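The obstacle the question describes, getting the position of the failing code into `except`, can be illustrated with a minimal sketch. It assumes a hypothetical `fake_scrape` that deliberately raises `ConnectionError` for one code, which is one way to break the `try` block on purpose while testing. In Python the `for` loop variable stays bound after an exception interrupts the loop, so the `except` block can read it directly.

```python
def fake_scrape(code):
    """Hypothetical stand-in for the real scraper: fails on code '3'."""
    if code == '3':
        raise ConnectionError(f"simulated dropped connection at {code}")
    return f"data for {code}"


def find_failure_index(postal_codes):
    """Return the index of the first code whose scrape raised, or None."""
    index = None  # stays None if the list is empty
    try:
        for index, code in enumerate(postal_codes):
            fake_scrape(code)
    except ConnectionError:
        # The loop variable is still bound here and points at the failing code.
        return index
    return None


print(find_failure_index(['1', '2', '3', '4', '5']))  # 2
```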
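I think you've almost got it. You just need to restart the `for` loop on each attempt from the index of the `p` that failed, instead of from 0. See the simplified skeleton code below: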
```python
postal_codes_to_parse = ['a', 'b', 'c', 'd', 'e', 'f']
try_count = 0
max_tries = 3
for_loop_starting = 0

while try_count < max_tries:
    try:
        for index in range(for_loop_starting, len(postal_codes_to_parse)):
            p = postal_codes_to_parse[index]
            # do something that might throw an error
        break  # every code was processed, so stop retrying
    except Exception:
        for_loop_starting = index  # resume at the code that failed
        try_count += 1
```
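The skeleton above can be made runnable by wrapping it in a function and feeding it a simulated scraper. This is a sketch under assumptions: `scrape_with_resume` and `flaky_scrape` are hypothetical names, and only `ConnectionError` is treated as retryable. The point it demonstrates is that codes processed before the failure ('a' and 'b') are not scraped, and therefore not written, a second time.

```python
def scrape_with_resume(codes, scrape, max_tries=3):
    """Scrape every code, resuming at the failing index on each retry.

    `scrape` is a hypothetical callable standing in for the real scraper.
    Gives up and re-raises after `max_tries` failed attempts in total.
    """
    results = []
    start = 0
    try_count = 0
    while True:
        try:
            for index in range(start, len(codes)):
                results.append(scrape(codes[index]))
            return results  # reached the end of the list
        except ConnectionError:
            try_count += 1
            if try_count >= max_tries:
                raise  # out of attempts: propagate the last error
            start = index  # retry from the code that failed, not from 0


failed_once = set()


def flaky_scrape(code):
    """Hypothetical scraper that drops the connection the first time it sees 'c'."""
    if code == 'c' and code not in failed_once:
        failed_once.add(code)
        raise ConnectionError("simulated dropped connection")
    return code.upper()


print(scrape_with_resume(['a', 'b', 'c', 'd', 'e', 'f'], flaky_scrape))
# ['A', 'B', 'C', 'D', 'E', 'F']
```

As in the skeleton, the three attempts are shared across the whole list rather than given to each postal code separately.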
Alternatively, you can put the `try`/`except` block inside a single `while` loop and not worry about having a separate `for` loop:
```python
postal_codes_to_parse = ['a', 'b', 'c', 'd', 'e', 'f']
try_count = 0
max_tries = 3
index = 0

while try_count < max_tries and index < len(postal_codes_to_parse):
    p = postal_codes_to_parse[index]
    try:
        # do something that might cause an error
        index += 1  # advance only after p was processed successfully
    except Exception:
        try_count += 1  # the next iteration retries the same index
```
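The single-loop version can be sketched as a function with the question's half-second pause between retries. Two choices here are assumptions rather than part of the answer: the hypothetical `scrape_codes` resets `try_count` after each success, so every postal code gets its own `max_tries` attempts, and it returns the index it stopped at, so a later run could resume from there.

```python
import time


def scrape_codes(codes, scrape, max_tries=3, delay=0.5):
    """Single-loop variant: retry the current code up to max_tries times.

    `scrape` is a hypothetical callable; `delay` mirrors the question's
    0.5-second pause. Returns (results, next_index); next_index < len(codes)
    means we gave up on codes[next_index].
    """
    results = []
    index = 0
    try_count = 0
    while index < len(codes) and try_count < max_tries:
        try:
            results.append(scrape(codes[index]))
            index += 1     # advance only after a success
            try_count = 0  # fresh retry budget for the next code
        except ConnectionError:
            try_count += 1
            if try_count < max_tries:
                time.sleep(delay)  # pause before retrying the same index
    return results, index


attempts = {}


def flaky(code):
    """Hypothetical scraper: every code fails on its first attempt only."""
    attempts[code] = attempts.get(code, 0) + 1
    if attempts[code] == 1:
        raise ConnectionError("simulated dropped connection")
    return code.upper()


print(scrape_codes(['a', 'b', 'c'], flaky, delay=0))  # (['A', 'B', 'C'], 3)
```

With the answer's non-resetting counter, the same run would stop after the third failure, at index 2, before 'c' succeeded.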