Why do requests stop after a certain time in Python?

I have this code that sends a GET request and grabs the text from the response; the site links are read from a text file. The problem is that after sending 300 or 500 requests, the script stops with no error at all, it just stops working??

import requests

sites = open(r'site.txt', 'r', encoding="utf8").readlines()

l_site = []

for i in sites:
    l_site.append(i)


for x in len(l_site):
    result = requests.get(f'{site}', allow_redirects=True).text
    open('result.txt', 'a').write(f'{result}\n')

I think your code does more than what you have posted here, because I can't see where the site variable is created.

You could do something along these lines to get a better idea of where it stops.

import requests

sites = open(r'site.txt', 'r', encoding="utf8").readlines()

l_site = [s for s in sites]

# keep the output file open with a context manager so it is closed properly,
# and print each URL so you can see exactly where the script stops
with open('result.txt', 'a') as fb:
    for site in l_site:
        try:
            print(f"Processing {site}")
            result = requests.get(f'{site}', allow_redirects=True).text
            fb.write(f'{result}\n')
        except Exception as e:
            raise e

If I understood you correctly, this is what you want:

  1. Read the URLs from site.txt
  2. If the HTTP request succeeds, append the response payload to result.txt
  3. If the HTTP request fails due to a timeout, append the result along with the URL to another file

Here is a working piece of code. Note that you can change the except part if you want to catch more kinds of errors; a broader variant is sketched after the code.

import requests

URLS_FILE = 'site.txt'
RESULT_FILE = 'result.txt'
ERRORS_FILE = 'result-error.txt'

def handle_url(url: str, result_file, error_file): 
    try:
        # 10-second timeout: time allowed to connect and get an HTTP response, not the total download time
        content = requests.get(url, allow_redirects=True, timeout=10)
        result_file.write(f'{content.text}\n')
    except requests.exceptions.ConnectTimeout as e:
        error_file.write(f'{url}: {e}\n')


with open(URLS_FILE, 'r', encoding="utf8") as f:
    with open(RESULT_FILE, 'a') as rf:
        with open(ERRORS_FILE, 'a') as ef:
            for url in f.readlines():
                handle_url(url, rf, ef)
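
For instance, if you also want read timeouts and other connection problems recorded in the error file, the except section could be broadened along these lines. This is only a sketch of the idea: requests.exceptions.Timeout covers both ConnectTimeout and ReadTimeout, and requests.exceptions.RequestException is the base class for everything requests raises.

import requests

def handle_url(url: str, result_file, error_file):
    try:
        # 10-second timeout for connecting and for receiving a response
        content = requests.get(url, allow_redirects=True, timeout=10)
        result_file.write(f'{content.text}\n')
    except requests.exceptions.Timeout as e:
        # ConnectTimeout and ReadTimeout both end up here
        error_file.write(f'{url}: timeout: {e}\n')
    except requests.exceptions.RequestException as e:
        # any other failure from requests (DNS error, refused connection, invalid URL, ...)
        error_file.write(f'{url}: error: {e}\n')

Keeping RequestException as the last clause records network-level failures per URL while still letting genuine programming errors propagate instead of being silently written to the file.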