Return empty list instead of TimeoutError if function times out

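I have a function that currently uses the @timeout_decorator decorator to make it time out after a few seconds. However, instead of raising a timeout error, I want the function to return an empty list []. I am open to using other packages/decorators if needed, provided they can be installed easily from PyPI. This is my current code: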

import timeout_decorator
import requests
from bs4 import BeautifulSoup as bs

@timeout_decorator.timeout(5, use_signals=False)
def get_soup(url):
    session = requests.Session()
    # set the User-agent as a regular browser
    session.headers["User-Agent"] = "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/44.0.2403.157 Safari/537.36"

    # get the HTML content
    html = session.get(url).content

    # parse HTML using beautiful soup
    soup = bs(html, "html.parser")
    return soup

Note: if you need an example of a URL that times out, I use https://www.ebay.com

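You can put another decorator around it that catches the timeout exception and returns an empty list. If you don't want to write it yourself, you may want to use @ignore from funcy (https://funcy.readthedocs.io/en/stable/flow.html). Assuming the exception you get is a TimeoutError, it should look like this: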

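import timeout_decorator
import requests
from bs4 import BeautifulSoup as bs
from funcy import ignore

# timeout_decorator raises its own TimeoutError class, not the built-in one
@ignore(timeout_decorator.TimeoutError, default=[])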
@timeout_decorator.timeout(5, use_signals=False)
def get_soup(url):
    session = requests.Session()
    # set the User-agent as a regular browser
    session.headers["User-Agent"] = "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/44.0.2403.157 Safari/537.36"

    # get the HTML content
    html = session.get(url).content

    # parse HTML using beautiful soup
    soup = bs(html, "html.parser")
    return soup

A hand-written ignore would look like this:

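import functools
import timeout_decorator

def ignore(func):
    # wrap func so that a timeout returns [] instead of raising
    @functools.wraps(func)
    def main(*args, **kwargs):
        try:
            return func(*args, **kwargs)
        except timeout_decorator.TimeoutError as e:
            # timeout_decorator raises its own TimeoutError, not the built-in one
            print(e)
            return []
    return main

@ignore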
@timeout_decorator.timeout(5)
def function_that_times_out():
    input()

function_that_times_out()
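
If you would rather not depend on timeout_decorator at all, the same "return [] on timeout" idea can be sketched with only the standard library. This is a minimal sketch, not a drop-in replacement: default_on_timeout and slow_call are hypothetical names made up for illustration, and the approach runs the function in a worker thread, so a call that times out is abandoned rather than killed and keeps running in the background (Python also waits for it when the interpreter exits).

```python
import concurrent.futures
import functools
import time


def default_on_timeout(seconds, default_factory=list):
    """Hypothetical helper: return default_factory() if the call exceeds `seconds`."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            executor = concurrent.futures.ThreadPoolExecutor(max_workers=1)
            future = executor.submit(func, *args, **kwargs)
            try:
                # Wait at most `seconds` for the result
                return future.result(timeout=seconds)
            except concurrent.futures.TimeoutError:
                # Timed out: hand back a fresh default instead of raising
                return default_factory()
            finally:
                # Do not block on the worker; it may still be running
                executor.shutdown(wait=False)
        return wrapper
    return decorator


@default_on_timeout(0.2)
def slow_call(delay):
    # Stand-in for a slow network request (assumption for the example)
    time.sleep(delay)
    return ["done"]
```

Using a factory (list) rather than a literal [] means each timed-out call gets its own new list, so callers cannot accidentally mutate a shared default.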