如何在 Python 中停止一次调度程序迭代

Question

我是 Python（和一般编程）的新手。我有一个 pcode，可以通过 PyCURL 定期下载网站内容并进行一些搜索。我在 While-True 中使用了调度程序来设置一个无限循环，我在其中创建一个对象并调用它的方法 start() 来获取网站并执行一些搜索。当 getbody() 方法无法获取网站时会出现问题，由于连接问题（或其他原因）。 BeautifulSoup 需要字符串，否则会引发错误。

如何在 getbody() 方法中出现 Error/Exception 时停止调度程序的线程并等待另一个线程？作为 getbody() 方法的结果返回空字符串是浪费 cpu 时间。

#Parser_module
class Parser(object):
    def __init__(self):
        self.body = BeautifulSoup(self.getbody(), "lxml")
        self.buffer = BytesIO()

    def getbody(self):
        # some code to set pycurl up
        try:
            c.perform()
        except pycurl.error:
            print("connection error")
            # returns an emptry string to feed the BeautifulSoup with
            return ""
        body = self.buffer.getvalue().decode("utf-8")
        return body

     def start(self):
         #calls other functions to perform some searching
         self.otherfunction()

     def otherfunction(self):
         .
         .
         .

#Scheduler module
import Parser_module
from threading import Timer

def start_search():
    parser = Parser()
    parser.start()
    t = Timer(20.0, start_search)
    t.start()

Answer 1

如果发生错误，您可以在 Parser.start 和 return 中获取 URL，而不是在 Parser.__init__ 中获取。

class Parser(object):
    def __init__(self):
        self.body = None
        self.buffer = BytesIO()

    def start():
        data = self.getbody()
        if not data:
            return
        self.body = BeautifulSoup(data, "lxml")
        self.otherfunction()

    def getbody(self):
        ...

    def otherfunction(self):
        ...

附带说明一下，我建议您使用更好的 requests library instead of pycurl, if you can. Also check out PEP8，Python 风格指南，例如有关如何命名事物的一些建议。

如何在 Python 中停止一次调度程序迭代

How to stop one scheduler iteration in Python

python

scheduler