PyCurl request hangs infinitely on perform

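I wrote a script that runs weekly to fetch scan results from Qualys in order to collect metrics.

The first part of the script fetches a list of references for every scan run in the past week, for further processing.

The problem is that, while this sometimes works perfectly, at other times the script hangs on the c.perform() line. This is manageable when running the script manually, since it can simply be re-run until it works. However, I want to run it weekly as a scheduled task, without any manual interaction.

Is there a foolproof way to detect that a hang has occurred and resend the PyCurl request until it works?

I've tried setting the c.TIMEOUT and c.CONNECTTIMEOUT options, but neither seems to have any effect. Also, since no exception is thrown, simply wrapping the call in a try-except block doesn't help either.

The function in question is: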

import datetime as DT
from io import BytesIO

import certifi
import pycurl

# Retrieve a list of all scans conducted in the past week
# Save this to refs_raw.txt
def getScanRefs(usr, pwd):

    print("getting scan references...")

    with open('refs_raw.txt','wb') as refsraw: 
        today = DT.date.today()
        week_ago = today - DT.timedelta(days=7)
        strtoday = str(today)
        strweek_ago = str(week_ago)

        c = pycurl.Curl()

        c.setopt(c.URL, 'https://qualysapi.qualys.eu/api/2.0/fo/scan/?action=list&launched_after_datetime=' + strweek_ago + '&launched_before_datetime=' + strtoday)
        c.setopt(c.HTTPHEADER, ['X-Requested-With: pycurl', 'Content-Type: text/xml'])
        c.setopt(c.USERPWD, usr + ':' + pwd)
        c.setopt(c.POST, 1)
        c.setopt(c.PROXY, 'companyproxy.net:8080')
        c.setopt(c.CAINFO, certifi.where())
        c.setopt(c.SSL_VERIFYPEER, 0)
        c.setopt(c.SSL_VERIFYHOST, 0)
        c.setopt(c.CONNECTTIMEOUT, 3)
        c.setopt(c.TIMEOUT, 3)

        refsbuffer = BytesIO()
        c.setopt(c.WRITEDATA, refsbuffer)
        c.perform()

        body = refsbuffer.getvalue()
        refsraw.write(body)
        c.close()

    print("Got em!")

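I solved this myself by using multiprocessing to launch the API call in a separate process, which is terminated and relaunched if it takes longer than 5 seconds. It isn't pretty, but it is cross-platform. For those looking for a more elegant but *nix-only solution, look into the signal library, specifically SIGALRM.

Here's the code: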

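import datetime as DT
import multiprocessing
import time
from io import BytesIO

import certifi
import pycurl

# As this request for scan references sometimes hangs, it is run in a separate process here
# The process is terminated and relaunched if no response is received within 5 seconds
def performRequest(usr, pwd):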
    today = DT.date.today()
    week_ago = today - DT.timedelta(days=7)
    strtoday = str(today)
    strweek_ago = str(week_ago)

    c = pycurl.Curl()

    c.setopt(c.URL, 'https://qualysapi.qualys.eu/api/2.0/fo/scan/?action=list&launched_after_datetime=' + strweek_ago + '&launched_before_datetime=' + strtoday)
    c.setopt(c.HTTPHEADER, ['X-Requested-With: pycurl', 'Content-Type: text/xml'])
    c.setopt(c.USERPWD, usr + ':' + pwd)
    c.setopt(c.POST, 1)
    c.setopt(c.PROXY, 'companyproxy.net:8080')
    c.setopt(c.CAINFO, certifi.where())
    c.setopt(c.SSL_VERIFYPEER, 0)
    c.setopt(c.SSL_VERIFYHOST, 0)

    refsBuffer = BytesIO()
    c.setopt(c.WRITEDATA, refsBuffer)
    c.perform()
    c.close()
    body = refsBuffer.getvalue()
    with open('refs_raw.txt', 'wb') as refsraw:
        refsraw.write(body)

# Retrieve a list of all scans conducted in the past week
# Save this to refs_raw.txt
def getScanRefs(usr, pwd):

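    print("Getting scan references...")

    # Occasionally the request will hang infinitely. Launch it in a separate process and retry if no response in 5 seconds
    success = False
    while not success:
        sendRequest = multiprocessing.Process(target=performRequest, args=(usr, pwd))
        sendRequest.start()

        # Wait up to 5 seconds, but stop waiting early if the request finishes sooner
        for seconds in range(5):
            if not sendRequest.is_alive():
                break
            print("...")
            time.sleep(1)

        if sendRequest.is_alive():
            print("Maximum allocated time reached... Resending request")
            sendRequest.terminate()
            sendRequest.join()  # reap the terminated process
        elif sendRequest.exitcode != 0:
            # performRequest raised (e.g. a pycurl.error), so refs_raw.txt was not written
            print("Request failed... Resending request")
        else:
            success = True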
    print("Got em!")

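This question is old, but I'll add this answer in case it helps someone.

The only way to terminate a running curl transfer once perform() has been called is through a callback: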

1- Use CURLOPT_WRITEFUNCTION. As the documentation states:

Your callback should return the number of bytes actually taken care of. If that amount differs from the amount passed to your callback function, it'll signal an error condition to the library. This will cause the transfer to get aborted and the libcurl function used will return CURLE_WRITE_ERROR.

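The downside of this approach is that curl only calls the write function when it receives new data from the server, so if the server stops sending data, curl keeps waiting on the server side and never receives your kill signal.

2- The best alternative so far is to use a progress callback:

The beauty of the progress callback is that curl calls it roughly once per second even when no data arrives from the server, which gives you the chance to return a non-zero value as a kill switch for curl (returning 0, or None in pycurl, lets the transfer continue).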

Use the CURLOPT_XFERINFOFUNCTION option. Note that it is preferred over CURLOPT_PROGRESSFUNCTION, as the documentation says:

We encourage users to use the newer CURLOPT_XFERINFOFUNCTION instead, if you can.

You also need to set the CURLOPT_NOPROGRESS option:

CURLOPT_NOPROGRESS must be set to 0 to make this function actually get called.

Here is an example showing how to implement the write and progress functions in Python:
# example of using write and progress functions to terminate curl
import pycurl

f = open('mynewfile', 'wb')  # used to save downloaded data; pycurl passes bytes, so open in binary mode
counter = 0  # total number of bytes received so far

# define callback functions which will be used by curl
def my_write_func(data):
    """write to file"""
    global counter  # counter is reassigned here, so it must be declared global
    f.write(data)
    counter += len(data)

    # an example to terminate curl: tell curl to abort if the downloaded data exceeded 1024 bytes by returning
    # any number not equal to len(data) (0 here); returning None means all of data was handled
    if counter >= 1024:
        return 0

def progress(download_total, downloaded, upload_total, uploaded):
    """receives progress figures from curl"""
    # an example to terminate curl: tell curl to abort if the downloaded data exceeded 1024 bytes by returning
    # a non-zero value; returning 0 or None lets the transfer continue
    if downloaded >= 1024:
        return 1
    return 0


# initialize curl object and options
c = pycurl.Curl()
c.setopt(pycurl.URL, 'https://example.com/')  # placeholder URL, replace with the real one

# callback options
c.setopt(pycurl.WRITEFUNCTION, my_write_func)

c.setopt(pycurl.NOPROGRESS, 0)  # required to use a progress function
c.setopt(pycurl.XFERINFOFUNCTION, progress)
# c.setopt(pycurl.PROGRESSFUNCTION, progress)  # you can use this option but pycurl.XFERINFOFUNCTION is recommended
# put other curl options as required

# executing curl; an aborted transfer raises pycurl.error
try:
    c.perform()
except pycurl.error as e:
    print('transfer stopped:', e)
finally:
    c.close()
    f.close()
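The byte counter in the example is only a demonstration. For the stall described in the question, a wall-clock deadline checked inside the progress callback is probably closer to what's needed, because libcurl keeps calling that callback while the server is silent. The sketch below is one way to combine that with bounded retries. It assumes pycurl is built against libcurl 7.32.0 or later (where CURLOPT_XFERINFOFUNCTION was added). The names `make_deadline_callback` and `fetch_with_deadline` are hypothetical, the 5-second deadline and 3 attempts are arbitrary, and the helper retries on any pycurl.error, not only on timeouts.

```python
import time
from io import BytesIO


def make_deadline_callback(max_seconds, clock=time.monotonic):
    """Return an XFERINFOFUNCTION-style callback that aborts after max_seconds.

    The clock starts when this factory is called, so create the callback
    immediately before calling perform().
    """
    start = clock()

    def callback(download_total, downloaded, upload_total, uploaded):
        # A non-zero return value makes libcurl abort the transfer
        # (CURLE_ABORTED_BY_CALLBACK); 0 lets it continue.
        return 1 if clock() - start > max_seconds else 0

    return callback


def fetch_with_deadline(url, max_seconds=5.0, attempts=3):
    """Hypothetical helper: GET url into memory, aborting and retrying on stalls."""
    import pycurl  # imported here so make_deadline_callback works without pycurl installed

    last_error = None
    for attempt in range(1, attempts + 1):
        buffer = BytesIO()
        c = pycurl.Curl()
        try:
            c.setopt(pycurl.URL, url)
            c.setopt(pycurl.WRITEDATA, buffer)
            c.setopt(pycurl.NOPROGRESS, 0)  # the progress callback is only called when this is 0
            c.setopt(pycurl.XFERINFOFUNCTION, make_deadline_callback(max_seconds))
            c.perform()  # raises pycurl.error if the callback aborts the transfer
            return buffer.getvalue()
        except pycurl.error as exc:
            last_error = exc
            print(f"attempt {attempt}/{attempts} failed: {exc}")
        finally:
            c.close()
    raise RuntimeError(f"all {attempts} attempts failed") from last_error
```

Passing a fake `clock` makes the deadline logic testable without any network access; in real use the default `time.monotonic` is used, which is unaffected by system clock changes.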