Is there a page cache for HTTP requests when using crontab on CentOS 8?

Problem and output

When running my Python script, I get the same response from an API call that returns JSON, and this goes on for several hours at a time.

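I am using the Coindesk BPI API, which updates every minute. The price of Bitcoin does not stay flat for 5 hours, so the repeated value cannot be real. See the sample output below: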

    # results.txt
    {"timestamp": 16-Apr-2020 22:50, "price": 7078, "gCount": 28, "rCount": 48}
    {"timestamp": 16-Apr-2020 23:00, "price": 7085, "gCount": 29, "rCount": 50}
    {"timestamp": 16-Apr-2020 23:10, "price": 7011, "gCount": 33, "rCount": 52}
    {"timestamp": 16-Apr-2020 23:20, "price": 7002, "gCount": 31, "rCount": 55}
    {"timestamp": 16-Apr-2020 23:30, "price": 7020, "gCount": 30, "rCount": 52}
    {"timestamp": 16-Apr-2020 23:40, "price": 7027, "gCount": 33, "rCount": 54}
    {"timestamp": 16-Apr-2020 23:50, "price": 7047, "gCount": 35, "rCount": 58}
    {"timestamp": 17-Apr-2020 00:01, "price": 7060, "gCount": 36, "rCount": 57}
    {"timestamp": 17-Apr-2020 00:10, "price": 7051, "gCount": 34, "rCount": 45}
    {"timestamp": 17-Apr-2020 00:20, "price": 7052, "gCount": 41, "rCount": 48}
    {"timestamp": 17-Apr-2020 00:31, "price": 7054, "gCount": 47, "rCount": 48}
    # It worked! Now the price is stuck for 2 get requests.
    {"timestamp": 17-Apr-2020 00:40, "price": 7054, "gCount": 48, "rCount": 47}
    {"timestamp": 17-Apr-2020 00:50, "price": 7054, "gCount": 50, "rCount": 48}
    {"timestamp": 17-Apr-2020 01:01, "price": 7051, "gCount": 48, "rCount": 43}
    # Price stuck again for around 30 get requests.
    {"timestamp": 17-Apr-2020 01:10, "price": 7051, "gCount": 46, "rCount": 47}
    {"timestamp": 17-Apr-2020 01:20, "price": 7051, "gCount": 49, "rCount": 46}
    {"timestamp": 17-Apr-2020 01:30, "price": 7051, "gCount": 48, "rCount": 47}
    {"timestamp": 17-Apr-2020 01:40, "price": 7051, "gCount": 50, "rCount": 48}
    {"timestamp": 17-Apr-2020 01:50, "price": 7051, "gCount": 50, "rCount": 52}
    {"timestamp": 17-Apr-2020 02:00, "price": 7051, "gCount": 51, "rCount": 56}
    {"timestamp": 17-Apr-2020 02:10, "price": 7051, "gCount": 50, "rCount": 55}
    {"timestamp": 17-Apr-2020 02:20, "price": 7051, "gCount": 57, "rCount": 57}
    {"timestamp": 17-Apr-2020 02:30, "price": 7051, "gCount": 48, "rCount": 54}
    {"timestamp": 17-Apr-2020 02:40, "price": 7051, "gCount": 52, "rCount": 54}
    {"timestamp": 17-Apr-2020 02:51, "price": 7051, "gCount": 54, "rCount": 57}
    {"timestamp": 17-Apr-2020 03:00, "price": 7051, "gCount": 53, "rCount": 59}
    {"timestamp": 17-Apr-2020 03:11, "price": 7051, "gCount": 53, "rCount": 59}
    {"timestamp": 17-Apr-2020 03:21, "price": 7051, "gCount": 50, "rCount": 55}
    {"timestamp": 17-Apr-2020 03:31, "price": 7051, "gCount": 51, "rCount": 55}
    {"timestamp": 17-Apr-2020 03:41, "price": 7051, "gCount": 52, "rCount": 56}
    {"timestamp": 17-Apr-2020 03:51, "price": 7051, "gCount": 50, "rCount": 55}
    {"timestamp": 17-Apr-2020 04:01, "price": 7051, "gCount": 48, "rCount": 56}
    {"timestamp": 17-Apr-2020 04:10, "price": 7051, "gCount": 39, "rCount": 50}
    {"timestamp": 17-Apr-2020 04:20, "price": 7051, "gCount": 39, "rCount": 49}
    {"timestamp": 17-Apr-2020 04:31, "price": 7051, "gCount": 41, "rCount": 53}
    {"timestamp": 17-Apr-2020 04:40, "price": 7051, "gCount": 43, "rCount": 53}
    {"timestamp": 17-Apr-2020 04:50, "price": 7051, "gCount": 39, "rCount": 51}
    {"timestamp": 17-Apr-2020 05:00, "price": 7051, "gCount": 37, "rCount": 52}
    {"timestamp": 17-Apr-2020 05:11, "price": 7051, "gCount": 38, "rCount": 54}
    {"timestamp": 17-Apr-2020 05:20, "price": 7051, "gCount": 31, "rCount": 49}
    {"timestamp": 17-Apr-2020 05:30, "price": 7051, "gCount": 0, "rCount": 0}
    {"timestamp": 17-Apr-2020 05:41, "price": 7051, "gCount": 32, "rCount": 49}
    {"timestamp": 17-Apr-2020 05:50, "price": 7051, "gCount": 37, "rCount": 49}
    {"timestamp": 17-Apr-2020 06:01, "price": 7051, "gCount": 39, "rCount": 51}
    {"timestamp": 17-Apr-2020 06:11, "price": 7051, "gCount": 41, "rCount": 47}
    {"timestamp": 17-Apr-2020 06:21, "price": 7051, "gCount": 42, "rCount": 46}
    # Now it works again as intended.
    {"timestamp": 17-Apr-2020 06:31, "price": 7082, "gCount": 45, "rCount": 49}
    {"timestamp": 17-Apr-2020 06:40, "price": 7084, "gCount": 48, "rCount": 50}
    {"timestamp": 17-Apr-2020 06:51, "price": 7095, "gCount": 45, "rCount": 51}
    {"timestamp": 17-Apr-2020 07:01, "price": 7097, "gCount": 44, "rCount": 45}
    {"timestamp": 17-Apr-2020 07:11, "price": 7068, "gCount": 45, "rCount": 46}
    {"timestamp": 17-Apr-2020 07:21, "price": 7070, "gCount": 43, "rCount": 45}
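
One way to tell a stale response from a genuinely flat price is to look at the timestamp the payload itself carries. The sketch below is written for Python 3.7+ and assumes the response contains a `time.updatedISO` field in ISO 8601 form with a UTC offset; the helper names `payload_age` and `is_stale` and the 5-minute threshold are made up for illustration.

```python
from datetime import datetime, timedelta, timezone


def payload_age(payload, now=None):
    """Return how old the payload claims to be, as a timedelta."""
    # Assumption: the payload carries time.updatedISO, e.g. '2020-04-17T01:01:00+00:00'.
    updated = datetime.fromisoformat(payload['time']['updatedISO'])
    if now is None:
        now = datetime.now(timezone.utc)
    return now - updated


def is_stale(payload, max_age=timedelta(minutes=5), now=None):
    """True if the payload is older than max_age (an arbitrary threshold)."""
    return payload_age(payload, now) > max_age
```

Logging `payload_age(response.json())` next to each price would show whether the stuck rows come from a response that was generated hours earlier.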

Python script and what I have tried

I am using Python 2.7 and requests. By default, requests does not cache. So I think the connection just stays open at random and Python reuses it, getting the same JSON.

I tried to close the requests session by setting keep alive to false, by using the with block, and by trying requests.session().close(). The relevant Python code is below:

import requests, json, sys, time
from requests.adapters import HTTPAdapter
from requests.packages.urllib3.util.retry import Retry


def request_json():
    print 'Begin request to get the json...'

    # Try get request once
    response = requests_retry_session().get('https://api.coindesk.com/v1/bpi/currentprice/USD.json')
    if (response.status_code == 200):
        # Close the connection 
        # requests.session().close() <-- tried, doesn't do the trick
        print 'Fetched price successfully.\n'
        return response.json()

    # If first request didn't succeed, retry 3 times using session 
    with requests.Session() as s:
        s.get('https://api.coindesk.com/v1/bpi/currentprice/USD.json')
        # Close the connection
        # s.config['keep_alive'] = False <-- tried, doesn't do the trick
        response = requests_retry_session(session=s).get(
            'https://api.coindesk.com/v1/bpi/currentprice/USD.json'
        )

    # When requests succeed using session
    if (response.status_code == 200):
        # Close the connection 
        # requests.session().close() <-- tried, doesn't do the trick
        print 'Fetched price successfully.\n'
        return response.json()

    print 'Couldn\'t fetch price json.'
    return 'error'


def requests_retry_session(
    retries=3,
    backoff_factor=0.3,
    status_forcelist=(500, 502, 504),
    session=None,
):

    session = session or requests.Session()
    retry = Retry(
        total=retries,
        read=retries,
        connect=retries,
        backoff_factor=backoff_factor,
        status_forcelist=status_forcelist,
    )

    adapter = HTTPAdapter(max_retries=retry)
    session.mount('https://', adapter)

    return session




def get_price_data(price_json):
    # The parameter is named price_json so it does not shadow the json module

    price = str(price_json['bpi']['USD']['rate'])
    # Strip the ',' from price, convert to float and to int
    price = int(float(price.replace(',', '')))

    return price




def main():
    # Send a request for the bitcoin price json
    priceJson = request_json()
    # Check if the request and retries failed
    if (priceJson == 'error'):
        print 'Terminating bitcoinPrice.py script.'
        sys.exit()

    # Get the data from the response json
    priceInt = get_price_data(priceJson)

    # Get timestamp as milliseconds
    milli_sec = int(round(time.time() * 1000))

    # Read the color data from colors.txt
    # The format is: '63,61' where greenCount,redCount
    with open('colors.txt', 'r') as fh:
        # strip() drops a trailing newline so it does not end up inside rCount
        colorData = fh.read().strip()
    gCount, rCount = colorData.split(',')[:2]

    # Create a string in json format with the price and color data
    dataString = "{\"timestamp\": \"%d\", \"price\": \"%d\", \"gCount\": \"%s\", \"rCount\": \"%s\"}" % (milli_sec, priceInt, gCount, rCount)
    print dataString

    # Append the record to results.txt
    with open('results/results.txt', 'a') as fh:
        fh.write(dataString + '\n')
    print '\nSuccessfully saved BTC price and color data to results.txt'




if __name__ == '__main__':
    main()

As a regular user, I could not reproduce the bug by running a crontab every minute with only this bitcoinPrice.py script.

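My root crontab, where the bug occurs, runs every 10 minutes, and several other scripts run before this one. The actual crontab, run by the root user and simplified to show how the scripts are chained, is as follows: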

*/10 * * * * node script1.js && python2 script2.py && python2 bitcoinPrice.py && /home/user/clearcache.sh

All the other scripts work as intended. The last script, clearcache.sh, resets the caches and buffers as follows:

#!/bin/sh
sync; echo 3 > /proc/sys/vm/drop_caches

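I would like to understand what is going on with this bug. If I cannot find a solution, I will switch to curl, dump the API JSON response to a file, and read it from there. Any ideas are appreciated!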
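
To narrow down where the repeated JSON comes from, it can help to log the cache-related response headers next to each price. The following is a minimal Python 3 sketch, not a definitive diagnosis: the helper names are hypothetical, `X-Cache` is a non-standard header that only some CDNs send, and whether this API returns any of these headers is an assumption.

```python
# Header names that commonly hint at an intermediate HTTP cache.
CACHE_HEADER_NAMES = frozenset([
    'age', 'cache-control', 'etag', 'expires', 'last-modified', 'via', 'x-cache',
])


def cache_related_headers(headers):
    """Return only the cache-related headers, with lower-cased names."""
    return {
        name.lower(): value
        for name, value in headers.items()
        if name.lower() in CACHE_HEADER_NAMES
    }


def looks_cached(headers):
    """Heuristic: a positive Age or an X-Cache 'hit' suggests a cached reply."""
    picked = cache_related_headers(headers)
    age = picked.get('age', '0').strip()
    served_from_cache = age.isdigit() and int(age) > 0
    return served_from_cache or 'hit' in picked.get('x-cache', '').lower()
```

With requests, `cache_related_headers(response.headers)` can be printed right after the GET. A large `Age` value on the stuck rows would point at a cache between the script and the API rather than at a reused connection.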

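I managed to solve it. Using curl periodically still had the same problem, but I used the trick from another answer and added a unique query parameter (the epoch time in seconds) to each request: ?$(date +%s).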

curl "https://api.coindesk.com/v1/bpi/currentprice/USD.json?$(date +%s)" -o results/priceJson.txt

...and it works without any caching. The same trick now also works with Python requests.
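
For the Python side, the same cache-busting idea might look like the sketch below. It is written for Python 3; the parameter name `_`, the helper names, the 10-second timeout, and the `Cache-Control: no-cache` request header are illustrative choices rather than anything the API documents, and an intermediate cache is free to ignore the header.

```python
import time
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

API_URL = 'https://api.coindesk.com/v1/bpi/currentprice/USD.json'


def with_cache_buster(url, now_ms=None):
    """Return url with a unique `_` query parameter appended."""
    if now_ms is None:
        now_ms = int(time.time() * 1000)  # epoch time in milliseconds
    parts = urlsplit(url)
    query = parse_qsl(parts.query, keep_blank_values=True)
    query.append(('_', str(now_ms)))
    return urlunsplit(parts._replace(query=urlencode(query)))


def fetch_price_json(url=API_URL):
    """Fetch the price JSON through a URL no cache has seen before."""
    import requests  # third-party; imported here so the URL helper works without it
    response = requests.get(
        with_cache_buster(url),
        headers={'Cache-Control': 'no-cache'},
        timeout=10,
    )
    response.raise_for_status()
    return response.json()
```

Because every request URL is different, a cache keyed on the full URL has no stored entry to serve, which matches the behaviour seen with the curl command.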