芹菜发出的 json 大响应,未到达烧瓶

large json response emitted from celery, not reaching flask

我正在开发一个用户上传文件的 Web 应用程序,对服务器上的文件执行一些计算需要很长时间。计算完成后,应用程序会显示带有原始文件的计算结果。

这适用于小文件,但不适用于大文件。基本上,如果 json 响应很小(检查大小为 3 和 11726 的对象数组),它会发送到浏览器,但不会发送到大响应(大小为 507072)。失败案例中上传的文件大小约为 3 MB,并且 json 响应包含上传文件的所有数据,因此假设它也是相同的顺序。

在芹菜方面

我在小 json 响应(11726 数组大小)

的情况下看到了这一点
[2016-12-20 18:11:11,734: INFO/MainProcess] emitting event "response"     to all [/test]
[2016-12-20 18:11:13,151: INFO/MainProcess] Task  flask_application.restapi.background_thread[0457e7cb-f10b-408e-8f80-2bd95ce2c7b4] succeeded in 10.542778854s: None

同样,在大 json 响应中:

[2016-12-20 18:17:08,788: INFO/MainProcess] emitting event "response" to all [/test]
[2016-12-20 18:18:42,813: INFO/MainProcess] Task flask_application.restapi.background_thread[309e0ed6-94ec-4332-a2ed-b6d9211bc263] succeeded in 366.478926406s: None

在 Flask 方面

对于小的 json 响应,我看到了完整的输出,最终在浏览器上也可见(这不是完整的输出,因为它被截断了):

35,"Formula":"CHNO"},{"NA Corrected with  zero":345,"Name":"XYZ","Label":"A","Sample":"M_6_6","Truth Value":false,"NA Corrected":323,"Formula":"CHNO"},{"NA Corrected with zero":422,"Name":"XYZ","Label":"A","Sample":"M_6_5","Truth Value":false,"NA Corrected":233,"Formula":"CHNO"},{"NA Corrected with zero":494,"Name":"XYZ","Label":"A","Sample":"M_6_4","Truth Value":false,"NA Corrected":4946,"Formula":"CHNO"},{"NA Corrected with zero":458,"Name":"L-Proline","Label":"A","Sample":"M_6_3","Truth Value":false,"NA Corrected":4586,"Formula":"CHNO"},{"NA Corrected with zero":493,"Name":"XYZ","Label":"A","Sample":"M_6_2","Truth Value":false,"NA Corrected":4935,"Formula":"CHNO"},{"NA Corrected with zero":4565,"Name":"XYZ","Label":"A","Sample":"M_6_1","Truth Value":false,"NA Corrected":45656,"Formula":"CHNO"}]}] [in /Users/raaisa/.virtualenvs/nacorr/lib/python2.7/site-packages/engineio/socket.py:69]
20161220-18:11PM INFO: 441d88fffb814e1696aa25eb99e97336: Received packet PING data None [in /Users/raaisa/.virtualenvs/nacorr/lib/python2.7/site-packages/engineio/socket.py:44]

但是对于大反响,什么都没有。似乎烧瓶没有收到芹菜的任何东西:

20161220-18:16PM INFO: 441d88fffb814e1696aa25eb99e97336: Sending packet PONG data None [in /Users/raaisa/.virtualenvs/nacorr/lib/python2.7/site-packages/engineio/socket.py:69]
20161220-18:16PM INFO: 441d88fffb814e1696aa25eb99e97336: Received packet PING data None [in /Users/raaisa/.virtualenvs/nacorr/lib/python2.7/site-packages/engineio/socket.py:44]
20161220-18:16PM INFO: 441d88fffb814e1696aa25eb99e97336: Sending packet PONG data None [in /Users/raaisa/.virtualenvs/nacorr/lib/python2.7/site-packages/engineio/socket.py:69]
20161220-18:17PM INFO: 441d88fffb814e1696aa25eb99e97336: Received packet PING data None [in /Users/raaisa/.virtualenvs/nacorr/lib/python2.7/site-packages/engineio/socket.py:44]
20161220-18:17PM INFO: 441d88fffb814e1696aa25eb99e97336: Sending packet PONG data None [in /Users/raaisa/.virtualenvs/nacorr/lib/python2.7/site-packages/engineio/socket.py:69]
20161220-18:17PM INFO: 441d88fffb814e1696aa25eb99e97336: Received packet PING data None [in /Users/raaisa/.virtualenvs/nacorr/lib/python2.7/site-packages/engineio/socket.py:44]
20161220-18:17PM INFO: 441d88fffb814e1696aa25eb99e97336: Sending packet PONG data None [in /Users/raaisa/.virtualenvs/nacorr/lib/python2.7/site-packages/engineio/socket.py:69]
20161220-18:18PM INFO: 441d88fffb814e1696aa25eb99e97336: Received packet PING data None [in /Users/raaisa/.virtualenvs/nacorr/lib/python2.7/site-packages/engineio/socket.py:44]
20161220-18:18PM INFO: 441d88fffb814e1696aa25eb99e97336: Sending packet PONG data None [in /Users/raaisa/.virtualenvs/nacorr/lib/python2.7/site-packages/engineio/socket.py:69]
20161220-18:18PM INFO: 441d88fffb814e1696aa25eb99e97336: Received packet PING data None [in /Users/raaisa/.virtualenvs/nacorr/lib/python2.7/site-packages/engineio/socket.py:44]
20161220-18:18PM INFO: 441d88fffb814e1696aa25eb99e97336: Sending packet PONG data None [in /Users/raaisa/.virtualenvs/nacorr/lib/python2.7/site-packages/engineio/socket.py:69]
20161220-18:19PM INFO: 441d88fffb814e1696aa25eb99e97336: Received packet PING data None [in /Users/raaisa/.virtualenvs/nacorr/lib/python2.7/site-packages/engineio/socket.py:44]
20161220-18:19PM INFO: 441d88fffb814e1696aa25eb99e97336: Sending packet PONG data None [in /Users/raaisa/.virtualenvs/nacorr/lib/python2.7/site-packages/engineio/socket.py:69]
20161220-18:19PM INFO: 441d88fffb814e1696aa25eb99e97336: Received packet PING data None [in /Users/raaisa/.virtualenvs/nacorr/lib/python2.7/site-packages/engineio/socket.py:44]
20161220-18:19PM INFO: 441d88fffb814e1696aa25eb99e97336: Sending packet PONG data None [in /Users/raaisa/.virtualenvs/nacorr/lib/python2.7/site-packages/engineio/socket.py:69]

我正在使用 flask、flask-socketio、celery 和 redis。我已经按照 this blog 实现了 web 套接字和长时间的后台任务。这是相关代码。

#socketio_redis
'SOCKETIO_REDIS_URL' = 'redis://localhost:6379/0'

@socketio.on('nacorr', namespace='/test')
def test_connect(data):
    background_thread.delay(app.config['SOCKETIO_REDIS_URL'],data)

@celery.task
def background_thread(url, data):
    local_socketio = SocketIO(message_queue=url)
    local_socketio.emit('response', {'data': 'background task starting ...'}, namespace='/test')
    # do something with data  
    responsejson = some_func_resulting_in_json_obj()
    local_socketio.emit('response', responsejson, namespace='/test')

感谢任何帮助。

更新

所有任务的redis键值都相同(只是任务id不同):

127.0.0.1:6379> GET "celery-task-meta-691c61e6-ef2a-413c-9226-fac0be352928" 
"{\"status\": \"SUCCESS\", \"traceback\": null, \"result\": null,  \"task_id\": \"691c61e6-ef2a-413c-9226-fac0be352928\", \"children\": []}"
127.0.0.1:6379> GET "celery-task-meta-f01e0818-dc80-42e2-8ab0-2e4240f0e972"
"{\"status\": \"SUCCESS\", \"traceback\": null, \"result\": null, \"task_id\": \"f01e0818-dc80-42e2-8ab0-2e4240f0e972\", \"children\": []}"
127.0.0.1:6379> GET "celery-task-meta-5adc0627-3288-4d21-8bf7-6ee5964e1683"
"{\"status\": \"SUCCESS\", \"traceback\": null, \"result\": null, \"task_id\": \"5adc0627-3288-4d21-8bf7-6ee5964e1683\", \"children\": []}"

我能够通过将我的响应分成块并发出较小的块来解决这个问题,所以可能某处存在缓冲区大小问题。

此问题已通过配置 redis pubsub 客户端缓冲区限制和 flask-socketio ping 超时解决。

监控redis和flask

按照 miguel 的建议监控 redis 日志。在 redis-cli 上使用命令 PSUBSCRIBE * 将 redis activity 作为消息队列进行监控。

对于小型数据集,当单击按钮启动长 运行 任务时,redis 出现了这个问题:

ivery_tag\": \"433b78b8-7cb1-4665-96aa-a53fcc478c32\"}, \"content-  encoding\": \"utf-8\"}"
1) "pmessage"
2) "*"
3) "flask-socketio"
4) "(dp0\nS'skip_sid'\np1\nNsS'room'\np2\nNsS'namespace'\np3\nS'/test'\np4\nsS'event'\np5\nS'response'\np6\nsS'callback'\np7\nNsS'data'\np8\n(dp9\ng8\nS'background task starting ...'\np10\nssS'method'\np11\nS'emit'\np12\ns."

ected\np140697\nF45867838.165479094\nsVFormula\np140698\nVC5H9NO2\np140699\nsa(dp140700\nVNA Corrected with zero\np140701\nF49355484.84626437\nsVName\np140702\nVL-Proline\np140703\nsVLabel\np140704\nVC12 PARENT\np140705\nsVSample\np140706\nVSAMPLE_#SZ3G2FM_6_2\np140707\nsVTruth Value\np140708\nI00\nsVNA Corrected\np140709\nF49355484.84626437\nsVFormula\np140710\nVC5H9NO2\np140711\nsa(dp140712\nVNA Corrected with zero\np140713\nF45656465.639370896\nsVName\np140714\nVL-Proline\np140715\nsVLabel\np140716\nVC12 PARENT\np140717\nsVSample\np140718\nVSAMPLE_#SZ3G2FM_6_1\np140719\nsVTruth Value\np140720\nI00\nsVNA Corrected\np140721\nF45656465.639370896\nsVFormula\np140722\nVC5H9NO2\np140723\nsassS'method'\np140724\nS'emit'\np140725\ns."
1) "pmessage"
2) "*"
3) "flask-socketio"
4) "(dp0\nS'skip_sid'\np1\nNsS'room'\np2\nNsS'namespace'\np3\nS'/'\np4\nsS'event'\np5\nS'disconnect'\np6\nsS'callback'\np7\nNsS'data'\np8\n(dp9\ng8\nS'disconnect the socket'\np10\nssS'method'\np11\nS'emit'\np12\ns."
1) "pmessage"
2) "*"
3) "celery-task-meta-6b4da3b9-b887-4032-a0ce-5d19a38f75d3"

正确的是运行 celery 任务并发送所有发出的消息。任务的id是"celery-task-meta-6b4da3b9-b887-4032-a0ce-5d19a38f75d3".

如果同一任务针对生成大响应的大文件运行,则该过程未完成并且 redis-cli 会在一段时间后停止:

3) "flask-socketio"
4) "(dp0\nS'skip_sid'\np1\nNsS'room'\np2\nNsS'namespace'\np3\nS'/test'\np4\nsS'event'\np5\nS'response'\np6\nsS'callback'\np7\nNsS'data'\np8\n(dp9\ng8\nS'background task starting ...'\np10\nssS'method'\np11\nS'emit'\np12\ns."
1) "pmessage"
2) "*"
3) "/0.celeryev/worker.heartbeat"
4) "{\"body\": \"eyJzd19zeXMiOiAiRGFyd2luIiwgImNsb2NrIjogMTY1MCwgInRpbWVzdGFtcCI6IDE0ODI3ODE1OTQuODI5MzY0LCAiaG9zdG5hbWUiOiAiY2VsZXJ5QFJhYWlzYXMtTWFjQm9vay1Qcm8ubG9jYWwiLCAicGlkIjogNDI0MSwgInN3X3ZlciI6ICI0LjAuMiIsICJ1dGNvZmZzZXQiOiAtNiwgImxvYWRhdmciOiBbMi4zNywgMi45MSwgMi45N10sICJwcm9jZXNzZWQiOiA4LCAiYWN0aXZlIjogMSwgImZyZXEiOiAyLjAsICJ0eXBlIjogIndvcmtlci1oZWFydGJlYXQiLCAic3dfaWRlbnQiOiAicHktY2VsZXJ5In0=\", \"headers\": {\"hostname\": \"celery@Raaisas-MacBook-Pro.local\"}, \"content-type\": \"application/json\", \"properties\": {\"priority\": 0, \"body_encoding\": \"base64\", \"delivery_info\": {\"routing_key\": \"worker.heartbeat\", \"exchange\": \"celeryev\"}, \"delivery_mode\": 1, \"delivery_tag\": \"d96e8e9e-032e-41ae-8e9c-64cf7c6bc17c\"}, \"content-encoding\": \"utf-8\"}"
1) "pmessage"
2) "*"
3) "/0.celeryev/worker.heartbeat"
4) "{\"body\": \"eyJzd19zeXMiOiAiRGFyd2luIiwgImNsb2NrIjogMTY1MiwgInRpbWVzdGFtcCI6IDE0ODI3ODE2MDkuODA5MzAyLCAiaG9zdG5hbWUiOiAiY2VsZXJ5QFJhYWlzYXMtTWFjQm9vay1Qcm8ubG9jYWwiLCAicGlkIjogNDI0MSwgInN3X3ZlciI6ICI0LjAuMiIsICJ1dGNvZmZzZXQiOiAtNiwgImxvYWRhdmciOiBbMi4yOSwgMi44NiwgMi45NV0sICJwcm9jZXNzZWQiOiA4LCAiYWN0aXZlIjogMSwgImZyZXEiOiAyLjAsICJ0eXBlIjogIndvcmtlci1oZWFydGJlYXQiLCAic3dfaWRlbnQiOiAicHktY2VsZXJ5In0=\", \"headers\": {\"hostname\": \"celery@Raaisas-MacBook-Pro.local\"}, \"content-type\": \"application/json\", \"properties\": {\"priority\": 0, \"body_encoding\": \"base64\", \"delivery_info\": {\"routing_key\": \"worker.heartbeat\", \"exchange\": \"celeryev\"}, \"delivery_mode\": 1, \"delivery_tag\": \"3160b37f-273b-4023-9bb6-5d89ebfe97a7\"}, \"content-encoding\": \"utf-8\"}"
user-MacBook-Pro:WebTool user$

但是如果在redis上我们设置如下配置: config set client-output-buffer-limit "pubsub 0 0 0",我们在redis中看到了很大的emit消息。现在出现另一个问题,因为 flask websocket 显示错误 "Socket is closed"。这可能是因为发出大响应所花费的时间。增加 SocketIO 中的 ping_timeout 可以解决这个问题,并将响应发送到前端。

将响应分成块:

此问题的另一种解决方案是将响应分成小块并发出。它适用于上述情况。