Long running cloud task on gae flexible terminates early without error. How to debug? What am I missing?

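I am running an application on GAE flexible with Python and Flask. I periodically dispatch cloud tasks with a cron job. These basically loop through all users and perform some cluster analysis. The tasks terminate without throwing any kind of error, but they don't do all of the work (meaning not all users are looped through). It doesn't happen at a consistent time (anywhere between 276.5s and 323.3s), nor does it ever stop at the same user. Has anybody experienced anything similar?

My guess is that I am breaching some type of resource limit or timeout somewhere. Things I have thought about or tried: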

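Cloud tasks should be allowed to run for up to an hour (according to this: https://cloud.google.com/tasks/docs/creating-appengine-handlers)
I increased the timeout of the gunicorn workers to 3600 to reflect this.
I have several workers running.
I tried to find out whether there are memory spikes or CPU overload, but didn't see anything suspicious.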
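
For reference, the gunicorn settings described above might look roughly like this in a config file. This is only a sketch: the file name `gunicorn.conf.py`, the worker count, and the bind address are assumptions and are not taken from the actual deployment.

```python
# gunicorn.conf.py (hypothetical file name; gunicorn config files are plain Python)

# Seconds a worker may stay silent before gunicorn kills and restarts it.
# 3600 mirrors the one-hour limit mentioned above.
timeout = 3600

# "Several workers"; the exact number here is an assumption.
workers = 3

# Assumed bind address; GAE flexible conventionally serves on port 8080.
bind = "0.0.0.0:8080"
```

It would be loaded with something like `gunicorn -c gunicorn.conf.py main:app`, where `main:app` is a placeholder for the real Flask module and application object.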

Sorry if I am being too vague or completely missing the point; I am quite confused by this problem. Thank you for any pointers.

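In addition to using Cloud Scheduler, you can check the logs to make sure the tasks ran properly and that there are no deadline issues. Application logs are grouped and are only sent to Stackdriver after the task itself has executed, so when a task is forcibly terminated, no log may be output. Try catching the Deadline exception so that some log is output; you may then see some helpful information to start troubleshooting.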
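
As a rough illustration of that suggestion, the task body could log its progress and catch exceptions before they propagate. This is a minimal sketch using only the standard library: `run_for_all_users`, `analyse`, and the logger name are hypothetical, and it catches `Exception` generically because the exact deadline exception class depends on the runtime.

```python
import logging

logger = logging.getLogger("cluster_task")  # hypothetical logger name


def run_for_all_users(user_ids, analyse):
    """Call analyse(user_id) for each user and log how far the loop got."""
    done = 0
    try:
        for user_id in user_ids:
            analyse(user_id)
            done += 1
            logger.info("processed user %s (%d so far)", user_id, done)
    except Exception:
        # Record where the loop stopped, then let the error propagate.
        logger.exception("task aborted after %d users", done)
        raise
    finally:
        # Runs on success and on failure, so the final count is always logged.
        logger.info("task exiting after %d users", done)
    return done
```

Comparing the final count with the expected number of users shows whether the loop ended early even when no exception was raised.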

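Thank you for all the suggestions. I worked through them and found the root cause, although only by accident while reading the Firestore documentation. I had no indication that this had anything to do with Firestore.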

From here: https://googleapis.dev/python/firestore/latest/collection.html I found that Query.stream() (or Query.get()) has a timeout on the individual documents, as described in this note:

Note: The underlying stream of responses will time out after the max_rpc_timeout_millis value set in the GAPIC client configuration for the RunQuery API. Snapshots not consumed from the iterator before that point will be lost.

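So what ultimately timed out was the query over all users. I only came across this by accident; none of the errors I found led me back to the query. Hope this helps someone in the future!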
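
One possible workaround (not necessarily the fix used here) is to read the users in pages, so that no single stream has to stay open while the slow per-user work runs. The sketch below assumes `collection` is a google-cloud-firestore `CollectionReference` or `Query`; `iter_in_pages` and the page size are made up for illustration.

```python
def iter_in_pages(collection, page_size=500):
    """Yield every document snapshot, fetching page_size documents per query."""
    last = None
    while True:
        # Order by document ID so the cursor below is well defined.
        query = collection.order_by("__name__").limit(page_size)
        if last is not None:
            # Resume right after the last snapshot of the previous page.
            query = query.start_after(last)
        # Consume the stream immediately so it is closed before any slow work.
        page = list(query.stream())
        if not page:
            return
        yield from page
        last = page[-1]
```

Usage might look like `for snapshot in iter_in_pages(db.collection("users")): ...`, where `db` and the `"users"` collection name are placeholders.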

python

google-app-engine

gunicorn

google-cloud-tasks
