Lots of "Uncaught signal: 6" errors in Cloud Run

Question

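I deployed a Python (3.x) web service on GCP. Every time Cloud Run shuts down instances, most noticeably after a load peak, I get many log entries like Uncaught signal: 6, pid=6, tid=6, fault_addr=0. along with [CRITICAL] WORKER TIMEOUT (pid:6). It is always signal 6.

The service runs in Docker with FastAPI and Gunicorn, using this startup command: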

CMD gunicorn -w 2 -k uvicorn.workers.UvicornWorker -b 0.0.0.0:8080 app.__main__:app

The service is deployed with Terraform, with 1 GiB of memory, 2 CPUs, and the timeout set to 2 minutes:

resource "google_cloud_run_service" <ressource-name> {
  name     = <name>
  location = <location>

  template {
    spec {
      service_account_name = <sa-email>
      timeout_seconds = 120
      containers {
        image = var.image
        env {
          name = "GCP_PROJECT"
          value = var.project
        }
        env {
          name = "BRANCH_NAME"
          value = var.branch
        }
        resources {
          limits = {
            cpu = "2000m"
            memory = "1Gi"
          }
        }
      }
    }
  }
  autogenerate_revision_name = true
}

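I have already tried adjusting the resources and the timeout in Cloud Run, and using Gunicorn's --timeout and --preload flags, since that is what people always seem to recommend when you search for this problem, but none of it helped. I also don't really understand why the workers time out in the first place.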
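For reference, the Gunicorn flags mentioned above can also be written as a gunicorn.conf.py file, which Gunicorn reads as plain Python. This is only a sketch of that configuration: the workers, worker class, and bind address come from the startup command, while the timeout value of 120 is an illustrative assumption, not necessarily the value that was tried.

```python
# gunicorn.conf.py (sketch; values marked "assumed" are illustrative only)

# Same as "-w 2" in the startup command.
workers = 2

# Same as "-k uvicorn.workers.UvicornWorker".
worker_class = "uvicorn.workers.UvicornWorker"

# Same as "-b 0.0.0.0:8080".
bind = "0.0.0.0:8080"

# Same as "--timeout 120": seconds a worker may stay silent before the
# master logs WORKER TIMEOUT. The value 120 is assumed here.
timeout = 120

# Same as "--preload": load the application before forking the workers.
preload_app = True
```

With this file in the working directory, the command shortens to gunicorn app.__main__:app.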

Answer 1

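Unless you have enabled "CPU is always allocated", background threads and processes might stop receiving CPU time after all HTTP requests have returned. This means background threads and processes can fail, connections can time out, and so on, unless you set the --cpu-no-throttling flag. Cloud Run instances that are not processing requests can be terminated.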
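To make the risk concrete, here is a minimal standard-library sketch of the two patterns. The names slow_task, handle_in_background, handle_inline, and results are hypothetical and only stand in for real request handlers and side effects.

```python
import threading
import time

# Hypothetical stand-in for a side effect such as a database write.
results = []


def slow_task(item: str) -> None:
    """Simulate work that takes a little while to finish."""
    time.sleep(0.05)
    results.append(item)


def handle_in_background(item: str) -> threading.Thread:
    """Risky with request-only CPU: the handler returns before the work is done,
    so the thread may be starved of CPU once the response has been sent."""
    worker = threading.Thread(target=slow_task, args=(item,), daemon=True)
    worker.start()
    return worker


def handle_inline(item: str) -> str:
    """Safer: the work finishes while the request is still being processed."""
    slow_task(item)
    return "done"
```

On a normal machine both variants complete. The difference only shows up when the platform stops giving the process CPU time between requests.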

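Signal 6 means abort, which terminates the process. This probably means your container is being terminated because there are no requests to process.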
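If you want to check the signal number yourself, Python's signal module can map it to a name. A small sketch, assuming a Linux container (signal numbers are platform dependent) and a hypothetical helper named describe_signal:

```python
import signal


def describe_signal(number: int) -> str:
    """Return the symbolic name for a signal number on this platform."""
    return signal.Signals(number).name


# On Linux, signal 6 is the abort signal.
print(describe_signal(6))
```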

Run more workloads on Cloud Run with new CPU allocation controls

What if my application is doing background work outside of request processing?

Answer 2

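Expanding on the top answer, which is correct: you are using Gunicorn, which is a process manager that manages the Uvicorn processes running the actual application.

When Cloud Run wants to shut down the instance (probably because of a lack of requests), it sends signal 6 to process 1. However, Gunicorn occupies that process as the manager and does not pass the signal on to the Uvicorn workers for handling, so you get the uncaught signal 6.

The simplest solution is to run Uvicorn directly instead of through Gunicorn (possibly with a smaller instance) and let Cloud Run handle the scaling part.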

CMD ["uvicorn", "app.__main__:app", "--host", "0.0.0.0", "--port", "8080"]
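If you would rather start Uvicorn from Python than from the CMD line, a launcher along these lines should work. This is a sketch under assumptions: resolve_port and main are hypothetical names, and reading the PORT environment variable (which Cloud Run sets) is an addition that the CMD above does not make.

```python
import os


def resolve_port(default: int = 8080) -> int:
    """Return the port from the PORT environment variable, or the default."""
    return int(os.environ.get("PORT", default))


def main() -> None:
    """Start a single Uvicorn server for the app from the question."""
    # Imported here so the module can be loaded without uvicorn installed.
    import uvicorn

    uvicorn.run("app.__main__:app", host="0.0.0.0", port=resolve_port())
```

Calling main() from a small launcher script then replaces the gunicorn command, with one server process per container.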

Answer 3

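This error occurs when a background process is aborted. As with other applications, running background threads in the cloud has some advantages. Fortunately, you can still use them on Cloud Run without the process being aborted. To do so, when deploying, choose the option "CPU always allocated" instead of "CPU only allocated during request processing".

For more details, see https://cloud.google.com/run/docs/configuring/cpu-allocation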

python-3.x

docker

google-cloud-platform

google-cloud-run

fastapi
