Google App Engine throws error on Basic Scaling

Question

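I'm using golang and Google App Engine for this project. I have a task where I receive a huge file, split it into lines, and send those lines one by one to a queue to be processed. My initial scaling setup in the app.yaml file was the following: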

instance_class: F1
automatic_scaling:
  min_instances: 0
  max_instances: 4
  min_idle_instances: 0    
  max_idle_instances: 1
  target_cpu_utilization: 0.8
  min_pending_latency: 15s

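It worked fine, but it had one problem: because there really are a lot of tasks, it would fail after 10 minutes (as expected, according to the documentation). So I decided to use the B1 instance class instead of F1, and that is where the problems started.

My B1 setup looks like this: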

instance_class: B1
basic_scaling:
  max_instances: 4

Now, I've created a very simple demo to illustrate the idea:

r.GET("foo", func(c *gin.Context) {
    _, err := tm.CreateTask(&tasks.TaskOptions{
        QueueID:  "bar",
        Method:   "method",
        PostBody: "foooo",
    })
    if err != nil {
        lg.LogErrorAndChill("failed, %v", err)
    }
})

r.POST("bar/method", func(c *gin.Context) {
    data, err := c.GetRawData()
    if err != nil {
        lg.LogErrorAndPanic("failed", err)
    }
    fmt.Printf("data is %v \n", string(data))
})

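To explain the logic behind it: I send a request to "foo", which creates a task that is added to the queue along with some body. In the task, a POST method is called based on the queueId and method parameters; it receives some text and, in this simple example, just logs it.

Now, when I run the request, I get a 500 error, as shown below: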

[GIN] 2021/10/05 - 19:38:29 | 500 |     301.289µs |         0.1.0.3 | GET      "/_ah/start"

In the logs I can see:

Process terminated because it failed to respond to the start request with an HTTP status code of 200-299 or 404.

And in the task queue (as the retry reason):

INTERNAL(13): Instance Unavailable. HTTP status code 500

Now, I've read the documentation and am aware of the following:

Manual, basic, and automatically scaling instances startup differently. When you start a manual scaling instance, App Engine immediately sends a /_ah/start request to each instance. When you start an instance of a basic scaling service, App Engine allows it to accept traffic, but the /_ah/start request is not sent to an instance until it receives its first user request. Multiple basic scaling instances are only started as necessary, in order to handle increased traffic. Automatically scaling instances do not receive any /_ah/start request.

When an instance responds to the /_ah/start request with an HTTP status code of 200–299 or 404, it is considered to have successfully started and can handle additional requests. Otherwise, App Engine terminates the instance. Manual scaling instances are restarted immediately, while basic scaling instances are restarted only when needed for serving traffic

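But that isn't very helpful. I don't understand why the /_ah/start request doesn't get a proper response, and I'm not sure how to debug or fix it, especially since the F1 instance works fine.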

Answer 1

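The request to the URL /_ah/start is routed to your app, and your app apparently isn't ready to handle it, which results in the 500 response. Check your logs.

Basically, your app needs to be ready for incoming requests to the URL /_ah/start (just as it is ready to handle requests to the URL /foo). If you run the app locally, try opening that URL (via curl or similar) and see what the response is. It needs to respond with a status code of 200-299 or 404 (as mentioned in the text you quoted); otherwise the instance won't be considered successfully started.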

google-app-engine

autoscaling