Rolling restarts are causing our App Engine app to go offline. Is there a way to change the config to prevent that from happening?

Question

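About once a week, our App Engine flexible environment Node app goes offline, and the following line shows up in the logs: Restarting batch of VMs for version 20181008t134234 as part of rolling restart. We have the app set up for automatic scaling with the following settings: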

runtime: nodejs
env: flex
beta_settings:
 cloud_sql_instances: tuzag-v2:us-east4:tuzag-db
automatic_scaling:
 min_num_instances: 1
 max_num_instances: 3
liveness_check:
 path: "/"
 check_interval_sec: 30
 timeout_sec: 4
 failure_threshold: 2
 success_threshold: 2
readiness_check:
 path: "/"
 check_interval_sec: 15
 timeout_sec: 4
 failure_threshold: 2
 success_threshold: 2
 app_start_timeout_sec: 300
resources:
 cpu: 1
 memory_gb: 1
 disk_size_gb: 10

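I understand how rolling restarts work on GCP/GAE, but I'm confused as to why Google doesn't start another VM before taking our primary VM offline. Do we have to run a minimum of 2 instances to prevent this from happening? Is there a way to configure my app.yaml so that another instance is started before the only running instance is restarted? Once the restart completes, everything comes back up fine, but that is still 10 minutes of downtime, which isn't acceptable, especially since we have no control over when the restart happens.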

Answer 1

Are you inferring this from the num instances graph in the App Engine dashboard, or was your App Engine project actually unresponsive during that time?

You can use cron to check every 5 minutes whether it is responding.

Does the problem still occur if you change cool_down_period_sec and target_utilization back to their default values?

If your service really was down during that time, perhaps you should implement a request handler for liveness checks: https://cloud.google.com/appengine/docs/flexible/python/reference/app-yaml#updated_health_checks

Their default polling configuration should tell GAE to start up within a few minutes.
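A minimal sketch of dedicated health check handlers for a Node app, using only the built-in http module, might look like the following. The paths /liveness_check and /readiness_check, the markReady helper, and the 503 "not ready" response are assumptions for illustration; the paths would have to match liveness_check.path and readiness_check.path in app.yaml, which the question currently sets to "/".

```javascript
const http = require('http');

// Hypothetical paths; keep them in sync with app.yaml.
const LIVENESS_PATH = '/liveness_check';
const READINESS_PATH = '/readiness_check';

// Flipped to true once startup work (for example the Cloud SQL connection) is done.
let ready = false;
function markReady() {
  ready = true;
}

// Return the status code for a health check path, or null for any other path.
function healthStatus(path, isReady) {
  if (path === LIVENESS_PATH) return 200; // the process is up and answering
  if (path === READINESS_PATH) return isReady ? 200 : 503; // not ready for traffic yet
  return null;
}

function createApp() {
  return http.createServer((req, res) => {
    const path = req.url.split('?')[0];
    const status = healthStatus(path, ready);
    if (status !== null) {
      res.writeHead(status, { 'Content-Type': 'text/plain' });
      res.end(status === 200 ? 'ok' : 'not ready');
      return;
    }
    // Normal application routes would be handled here.
    res.writeHead(200, { 'Content-Type': 'text/plain' });
    res.end('Hello');
  });
}

// Usage from the app's entry point (App Engine flexible serves on port 8080):
//   createApp().listen(process.env.PORT || 8080, markReady);
```

Keeping the checks on their own lightweight paths means a slow or failing page at "/" is not mistaken for an unhealthy instance, and the readiness path only reports success after the app has finished starting.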

Another thing worth double-checking is how long your instances take to start up.

Answer 2

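We are aware that it is expected behaviour for Flexible instances to be restarted on a weekly basis. Provided that the health checks are properly configured and are not the issue, the recommendation is indeed to set a minimum of two instances.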
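Applied to the app.yaml from the question, that recommendation would amount to something like the fragment below. This is only a sketch: the max_num_instances value is carried over unchanged from the question, and whether two instances fully avoid the downtime depends on the restart batching, which this thread does not document.

```yaml
automatic_scaling:
  min_num_instances: 2   # was 1, so a restart no longer hits the only running VM
  max_num_instances: 3
```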

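As far as I know, App Engine Flex has no alternative feature that brings up a new instance to avoid downtime caused by the weekly restart. You could try running directly on Google Compute Engine instead of App Engine and managing updates and maintenance yourself; that may suit your purposes better.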

google-app-engine

virtual-machine

google-cloud-platform

app-engine-flexible

google-appengine-node