Azure App Service Autoscale Fails to Scale In

Question

My App Service scales out but then fails to scale in. I have been trying to solve this for several months now.

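I have tried the following, but none of it has worked:

My scale conditions are based on CPU and memory. However, I have never seen CPU go above 12%, so I assume it is actually scaling on memory.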

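Set the scale-out condition to memory above 90% over a 5 minute average with a 10 minute cooldown, and the scale-in condition to memory below 70% over a 5 minute average. This does not seem to make sense, because if my memory utilization is already at 90%, I really have an underlying memory leak and should already have scaled out.
Set the scale-out condition to memory above 80% over a 60 minute average with a 10 minute cooldown, and the scale-in condition to memory below 60% over a 5 minute average. This makes more sense, because I have seen memory usage spike for a few hours and then drop.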

Expected behavior: App Service autoscale reduces the instance count once memory usage has been below 60% for 5 minutes.

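Question:

What are the ideal metric thresholds if my baseline CPU stays at roughly 6% on average and memory at 53%? In other words, what is the best minimum for scaling in and the best maximum for scaling out without running into anti-patterns such as flapping? A wider gap of 20% between the thresholds makes more sense to me.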

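Alternative:

Given the amount of troubleshooting involved for something marketed as being as simple as "push button scaling", it is almost not worth the headache of the ambiguous configuration (at least not without custom PowerShell scripts!). I am considering disabling autoscale because of its unpredictability and just keeping 2 instances running for automatic load balancing, scaling manually when needed.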

Autoscale configuration:

{
    "location": "East US 2",
    "tags": {
        "$type": "Microsoft.WindowsAzure.Management.Common.Storage.CasePreservedDictionary, Microsoft.WindowsAzure.Management.Common.Storage"
    },
    "properties": {
        "name": "CPU and Memory Autoscale",
        "enabled": true,
        "targetResourceUri": "/redacted",
        "profiles": [
            {
                "name": "Auto created scale condition",
                "capacity": {
                    "minimum": "1",
                    "maximum": "10",
                    "default": "1"
                },
                "rules": [
                    {
                        "scaleAction": {
                            "direction": "Increase",
                            "type": "ChangeCount",
                            "value": "1",
                            "cooldown": "PT10M"
                        },
                        "metricTrigger": {
                            "metricName": "MemoryPercentage",
                            "metricNamespace": "",
                            "metricResourceUri": "/redacted",
                            "operator": "GreaterThanOrEqual",
                            "statistic": "Average",
                            "threshold": 80,
                            "timeAggregation": "Average",
                            "timeGrain": "PT1M",
                            "timeWindow": "PT1H"
                        }
                    },
                    {
                        "scaleAction": {
                            "direction": "Decrease",
                            "type": "ChangeCount",
                            "value": "1",
                            "cooldown": "PT5M"
                        },
                        "metricTrigger": {
                            "metricName": "MemoryPercentage",
                            "metricNamespace": "",
                            "metricResourceUri": "/redacted",
                            "operator": "LessThanOrEqual",
                            "statistic": "Average",
                            "threshold": 60,
                            "timeAggregation": "Average",
                            "timeGrain": "PT1M",
                            "timeWindow": "PT10M"
                        }
                    },
                    {
                        "scaleAction": {
                            "direction": "Increase",
                            "type": "ChangeCount",
                            "value": "1",
                            "cooldown": "PT5M"
                        },
                        "metricTrigger": {
                            "metricName": "CpuPercentage",
                            "metricNamespace": "",
                            "metricResourceUri": "/redacted",
                            "operator": "GreaterThanOrEqual",
                            "statistic": "Average",
                            "threshold": 60,
                            "timeAggregation": "Average",
                            "timeGrain": "PT1M",
                            "timeWindow": "PT1H"
                        }
                    },
                    {
                        "scaleAction": {
                            "direction": "Decrease",
                            "type": "ChangeCount",
                            "value": "1",
                            "cooldown": "PT5M"
                        },
                        "metricTrigger": {
                            "metricName": "CpuPercentage",
                            "metricNamespace": "",
                            "metricResourceUri": "/redacted",
                            "operator": "LessThanOrEqual",
                            "statistic": "Average",
                            "threshold": 40,
                            "timeAggregation": "Average",
                            "timeGrain": "PT1M",
                            "timeWindow": "PT10M"
                        }
                    }
                ]
            }
        ],
        "notifications": [
            {
                "operation": "Scale",
                "email": {
                    "sendToSubscriptionAdministrator": false,
                    "sendToSubscriptionCoAdministrators": false,
                    "customEmails": [
                        "redacted"
                    ]
                },
                "webhooks": []
            }
        ],
        "targetResourceLocation": "East US 2"
    },
    "id": "/redacted",
    "name": "CPU and Memory Autoscale",
    "type": "Microsoft.Insights/autoscaleSettings"
}

Answer 1

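I ran into exactly the same problem, and I have come to believe that it is currently not possible to autoscale back down to one instance the way we want.

My current workaround is a second profile that scales in to 1 instance and recurs every day between 23:55 and 00:00.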
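
For illustration, such a second profile might look roughly like the sketch below, built as a Python dict that mirrors the profile objects in the question's JSON. The profile name, the helper name nightly_scale_in_profile, and the UTC time zone are assumptions made for the example. The recurrence block follows the autoscale settings schema as I understand it, where a recurring profile only declares its start time, so the 00:00 end of the window would be expressed by the regular profile recurring from 00:00.

```python
import json


def nightly_scale_in_profile(time_zone: str = "UTC") -> dict:
    """Build a recurring profile that pins the instance count to 1.

    The name and the default time zone are hypothetical; adjust them
    to match the real autoscale setting.
    """
    every_day = [
        "Monday", "Tuesday", "Wednesday", "Thursday",
        "Friday", "Saturday", "Sunday",
    ]
    return {
        "name": "Nightly scale-in to 1 instance",  # hypothetical name
        # Minimum, maximum and default all set to 1 force a single instance.
        "capacity": {"minimum": "1", "maximum": "1", "default": "1"},
        "rules": [],
        "recurrence": {
            "frequency": "Week",
            "schedule": {
                "timeZone": time_zone,  # assumption: not stated in the answer
                "days": every_day,
                "hours": [23],
                "minutes": [55],  # profile starts at 23:55
            },
        },
    }


if __name__ == "__main__":
    print(json.dumps(nightly_scale_in_profile(), indent=4))
```

The resulting object would sit next to the existing entry in the "profiles" array of the autoscale setting.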

Just to restate the problem: I have the following scenario, which is basically the same as yours.

The App Service memory baseline is 50%
Scale out by 1 instance when average (memory) > 80%
Scale in by 1 instance when average (memory) < 60%

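Scaling out from 1 instance to 2 works fine when the average memory percentage exceeds 80%. But scaling in to 1 instance never works, because the memory baseline is too high.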

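After reading the Best Practices, my understanding is that before scaling in, autoscale estimates the resulting memory percentage and checks that it would not trigger the scale-out rule.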

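So if the average memory percentage across two instances drops to 50%, the scale-in rule fires, and autoscale estimates the resulting memory usage as 2 * 50% / 1 = 100%. That would of course trigger the scale-out rule, so it does not scale in.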

However, scaling in from 3 instances to 2 should work: 3 * 50% / 2 = 75%, which is below the scale-out rule's 80%.
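
The arithmetic above can be written out as a small check. This is only a sketch of the estimate as I understand it from the Best Practices page, not Azure's actual implementation; the function names are made up for the example, and treating "at or above the scale-out threshold" as blocking is an assumption based on the GreaterThanOrEqual operator in the question's config.

```python
def projected_average(current_avg: float, current_instances: int,
                      target_instances: int) -> float:
    """Estimate the per-instance average if the same total load
    were spread over fewer instances."""
    return current_avg * current_instances / target_instances


def scale_in_blocked(current_avg: float, current_instances: int,
                     scale_out_threshold: float, step: int = 1) -> bool:
    """Return True if the projected average after scaling in would
    reach the scale-out threshold (assumed blocking condition)."""
    target = current_instances - step
    if target < 1:
        return True  # cannot go below one instance
    projected = projected_average(current_avg, current_instances, target)
    return projected >= scale_out_threshold


if __name__ == "__main__":
    # Baseline of 50% memory and a scale-out threshold of 80%, as above.
    for n in (2, 3):
        projected = projected_average(50.0, n, n - 1)
        blocked = scale_in_blocked(50.0, n, 80.0)
        print(f"{n} -> {n - 1}: projected {projected}%, blocked={blocked}")
```

With the numbers from this answer, the 2 -> 1 step projects to 100% and is blocked, while the 3 -> 2 step projects to 75% and is allowed.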

Answer 2

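For the CpuPercentage metric you have a scale-out action when it goes above 60 and a scale-in action when it drops below 40, and the gap between the two is very small. This can lead to the behavior described as flapping, which prevents autoscale's scale actions from firing. Your MemoryPercentage rules have a similar problem.

You should keep a difference of at least 40 between your scale-out and scale-in thresholds to avoid flapping. For more details on flapping, see https://docs.microsoft.com/en-us/azure/monitoring-and-diagnostics/insights-autoscale-best-practices#choose-the-thresholds-carefully-for-all-metric-types (search for the word Flapping).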
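
As a quick way to see the gaps in the posted configuration, here is a minimal sketch that pairs the scale-out and scale-in thresholds per metric. The thresholds are copied from the question's JSON; the flattened rule layout, the helper name threshold_gaps, and the constant SUGGESTED_MIN_GAP (the 40 suggested above) are simplifications introduced for this example.

```python
from collections import defaultdict

# Thresholds copied from the question's autoscale configuration,
# flattened to the three fields this check needs.
RULES = [
    {"metricName": "MemoryPercentage", "direction": "Increase", "threshold": 80},
    {"metricName": "MemoryPercentage", "direction": "Decrease", "threshold": 60},
    {"metricName": "CpuPercentage", "direction": "Increase", "threshold": 60},
    {"metricName": "CpuPercentage", "direction": "Decrease", "threshold": 40},
]

SUGGESTED_MIN_GAP = 40  # the gap suggested in this answer


def threshold_gaps(rules):
    """Return {metric: scale-out threshold - scale-in threshold}."""
    by_metric = defaultdict(dict)
    for rule in rules:
        by_metric[rule["metricName"]][rule["direction"]] = rule["threshold"]
    return {
        metric: t["Increase"] - t["Decrease"]
        for metric, t in by_metric.items()
        if "Increase" in t and "Decrease" in t
    }


if __name__ == "__main__":
    for metric, gap in threshold_gaps(RULES).items():
        verdict = "ok" if gap >= SUGGESTED_MIN_GAP else "narrower than suggested"
        print(f"{metric}: gap {gap} ({verdict})")
```

Both metrics in the question come out with a gap of 20.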

Answer 3

I have the same problem here. My application only needs one instance, and I have an autoscale configuration like this:

Scale out
When br-empresa (Average) CpuPercentage > 85 Increase instance count by 1
Or Br-Empresa (Average) MemoryPercentage > 85 Increase instance count by 1

Scale in
When br-empresa (Average) CpuPercentage <= 75 Decrease instance count by 1
And Br-Empresa (Average) MemoryPercentage <= 75 Decrease instance count by 1

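The memory baseline is 60%.

The scale-out logic works fine. But the application never scales in, even when memory drops back to 60%, because the estimate comes out as (60% * 2) / 1 = 120%.

The flapping estimate, as actually implemented, makes no sense for memory or CPU metrics.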
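
To show how far this goes, here is a rough sketch that searches for the smallest instance count from which a scale-in by one instance would pass the estimate. It models only the projection described in this thread (baseline * n / (n - 1) compared against the scale-out threshold), not real memory behavior; the function name and the max_instances default of 10 are assumptions for the example.

```python
from typing import Optional


def smallest_count_that_can_scale_in(baseline: float,
                                     scale_out_threshold: float,
                                     max_instances: int = 10) -> Optional[int]:
    """Return the smallest n for which going from n to n - 1 instances
    keeps the projected average below the scale-out threshold,
    or None if no n up to max_instances qualifies."""
    for n in range(2, max_instances + 1):
        projected = baseline * n / (n - 1)
        if projected < scale_out_threshold:
            return n
    return None


if __name__ == "__main__":
    # This answer: baseline 60%, scale-out threshold 85%.
    print(smallest_count_that_can_scale_in(60, 85))
    # Answer 1: baseline 50%, scale-out threshold 80%.
    print(smallest_count_that_can_scale_in(50, 80))
```

Under these assumptions, a 60% baseline with an 85% scale-out threshold can only scale in from 4 instances or more (projections of 120%, 90%, then 80%), so a plan that reaches 2 or 3 instances stays there.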

azure

autoscaling

azure-app-service-plans

azure-web-app-service