Azure SQL failover group: what does the grace period mean?

I am currently reading https://docs.microsoft.com/en-us/azure/sql-database/sql-database-auto-failover-group and I am having a hard time understanding the automatic failover policy:

By default, a failover group is configured with an automatic failover policy. The SQL Database service triggers failover after the failure is detected and the grace period has expired. The system must verify that the outage cannot be mitigated by the built-in high availability infrastructure of the SQL Database service due to the scale of the impact. If you want to control the failover workflow from the application, you can turn off automatic failover.

When defining a failover group in an ARM template:

{
  "condition": "[equals(parameters('redundancyId'), 'pri')]",
  "type": "Microsoft.Sql/servers",
  "kind": "v12.0",
  "name": "[variables('sqlServerPrimaryName')]",
  "apiVersion": "2014-04-01-preview",
  "location": "[parameters('location')]",
  "properties": {
    "administratorLogin": "[parameters('sqlServerPrimaryAdminUsername')]",
    "administratorLoginPassword": "[parameters('sqlServerPrimaryAdminPassword')]",
    "version": "12.0"
  },
  "resources": [
    {
      "condition": "[equals(parameters('redundancyId'), 'pri')]",
      "apiVersion": "2015-05-01-preview",
      "type": "failoverGroups",
      "name": "[variables('sqlFailoverGroupName')]",
      "properties": {
        "serverName": "[variables('sqlServerPrimaryName')]",
        "partnerServers": [
          {
            "id": "[resourceId('Microsoft.Sql/servers/', variables('sqlServerSecondaryName'))]"
          }
        ],
        "readWriteEndpoint": {
          "failoverPolicy": "Automatic",
          "failoverWithDataLossGracePeriodMinutes": 60
        },
        "readOnlyEndpoint": {
          "failoverPolicy": "Disabled"
        },
        "databases": [
          "[resourceId('Microsoft.Sql/servers/databases', variables('sqlServerPrimaryName'), variables('sqlDatabaseName'))]"
        ]
      },
      "dependsOn": [
        "[variables('sqlServerPrimaryName')]",
        "[resourceId('Microsoft.Sql/servers/databases', variables('sqlServerPrimaryName'), variables('sqlDatabaseName'))]",
        "[resourceId('Microsoft.Sql/servers', variables('sqlServerSecondaryName'))]"
      ]
    },
    {
      "condition": "[equals(parameters('redundancyId'), 'pri')]",
      "name": "[variables('sqlDatabaseName')]",
      "type": "databases",
      "apiVersion": "2014-04-01-preview",
      "location": "[parameters('location')]",
      "dependsOn": [
        "[variables('sqlServerPrimaryName')]"
      ],
      "properties": {
        "edition": "[variables('sqlDatabaseEdition')]",
        "requestedServiceObjectiveName": "[variables('sqlDatabaseServiceObjective')]"
      }
    }
  ]
},
{
  "condition": "[equals(parameters('redundancyId'), 'pri')]",
  "type": "Microsoft.Sql/servers",
  "kind": "v12.0",
  "name": "[variables('sqlServerSecondaryName')]",
  "apiVersion": "2014-04-01-preview",
  "location": "[variables('sqlServerSecondaryRegion')]",
  "properties": {
    "administratorLogin": "[parameters('sqlServerSecondaryAdminUsername')]",
    "administratorLoginPassword": "[parameters('sqlServerSecondaryAdminPassword')]",
    "version": "12.0"
  }
}

I specify the readWriteEndpoint like this:

    "readWriteEndpoint": {
      "failoverPolicy": "Automatic",
      "failoverWithDataLossGracePeriodMinutes": 60
    }

I set failoverWithDataLossGracePeriodMinutes to 60 minutes.

What does this mean? I cannot find a clear answer anywhere. Does it mean that:

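  1. When an outage is happening in my primary region where my primary database resides, the read/write endpoint points to the primary and only after 60 minutes it fails over to my secondary, which becomes the new primary. During those 60 minutes, is the only way to read my data to use the readOnlyEndpoint directly? Or
  2. My read/write endpoint is switched over immediately if they somehow detect that there is no data left to sync.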

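I think it comes down to this: if I detect an outage, and I don't care about data loss but want to be able to write to my database, do I have to fail over manually?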

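Bonus question: is the reason the grace period exists that there can be unsynced data on the primary, which will be overwritten or tossed away if the secondary becomes the new primary (for example, if I switch manually)?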

Sorry, I can't keep it to just one question. I have read a lot and I really need to know this.

What does this mean?

It means:

"when an outage is happening in my primary region where my primary database resides, the read/write endpoint points to the primary and only after 60 minutes it fails over to my secondary, which becomes the new primary."

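The failover group cannot fail over automatically even when the data is synced, because the high-availability solution in the primary region is trying to do the same thing, and almost all of the time your primary database will come back quickly within the primary region. Performing an automatic cross-region failover would interfere with that.

"the reason why the grace period is present, is that there can be unsynced data on the primary, that will be overwritten, or tossed away, if the secondary becomes the new primary"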

It also allows time for the database to fail over within the primary region.
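
If you would rather decide yourself when to fail over, which the quoted documentation describes as turning off automatic failover, the readWriteEndpoint from the question could be changed along these lines. This is a minimal sketch, not a tested template: it assumes `Manual` is the value that switches the automatic policy off, and it leaves out failoverWithDataLossGracePeriodMinutes on the assumption that the grace period only applies to the automatic policy.

```json
"readWriteEndpoint": {
  "failoverPolicy": "Manual"
}
```

With a setting like this, nothing fails over on its own during an outage. You would have to initiate a forced failover to the secondary yourself and accept that any transactions not yet replicated from the primary are lost.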