如何更改现有 Service Fabric 群集上的 OS?

How to change the OS on an existing Service Fabric cluster?

我正在尝试更改我的 VMSS:

    "imageReference": {
      "publisher": "MicrosoftWindowsServer",
      "offer": "WindowsServer",
      "sku": "2016-Datacenter-with-Containers",
      "version": "latest"
    }

收件人:

    "imageReference": {
      "publisher": "MicrosoftWindowsServer",
      "offer": "WindowsServerSemiAnnual",
      "sku": "Datacenter-Core-1803-with-Containers-smalldisk",
      "version": "latest"
    }

我首先尝试的是:

Update-AzureRmVmss -ResourceGroupName "DevServiceFabric" -VMScaleSetName "HTTP" -ImageReferenceSku Datacenter-Core-1803-with-Containers-smalldisk -ImageReferenceOffer WindowsServerSemiAnnual

这给了我错误:

Update-AzureRmVmss : Changing property 'imageReference.offer' is not allowed. ErrorCode: PropertyChangeNotAllowed

这在文档中得到确认;您只能在创建规模集时设置报价。

接下来我尝试 Add-AzureRmServiceFabricNodeType 添加一个新的节点类型,我想我可以在之后删除旧的。但是,此命令似乎不允许您设置 OS 图像。您只能设置 VM SKU(换句话说,集群上的所有 VM 必须具有相同的 OS)。

有没有办法在不删除整个集群并从头开始的情况下改变这一点?

编辑 如果您可以留在当前发布商+报价内,只需更改 SKU 即可非常轻松地切换 OS。 .


如果您确实需要更改报价,您可以这样做:

Upgrade the size and operating system of the primary node type VMs.

请注意,您需要考虑很多因素,例如可用性水平。集群也将暂时无法从外部访问。

大幅缩短:

  • 将具有所需 OS 的第二个规模集添加到主节点类型
  • 禁用旧的规模集,然后将其删除
  • 切换负载均衡器
# Variables.
$groupname = "sfupgradetestgroup"
$clusterloc="southcentralus"  
$subscriptionID="<your subscription ID>"

# sign in to your Azure account and select your subscription
Login-AzAccount -SubscriptionId $subscriptionID 

# Create a new resource group for your deployment and give it a name and a location.
New-AzResourceGroup -Name $groupname -Location $clusterloc

# Deploy the two node type cluster.
New-AzResourceGroupDeployment -ResourceGroupName $groupname -TemplateParameterFile "C:\temp\cluster\Deploy-2NodeTypes-2ScaleSets.parameters.json" `
    -TemplateFile "C:\temp\cluster\Deploy-2NodeTypes-2ScaleSets.json" -Verbose

# Connect to the cluster and check the cluster health.
$ClusterName= "sfupgradetest.southcentralus.cloudapp.azure.com:19000"
$thumb="F361720F4BD5449F6F083DDE99DC51A86985B25B"

Connect-ServiceFabricCluster -ConnectionEndpoint $ClusterName -KeepAliveIntervalInSec 10 `
    -X509Credential `
    -ServerCertThumbprint $thumb  `
    -FindType FindByThumbprint `
    -FindValue $thumb `
    -StoreLocation CurrentUser `
    -StoreName My 

Get-ServiceFabricClusterHealth

# Deploy a new scale set into the primary node type.  Create a new load balancer and public IP address for the new scale set.
New-AzResourceGroupDeployment -ResourceGroupName $groupname -TemplateParameterFile "C:\temp\cluster\Deploy-2NodeTypes-3ScaleSets.parameters.json" `
    -TemplateFile "C:\temp\cluster\Deploy-2NodeTypes-3ScaleSets.json" -Verbose

# Check the cluster health again. All 15 nodes should be healthy.
Get-ServiceFabricClusterHealth

# Disable the nodes in the original scale set.
$nodeNames = @("_NTvm1_0","_NTvm1_1","_NTvm1_2","_NTvm1_3","_NTvm1_4")

Write-Host "Disabling nodes..."
foreach($name in $nodeNames){
    Disable-ServiceFabricNode -NodeName $name -Intent RemoveNode -Force
}

Write-Host "Checking node status..."
foreach($name in $nodeNames){

    $state = Get-ServiceFabricNode -NodeName $name 

    $loopTimeout = 50

    do{
        Start-Sleep 5
        $loopTimeout -= 1
        $state = Get-ServiceFabricNode -NodeName $name
        Write-Host "$name state: " $state.NodeDeactivationInfo.Status
    }

    while (($state.NodeDeactivationInfo.Status -ne "Completed") -and ($loopTimeout -ne 0))


    if ($state.NodeStatus -ne [System.Fabric.Query.NodeStatus]::Disabled)
    {
        Write-Error "$name node deactivation failed with state" $state.NodeStatus
        exit
    }
}

# Remove the scale set
$scaleSetName="NTvm1"
Remove-AzVmss -ResourceGroupName $groupname -VMScaleSetName $scaleSetName -Force
Write-Host "Removed scale set $scaleSetName"

$lbname="LB-sfupgradetest-NTvm1"
$oldPublicIpName="PublicIP-LB-FE-0"
$newPublicIpName="PublicIP-LB-FE-2"

# Store DNS settings of public IP address related to old Primary NodeType into variable 
$oldprimaryPublicIP = Get-AzPublicIpAddress -Name $oldPublicIpName  -ResourceGroupName $groupname

$primaryDNSName = $oldprimaryPublicIP.DnsSettings.DomainNameLabel

$primaryDNSFqdn = $oldprimaryPublicIP.DnsSettings.Fqdn

# Remove Load Balancer related to old Primary NodeType. This will cause a brief period of downtime for the cluster
Remove-AzLoadBalancer -Name $lbname -ResourceGroupName $groupname -Force

# Remove the old public IP
Remove-AzPublicIpAddress -Name $oldPublicIpName -ResourceGroupName $groupname -Force

# Replace DNS settings of Public IP address related to new Primary Node Type with DNS settings of Public IP address related to old Primary Node Type
$PublicIP = Get-AzPublicIpAddress -Name $newPublicIpName  -ResourceGroupName $groupname
$PublicIP.DnsSettings.DomainNameLabel = $primaryDNSName
$PublicIP.DnsSettings.Fqdn = $primaryDNSFqdn
Set-AzPublicIpAddress -PublicIpAddress $PublicIP

# Check the cluster health
Get-ServiceFabricClusterHealth

# Remove node state for the deleted nodes.
foreach($name in $nodeNames){
    # Remove the node from the cluster
    Remove-ServiceFabricNodeState -NodeName $name -TimeoutSec 300 -Force
    Write-Host "Removed node state for node $name"
}

对于那些想要切换到另一个 OS 但可以切换到同一 publisher/Offer 中的 OS 图像的人来说,这是另一个(更简单的)答案。您可以使用以下命令获取可用 OS SKU 的列表:

Get-AzureRmVMImageSku -Location 'westus2' -PublisherName MicrosoftWindowsServer -Offer WindowsServer

然后,您可以升级集群以使用该映像:

Update-AzureRmVmss -ResourceGroupName "DevServiceFabric" -VMScaleSetName "HTTP" -ImageReferenceSku 2019-Datacenter-Core-with-Containers-smalldisk

该命令将需要一个小时或更长时间才能到达 运行。

我还 运行 研究了一些出现 "Image Not Found" 错误的 SKU,即使它们出现在列表中。不知道这是什么原因。但是,在这种情况下,我发现它对我有用。