Need help reimaging an existing Azure node in an Azure Batch pool with Virtual Machine Configuration
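Does anyone have suggestions on how to reimage the Linux nodes in my Azure Batch pool without resizing to 0 and back to N, or deleting the pool and recreating it?
Or is that the recommended best practice?
More details:
I'm having trouble reimaging Azure nodes. When I update the Docker image and redeploy with the ARM template, the nodes do not pull the latest Docker image. I suspect this is because the image name is unchanged (I always want the latest image).
I have tried:
Reset-AzureBatchComputeNode, but Fiddler shows the following error: "Operation reimage can be invoked only on pools created with cloudServiceConfiguration". I can't use cloudServiceConfiguration because the machines need to be Linux machines.
Restart-AzureBatchComputeNode, but this only reboots the node rather than reimaging it.
I may just have to nuke the nodes (resize to 0, then scale back up as needed), or simply delete the pool and set it up again. But these look like the "nuclear" options, and the Batch service would be down until the nodes are up again.
The ARM template I use to deploy/update the Batch pool: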
{
    "name": "[concat(parameters('batchAccountName'), '/<pool-name>')]",
    "type": "Microsoft.Batch/batchAccounts/pools",
    "apiVersion": "2018-12-01",
    "properties": {
        "vmSize": "[parameters('vmSize')]",
        "deploymentConfiguration": {
            "virtualMachineConfiguration": {
                "nodeAgentSkuId": "batch.node.ubuntu 16.04",
                "imageReference": {
                    "publisher": "microsoft-azure-batch",
                    "offer": "ubuntu-server-container",
                    "sku": "16-04-lts",
                    "version": "latest"
                },
                "containerConfiguration": {
                    "type": "DockerCompatible",
                    "containerImageNames": [
                        "[concat(parameters('containerRegistryServer'), '/<container-name>')]"
                    ],
                    "containerRegistries": [
                        <credentials>
                    ]
                }
            }
        },
        "scaleSettings": {
            "fixedScale": {
                "targetDedicatedNodes": "[parameters('targetDedicatedNodes')]"
            }
        }
    },
    "dependsOn": [
        "[resourceId('Microsoft.Batch/batchAccounts', parameters('batchAccountName'))]"
    ]
},
--
Update:
Thanks @fpark. Based on your suggestion, I came up with the following PowerShell script, in case it helps anyone else:
Write-Output "Building docker image"
$imageHashBeforeBuild = docker images $DockerImageName --format "{{.ID}}" --no-trunc
docker build -t $DockerImageName $pathToEnergyModel
if (!$?) {
    throw "Docker image $DockerImageName failed to build"
}
$imageHashAfterBuild = docker images $DockerImageName --format "{{.ID}}" --no-trunc
...
$batchContext = Get-AzureRmBatchAccount -Name $batchAccountName
...
# The nodes should only be reimaged if the model has an update and this is NOT a new deployment
$ShouldReimageNodes = $IsUpdate -and $imageHashBeforeBuild -and ($imageHashBeforeBuild -ne $imageHashAfterBuild)
# The batchAccountDeployment step will create/update batch accounts/pools.
# However, the deployment does not update the nodes to the latest image present in the docker container registry.
# This is likely because the ARM template has the same settings, so it doesn't know to try pulling the image down again.
# As a workaround:
#   1) Grab all current nodes
#   2) For each node:
#      a) Bring it down (this has the side effect of reducing TargetDedicatedComputeNodes by 1)
#      b) Resize TargetDedicatedComputeNodes back to the correct value (i.e. spin up a node to replace the one downed in 2a)
# When the VMs come back up, they do pull the latest docker image
if ($ShouldReimageNodes) {
    # Wait for nodes to stabilize
    Write-Host "Difference in docker images detected. Restarting each node one at a time to ensure latest docker image is being used."
    while ((Get-AzureBatchPool -BatchContext $batchContext -Id $PoolName).AllocationState -ne "Steady") {
        Write-Host "Waiting for nodes in $PoolName to stabilize. Checking status again in $SleepTime seconds."
        Start-Sleep -Seconds $SleepTime
    }
    $nodes = Get-AzureBatchComputeNode -PoolId $PoolName -BatchContext $batchContext
    $currentNodeCount = $nodes.Length
    foreach ($node in $nodes) {
        $nodeId = $node.Id
        Write-Host "Removing node $nodeId"
        Remove-AzureBatchComputeNode -ComputeNode $node -BatchContext $batchContext -Force
        # Wait for the removal to finish before resizing
        while ((Get-AzureBatchPool -BatchContext $batchContext -Id $PoolName).AllocationState -ne "Steady") {
            Write-Host "Waiting for nodes in $PoolName to stabilize. Checking status again in $SleepTime seconds."
            Start-Sleep -Seconds $SleepTime
        }
        Write-Host "Resizing back to $currentNodeCount"
        Start-AzureBatchPoolResize -Id $PoolName -BatchContext $batchContext -TargetDedicatedComputeNodes $currentNodeCount
        # Wait for the replacement node to be allocated before touching the next node
        while ((Get-AzureBatchPool -BatchContext $batchContext -Id $PoolName).AllocationState -ne "Steady") {
            Write-Host "Waiting for nodes in $PoolName to stabilize. Checking status again in $SleepTime seconds."
            Start-Sleep -Seconds $SleepTime
        }
    }
}
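Currently, the reimage operation is not supported on pools based on Virtual Machine Configuration. See this UserVoice idea.
You can emulate a reimage by calling the Remove-AzureBatchComputeNode cmdlet on a set of nodes and then resizing the pool back to the size you want.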
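The remove-then-resize approach has to wait for the pool to settle after each step, and the script in the question repeats the same polling loop three times with no upper bound on the wait. Below is a minimal sketch of factoring that loop into a helper with a timeout. The function name Wait-BatchPoolSteady and its parameters are hypothetical (they are not Azure cmdlets), and the allocation state is read through a caller-supplied script block so the helper itself makes no Azure calls.

```powershell
# Hypothetical helper, not part of the Azure Batch module: polls a caller-supplied
# script block until it reports the "Steady" allocation state, or throws once
# $TimeoutSeconds has elapsed.
function Wait-BatchPoolSteady {
    param(
        [Parameter(Mandatory)] [scriptblock] $GetAllocationState,
        [int] $SleepSeconds = 30,
        [int] $TimeoutSeconds = 1800
    )
    $deadline = (Get-Date).AddSeconds($TimeoutSeconds)
    while ((& $GetAllocationState) -ne 'Steady') {
        if ((Get-Date) -gt $deadline) {
            throw "Pool did not reach the Steady allocation state within $TimeoutSeconds seconds."
        }
        Start-Sleep -Seconds $SleepSeconds
    }
}

# Usage sketch, assuming $batchContext, $PoolName and $SleepTime are set as in the
# question's script (left commented out because it needs a real Batch account):
# $getState = { (Get-AzureBatchPool -BatchContext $batchContext -Id $PoolName).AllocationState }
# Remove-AzureBatchComputeNode -ComputeNode $node -BatchContext $batchContext -Force
# Wait-BatchPoolSteady -GetAllocationState $getState -SleepSeconds $SleepTime
# Start-AzureBatchPoolResize -Id $PoolName -BatchContext $batchContext -TargetDedicatedComputeNodes $currentNodeCount
# Wait-BatchPoolSteady -GetAllocationState $getState -SleepSeconds $SleepTime
```

Replacing one node at a time, as the question's script does, keeps the rest of the pool available while each node is swapped out; the timeout only guards against a resize that never completes.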