Using cloud-init on an Azure VM to mount a data disk fails
This is similar to a previous SO question, from which I adapted my code.
I am using a cloud-config file passed in through Terraform:
#cloud-config
disk_setup:
  /dev/disk/azure/scsi1/lun0:
    table_type: gpt
    layout: true
    overwrite: false
fs_setup:
  - device: /dev/disk/azure/scsi1/lun0
    partition: 1
    filesystem: ext4
mounts:
  - [
      "/dev/disk/azure/scsi1/lun0-part1",
      "/opt/data",
      auto,
      "defaults,noexec,nofail",
    ]
data "template_file" "cloudconfig" {
  template = file("${path.module}/cloud-init.tpl")
}

data "template_cloudinit_config" "config" {
  gzip          = true
  base64_encode = true

  part {
    content_type = "text/cloud-config"
    content      = "${data.template_file.cloudconfig.rendered}"
  }
}

module "nexus_test_vm" {
  # unnecessary details omitted - 1 VM with 1 external disk, fixed lun of 0, ubuntu 18.04
  vm_size             = "Standard_B2S"
  cloud_init_template = data.template_cloudinit_config.config.rendered
}
The relevant parts of the module (VM creation):
resource "azurerm_virtual_machine" "generic-vm" {
  count                         = var.number
  name                          = "${local.my_name}-${count.index}-vm"
  location                      = var.location
  resource_group_name           = var.resource_group_name
  network_interface_ids         = [azurerm_network_interface.generic-nic[count.index].id]
  vm_size                       = var.vm_size
  delete_os_disk_on_termination = true

  storage_image_reference {
    id = var.image_id
  }

  storage_os_disk {
    name              = "${local.my_name}-${count.index}-os"
    caching           = "ReadWrite"
    create_option     = "FromImage"
    managed_disk_type = "Standard_LRS"
    disk_size_gb      = var.os_disk_size
  }

  os_profile {
    computer_name  = "${local.my_name}-${count.index}"
    admin_username = local.my_admin_user_name
    custom_data    = var.cloud_init_template
  }

  os_profile_linux_config {
    disable_password_authentication = true

    ssh_keys {
      path = "/home/${local.my_admin_user_name}/.ssh/authorized_keys"
      //key_data = tls_private_key.vm_ssh_key.public_key_openssh
      key_data = var.public_key_openssh
    }
  }

  tags = {
    Name        = "${local.my_name}-${count.index}"
    Deployment  = local.my_deployment
    Prefix      = var.prefix
    Environment = var.env
    Location    = var.location
    Volatile    = var.volatile
    Terraform   = "true"
  }
}

resource "azurerm_managed_disk" "generic-disk" {
  name                 = "${azurerm_virtual_machine.generic-vm.*.name[0]}-1-generic-disk"
  location             = var.rg_location
  resource_group_name  = var.rg_name
  storage_account_type = "Standard_LRS"
  create_option        = "Empty"
  disk_size_gb         = var.external_disk_size
}

resource "azurerm_virtual_machine_data_disk_attachment" "generic-disk" {
  managed_disk_id    = azurerm_managed_disk.generic-disk.id
  virtual_machine_id = azurerm_virtual_machine.generic-vm.*.id[0]
  lun                = 0
  caching            = "ReadWrite"
}
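When cloud-init runs, I get a lot of strange errors saying the disk does not exist. However, when I ssh into the VM, the disk is there! Is this a race condition? Is there a wait time I can configure in cloud-init? Or can someone help me better understand what might be happening?
Relevant logs from the VM: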
head -n 5000 /var/log/cloud-init.log | grep lun
2020-04-07 16:30:51,296 - cc_disk_setup.py[DEBUG]: Partitioning disks: {'/dev/disk/azure/scsi1/lun0': {'layout': True, 'overwrite': False, 'table_type': 'gpt'}, '/dev/disk/cloud/azure_resource': {'table_type': 'gpt', 'layout': [100], 'overwrite': True, '_origname': 'ephemeral0'}}
2020-04-07 16:30:51,318 - util.py[DEBUG]: Creating partition on /dev/disk/azure/scsi1/lun0 took 0.021 seconds
Device /dev/disk/azure/scsi1/lun0 did not exist and was not created with a udevadm settle.
Device /dev/disk/azure/scsi1/lun0 did not exist and was not created with a udevadm settle.
RuntimeError: Device /dev/disk/azure/scsi1/lun0 did not exist and was not created with a udevadm settle.
2020-04-07 16:30:51,601 - cc_disk_setup.py[DEBUG]: setting up filesystems: [{'device': '/dev/disk/azure/scsi1/lun0', 'filesystem': 'ext4', 'partition': 1}]
2020-04-07 16:30:51,725 - util.py[DEBUG]: Creating fs for /dev/disk/azure/scsi1/lun0 took 0.124 seconds
Device /dev/disk/azure/scsi1/lun0 did not exist and was not created with a udevadm settle.
Device /dev/disk/azure/scsi1/lun0 did not exist and was not created with a udevadm settle.
RuntimeError: Device /dev/disk/azure/scsi1/lun0 did not exist and was not created with a udevadm settle.
2020-04-07 16:30:51,733 - cc_mounts.py[DEBUG]: mounts configuration is [['/dev/disk/azure/scsi1/lun0-part1', '/opt/data', 'auto', 'defaults,noexec,nofail']]
2020-04-07 16:30:51,734 - cc_mounts.py[DEBUG]: Attempting to determine the real name of /dev/disk/azure/scsi1/lun0-part1
2020-04-07 16:30:51,734 - cc_mounts.py[DEBUG]: changed /dev/disk/azure/scsi1/lun0-part1 => None
2020-04-07 16:30:51,734 - cc_mounts.py[DEBUG]: Ignoring nonexistent named mount /dev/disk/azure/scsi1/lun0-part1
2020-04-07 16:30:51,736 - cc_mounts.py[DEBUG]: Changes to fstab: ['+ /dev/disk/azure/scsi1/lun0-part1 /opt/data auto defaults,noexec,nofail,comment=cloudconfig 0 2']
ls -l /dev/disk/azure/scsi1/lun0
lrwxrwxrwx 1 root root 12 Apr 7 16:32 /dev/disk/azure/scsi1/lun0 -> ../../../sdc
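I think this comes down to the order in which the data disk, the VM, and cloud-init come up. As far as I know, cloud-init runs when the VM first boots. In your Terraform code the managed disk's name references the VM's name and the attachment references the VM's id, so both are created after the VM. That means the disk is attached later than the first cloud-init run, which is why cloud-init reports that the device does not exist (your own output shows the lun0 symlink appearing at 16:32, after the errors at 16:30).
So the solution is to define the data disk inside the VM resource with a storage_data_disk block. The VM is then created with the data disk already attached, and cloud-init runs afterwards.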
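As a rough sketch of that suggestion, the data disk could be declared inline in the VM resource as shown here. The disk name is a made-up placeholder, and the size, type, and caching values are simply carried over from the separate azurerm_managed_disk and attachment resources in the question, which this block would presumably replace. It is untested against the module in the question.

```hcl
resource "azurerm_virtual_machine" "generic-vm" {
  # ... existing arguments and blocks from the question stay unchanged ...

  # Declared inline so the disk is attached at VM creation time,
  # i.e. before cloud-init's first-boot run looks for lun0.
  storage_data_disk {
    name              = "${local.my_name}-${count.index}-data" # hypothetical name
    caching           = "ReadWrite"
    create_option     = "Empty"
    managed_disk_type = "Standard_LRS"
    disk_size_gb      = var.external_disk_size
    lun               = 0 # must match /dev/disk/azure/scsi1/lun0 in the cloud-config
  }
}
```

With the disk defined this way, the standalone azurerm_managed_disk and azurerm_virtual_machine_data_disk_attachment resources for the same LUN would need to be removed to avoid attaching two disks at lun 0.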