How to give AKS permission to access ACR via Terraform?
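Question and details
How can I allow a Kubernetes cluster in Azure to talk to an Azure Container Registry via Terraform?
I want to load custom images from my Azure Container Registry. Unfortunately, I get a permission error at the point where Kubernetes should pull the image from the ACR.
What I have tried so far
My experiments without Terraform (az cli)
Everything works after I attach the ACR to the AKS cluster via the az cli: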
az aks update -n myAKSCluster -g myResourceGroup --attach-acr <acrName>
My experiments with Terraform
This is my Terraform configuration; I have stripped out some other parts. It works on its own.
terraform {
  backend "azurerm" {
    resource_group_name  = "tf-state"
    storage_account_name = "devopstfstate"
    container_name       = "tfstatetest"
    key                  = "prod.terraform.tfstatetest"
  }
}

provider "azurerm" {
}

provider "azuread" {
}

provider "random" {
}

# define the password
resource "random_string" "password" {
  length  = 32
  special = true
}

# define the resource group
resource "azurerm_resource_group" "rg" {
  name     = "myrg"
  location = "eastus2"
}

# define the app
resource "azuread_application" "tfapp" {
  name = "mytfapp"
}

# define the service principal
resource "azuread_service_principal" "tfapp" {
  application_id = azuread_application.tfapp.application_id
}

# define the service principal password
resource "azuread_service_principal_password" "tfapp" {
  service_principal_id = azuread_service_principal.tfapp.id
  end_date             = "2020-12-31T09:00:00Z"
  value                = random_string.password.result
}

# define the container registry
resource "azurerm_container_registry" "acr" {
  name                = "mycontainerregistry2387987222"
  resource_group_name = azurerm_resource_group.rg.name
  location            = azurerm_resource_group.rg.location
  sku                 = "Basic"
  admin_enabled       = false
}

# define the kubernetes cluster
resource "azurerm_kubernetes_cluster" "mycluster" {
  name                = "myaks"
  location            = azurerm_resource_group.rg.location
  resource_group_name = azurerm_resource_group.rg.name
  dns_prefix          = "mycluster"

  network_profile {
    network_plugin = "azure"
  }

  default_node_pool {
    name       = "default"
    node_count = 1
    vm_size    = "Standard_B2s"
  }

  # Use the service principal created above
  service_principal {
    client_id     = azuread_service_principal.tfapp.application_id
    client_secret = azuread_service_principal_password.tfapp.value
  }

  tags = {
    Environment = "demo"
  }

  windows_profile {
    admin_username = "dingding"
    admin_password = random_string.password.result
  }
}

# define the windows node pool for kubernetes
resource "azurerm_kubernetes_cluster_node_pool" "winpool" {
  name                  = "winp"
  kubernetes_cluster_id = azurerm_kubernetes_cluster.mycluster.id
  vm_size               = "Standard_B2s"
  node_count            = 1
  os_type               = "Windows"
}

# define the kubernetes name space
resource "kubernetes_namespace" "namesp" {
  metadata {
    name = "namesp"
  }
}

# Try to give permissions, to let the AKS access the ACR
resource "azurerm_role_assignment" "acrpull_role" {
  scope                            = azurerm_container_registry.acr.id
  role_definition_name             = "AcrPull"
  principal_id                     = azuread_service_principal.tfapp.object_id
  skip_service_principal_aad_check = true
}
This code is adapted from https://github.com/terraform-providers/terraform-provider-azuread/issues/104.
Unfortunately, when I start a container in the Kubernetes cluster, I receive an error message:
Failed to pull image "mycontainerregistry.azurecr.io/myunittests": [rpc error: code = Unknown desc = Error response from daemon: manifest for mycontainerregistry.azurecr.io/myunittests:latest not found: manifest unknown: manifest unknown, rpc error: code = Unknown desc = Error response from daemon: Get https://mycontainerregistry.azurecr.io/v2/myunittests/manifests/latest: unauthorized: authentication required]
Update / note:
When I run terraform apply with the code above, resource creation is interrupted:
azurerm_container_registry.acr: Creation complete after 18s [id=/subscriptions/000/resourceGroups/myrg/providers/Microsoft.ContainerRegistry/registries/mycontainerregistry2387987222]
azurerm_role_assignment.acrpull_role: Creating...
azuread_service_principal_password.tfapp: Still creating... [10s elapsed]
azuread_service_principal_password.tfapp: Creation complete after 12s [id=000/000]
azurerm_kubernetes_cluster.mycluster: Creating...
azurerm_role_assignment.acrpull_role: Creation complete after 8s [id=/subscriptions/000/resourceGroups/myrg/providers/Microsoft.ContainerRegistry/registries/mycontainerregistry2387987222/providers/Microsoft.Authorization/roleAssignments/000]
azurerm_kubernetes_cluster.mycluster: Still creating... [10s elapsed]
Error: Error creating Managed Kubernetes Cluster "myaks" (Resource Group "myrg"): containerservice.ManagedClustersClient#CreateOrUpdate: Failure sending request: StatusCode=400 -- Original Error: Code="ServicePrincipalNotFound" Message="Service principal clientID: 000 not found in Active Directory tenant 000, Please see https://aka.ms/aks-sp-help for more details."
on test.tf line 56, in resource "azurerm_kubernetes_cluster" "mycluster":
56: resource "azurerm_kubernetes_cluster" "mycluster" {
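However, I think that is only because it takes a few minutes for the service principal to be created. When I run terraform apply again a few minutes later, it gets past that point without problems.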
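The ServicePrincipalNotFound error above looks like a propagation delay in Azure AD, as suspected. One way to avoid the manual second terraform apply might be an explicit wait between creating the service principal and creating the cluster. The following is only a sketch under assumptions: it uses the time_sleep resource from the hashicorp/time provider, which is not part of the original configuration, and the name wait_for_sp and the 120s duration are illustrative choices, not documented values. The cluster resource is the one from the configuration above with a depends_on added.

```hcl
# Assumption: the hashicorp/time provider is available (not in the original config).
resource "time_sleep" "wait_for_sp" {
  # Start waiting only once the service principal and its password exist.
  depends_on = [azuread_service_principal_password.tfapp]

  # Illustrative duration; the real propagation time may be shorter or longer.
  create_duration = "120s"
}

resource "azurerm_kubernetes_cluster" "mycluster" {
  name                = "myaks"
  location            = azurerm_resource_group.rg.location
  resource_group_name = azurerm_resource_group.rg.name
  dns_prefix          = "mycluster"

  network_profile {
    network_plugin = "azure"
  }

  default_node_pool {
    name       = "default"
    node_count = 1
    vm_size    = "Standard_B2s"
  }

  service_principal {
    client_id     = azuread_service_principal.tfapp.application_id
    client_secret = azuread_service_principal_password.tfapp.value
  }

  windows_profile {
    admin_username = "dingding"
    admin_password = random_string.password.result
  }

  # Create the cluster only after the wait has finished.
  depends_on = [time_sleep.wait_for_sp]
}
```

A fixed sleep is a blunt tool: if propagation takes longer than the chosen duration, the apply still fails and has to be run again.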
This code works for me.
resource "azuread_application" "aks_sp" {
  name = "sp-aks-${local.cluster_name}"
}

resource "azuread_service_principal" "aks_sp" {
  application_id               = azuread_application.aks_sp.application_id
  app_role_assignment_required = false
}

resource "azuread_service_principal_password" "aks_sp" {
  service_principal_id = azuread_service_principal.aks_sp.id
  value                = random_string.aks_sp_password.result
  end_date_relative    = "8760h" # 1 year

  lifecycle {
    ignore_changes = [
      value,
      end_date_relative
    ]
  }
}

resource "azuread_application_password" "aks_sp" {
  application_object_id = azuread_application.aks_sp.id
  value                 = random_string.aks_sp_secret.result
  end_date_relative     = "8760h" # 1 year

  lifecycle {
    ignore_changes = [
      value,
      end_date_relative
    ]
  }
}

data "azurerm_container_registry" "pyp" {
  name                = var.container_registry_name
  resource_group_name = var.container_registry_resource_group_name
}

resource "azurerm_role_assignment" "aks_sp_container_registry" {
  scope                = data.azurerm_container_registry.pyp.id
  role_definition_name = "AcrPull"
  principal_id         = azuread_service_principal.aks_sp.object_id
}

# requires Azure Provider 1.37+
resource "azurerm_kubernetes_cluster" "pyp" {
  name                = local.cluster_name
  location            = azurerm_resource_group.pyp.location
  resource_group_name = azurerm_resource_group.pyp.name
  dns_prefix          = local.env_name_nosymbols
  kubernetes_version  = local.kubernetes_version

  default_node_pool {
    name            = "default"
    node_count      = 1
    vm_size         = "Standard_D2s_v3"
    os_disk_size_gb = 80
  }

  windows_profile {
    admin_username = "winadm"
    admin_password = random_string.windows_profile_password.result
  }

  network_profile {
    network_plugin     = "azure"
    dns_service_ip     = cidrhost(local.service_cidr, 10)
    docker_bridge_cidr = "172.17.0.1/16"
    service_cidr       = local.service_cidr
    load_balancer_sku  = "standard"
  }

  service_principal {
    client_id     = azuread_service_principal.aks_sp.application_id
    client_secret = random_string.aks_sp_password.result
  }

  addon_profile {
    oms_agent {
      enabled                    = true
      log_analytics_workspace_id = azurerm_log_analytics_workspace.pyp.id
    }
  }

  tags = local.tags
}
Source: https://github.com/giuliov/pipeline-your-pipelines/tree/master/src/kubernetes/terraform
(The answer above is mine.)
Just adding a simpler way, where you don't need to create a service principal, for anyone else who might need it.
resource "azurerm_kubernetes_cluster" "kubweb" {
  name                = local.cluster_web
  location            = local.rgloc
  resource_group_name = local.rgname
  dns_prefix          = "${local.cluster_web}-dns"
  kubernetes_version  = local.kubversion

  # used to group all the internal objects of this cluster
  node_resource_group = "${local.cluster_web}-rg-node"

  # azure will assign the id automatically
  identity {
    type = "SystemAssigned"
  }

  default_node_pool {
    name                 = "nodepool1"
    node_count           = 4
    vm_size              = local.vm_size
    orchestrator_version = local.kubversion
  }

  role_based_access_control {
    enabled = true
  }

  addon_profile {
    kube_dashboard {
      enabled = true
    }
  }

  tags = {
    environment = local.env
  }
}

resource "azurerm_container_registry" "acr" {
  name                = "acr1"
  resource_group_name = local.rgname
  location            = local.rgloc
  sku                 = "Standard"
  admin_enabled       = true

  tags = {
    environment = local.env
  }
}

# add the role to the identity the kubernetes cluster was assigned
resource "azurerm_role_assignment" "kubweb_to_acr" {
  scope                = azurerm_container_registry.acr.id
  role_definition_name = "AcrPull"
  principal_id         = azurerm_kubernetes_cluster.kubweb.kubelet_identity[0].object_id
}
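Just adding a more in-depth look at this, as it is something I struggled with as well.
The recommended approach is to use managed identities rather than a service principal, because there is less overhead.
Create the container registry: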
resource "azurerm_container_registry" "acr" {
  name                = "acr"
  resource_group_name = azurerm_resource_group.rg.name
  location            = azurerm_resource_group.rg.location
  sku                 = "Standard"
  admin_enabled       = false
}
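Create an AKS cluster. The code below creates an AKS cluster with 2 identities:
- A System Assigned Identity, which is assigned to the Control Plane.
- A User Assigned Managed Identity, which is also created automatically and assigned to the Kubelet. Note that I have no specific code for it, because this happens automatically.
The Kubelet is the process that goes to the Container Registry to pull images, so we need to make sure that this User Assigned Managed Identity has the AcrPull role on the Container Registry.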
resource "azurerm_kubernetes_cluster" "aks" {
  name                = "aks"
  resource_group_name = azurerm_resource_group.rg.name
  location            = azurerm_resource_group.rg.location
  dns_prefix          = "aks"
  node_resource_group = "aks-node"

  default_node_pool {
    name                = "default"
    node_count          = 1
    vm_size             = "Standard_DS2_v2"
    enable_auto_scaling = false
    type                = "VirtualMachineScaleSets"
    vnet_subnet_id      = azurerm_subnet.aks_subnet.id
    max_pods            = 50
  }

  network_profile {
    network_plugin    = "azure"
    load_balancer_sku = "Standard"
  }

  identity {
    type = "SystemAssigned"
  }
}
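To make the two identities described above visible, a pair of outputs can print their IDs after terraform apply. This is a minimal sketch, assuming the cluster resource is named aks as above; the output names are hypothetical, and the attribute paths are the ones the azurerm provider exports on the identity and kubelet_identity blocks.

```hcl
# Principal ID of the System Assigned Identity (control plane).
output "control_plane_principal_id" {
  value = azurerm_kubernetes_cluster.aks.identity[0].principal_id
}

# Object ID of the User Assigned Managed Identity that AKS creates for the Kubelet.
# This is the ID that needs the AcrPull role on the registry.
output "kubelet_object_id" {
  value = azurerm_kubernetes_cluster.aks.kubelet_identity[0].object_id
}
```

The two values should differ, which is a quick way to confirm that a role assignment targets the Kubelet identity and not the control plane identity.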
Create the role assignment mentioned above, to allow the User Assigned Managed Identity to pull from the Container Registry.
resource "azurerm_role_assignment" "ra" {
  principal_id                     = azurerm_kubernetes_cluster.aks.kubelet_identity[0].object_id
  role_definition_name             = "AcrPull"
  scope                            = azurerm_container_registry.acr.id
  skip_service_principal_aad_check = true
}
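With the role assignment in place, an image can be referenced through the registry's login_server attribute. The following is a hedged sketch, not part of the original answer: it assumes the kubernetes provider is already configured against this cluster, the pod name acr-pull-check is hypothetical, and the image name myunittests is borrowed from the question.

```hcl
# Assumption: the kubernetes provider is configured for the AKS cluster above.
resource "kubernetes_pod" "pull_check" {
  metadata {
    name = "acr-pull-check" # hypothetical name
  }

  spec {
    container {
      name = "myunittests"

      # login_server resolves to the registry host, e.g. <registry name>.azurecr.io
      image = "${azurerm_container_registry.acr.login_server}/myunittests:latest"
    }
  }

  # Schedule the pod only after the Kubelet identity has been granted AcrPull.
  depends_on = [azurerm_role_assignment.ra]
}
```

Role assignments can take a short while to become effective, so a first pull right after apply may still fail and succeed on retry.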
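Hopefully this clears things up for you, as I have seen some confusion on the internet about the two identities that get created.
The Terraform documentation for the Azure Container Registry resource now keeps track of this and should always be up to date.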
resource "azurerm_resource_group" "example" {
  name     = "example-resources"
  location = "West Europe"
}

resource "azurerm_container_registry" "example" {
  name                = "containerRegistry1"
  resource_group_name = azurerm_resource_group.example.name
  location            = azurerm_resource_group.example.location
}

resource "azurerm_kubernetes_cluster" "example" {
  name                = "example-aks1"
  location            = azurerm_resource_group.example.location
  resource_group_name = azurerm_resource_group.example.name
  dns_prefix          = "exampleaks1"

  default_node_pool {
    name       = "default"
    node_count = 1
    vm_size    = "Standard_D2_v2"
  }

  identity {
    type = "SystemAssigned"
  }

  tags = {
    Environment = "Production"
  }
}

resource "azurerm_role_assignment" "example" {
  principal_id                     = azurerm_kubernetes_cluster.example.kubelet_identity[0].object_id
  role_definition_name             = "AcrPull"
  scope                            = azurerm_container_registry.example.id
  skip_service_principal_aad_check = true
}