Terraform Azure Databricks Provider Error

I need some help understanding the various ways of logging in to Databricks. I am provisioning Azure Databricks with Terraform and I would like to know the difference between the two pieces of code below. When I use option 1, I get the error shown further down.

Option 1:

terraform {
  required_providers {
    azuread     = "~> 1.0"
    azurerm     = "~> 2.0"
    azuredevops = { source = "registry.terraform.io/microsoft/azuredevops", version = "~> 0.0" }
    databricks  = { source = "registry.terraform.io/databrickslabs/databricks", version = "~> 0.0" }
  }
}

provider "random" {}
provider "azuread" {
  tenant_id     = var.project.arm.tenant.id
  client_id     = var.project.arm.client.id
  client_secret = var.secret.arm.client.secret
}

provider "databricks" {
  host          = azurerm_databricks_workspace.db-workspace.workspace_url
  azure_use_msi = true
}

resource "azurerm_databricks_workspace" "db-workspace" {
  name                          = module.names-db-workspace.environment.databricks_workspace.name_unique
  resource_group_name           = module.resourcegroup.resource_group.name
  location                      = module.resourcegroup.resource_group.location
  sku                           = "premium"
  public_network_access_enabled = true

  custom_parameters {
    no_public_ip                                         = true
    virtual_network_id                                   = module.virtualnetwork["centralus"].virtual_network.self.id
    public_subnet_name                                   = module.virtualnetwork["centralus"].virtual_network.subnets["db-sub-1-public"].name
    private_subnet_name                                  = module.virtualnetwork["centralus"].virtual_network.subnets["db-sub-2-private"].name
    public_subnet_network_security_group_association_id  = module.virtualnetwork["centralus"].virtual_network.nsgs.associations.subnets["databricks-public-nsg-db-sub-1-public"].id
    private_subnet_network_security_group_association_id = module.virtualnetwork["centralus"].virtual_network.nsgs.associations.subnets["databricks-private-nsg-db-sub-2-private"].id
  }
  tags = local.tags
}

Databricks cluster creation

resource "databricks_cluster" "dbcselfservice" {
  cluster_name            = format("adb-cluster-%s-%s", var.project.name, var.project.environment.name)
  spark_version           = var.spark_version
  node_type_id            = var.node_type_id
  autotermination_minutes = 20
  autoscale {
    min_workers = 1
    max_workers = 7
  }
  azure_attributes {
    availability       = "SPOT_AZURE"
    first_on_demand    = 1
    spot_bid_max_price = 100
  }
  depends_on = [
    azurerm_databricks_workspace.db-workspace
  ]
}

Databricks workspace RBAC permissions

resource "databricks_group" "db-group" {
  display_name               = format("adb-users-%s", var.project.name)
  allow_cluster_create       = true
  allow_instance_pool_create = true
  depends_on = [
    resource.azurerm_databricks_workspace.db-workspace
  ]
}

resource "databricks_user" "dbuser" {
  count            = length(local.display_name)
  display_name     = local.display_name[count.index]
  user_name        = local.user_name[count.index]
  workspace_access = true
  depends_on = [
    resource.azurerm_databricks_workspace.db-workspace
  ]
}

Adding members to the Databricks admins group

resource "databricks_group_member" "i-am-admin" {
  for_each  = toset(local.email_address)
  group_id  = data.databricks_group.admins.id
  member_id = databricks_user.dbuser[index(local.email_address, each.key)].id
  depends_on = [
    resource.azurerm_databricks_workspace.db-workspace
  ]
}

data "databricks_group" "admins" {
  display_name = "admins"
  depends_on = [
    #    resource.databricks_cluster.dbcselfservice,
    resource.azurerm_databricks_workspace.db-workspace
  ]
}

The errors I get when applying the Terraform are as follows:

Error: User not authorized

with databricks_user.dbuser[1],
on resources.adb.tf line 80, in resource "databricks_user" "dbuser":
80: resource "databricks_user" "dbuser"{


Error: User not authorized

with databricks_user.dbuser[0],
on resources.adb.tf line 80, in resource "databricks_user" "dbuser":
80: resource "databricks_user" "dbuser"{

Error: cannot refresh AAD token: adal:Refresh request failed. Status Code =  '500'. Response body: {"error":"server_error", "error_description":"Internal server error"} Endpoint http://169.254.169.254/metadata/identity/oauth2/token?api-version=2018-02-01&resource=https%3A%2F%2Fmanagement.core.windows.net%2F

with databricks_group.db-group,
on resources.adb.tf line 80, in resource "databricks_group" "db-group":
71: resource "databricks_group" "db-group"{

Is the error caused by the block below?

provider "databricks" {
  host          = azurerm_databricks_workspace.db-workspace.workspace_url
  azure_use_msi = true
}

I just want to be logged in automatically when I click the workspace URL in the portal. So what should I use? And why do we need to provide the databricks provider twice, once under required_providers and again in provider "databricks"? I have seen that if I don't provide the second provider block I get the error:

"authentication is not configured for provider"

The azure_use_msi option is primarily intended for CI/CD pipelines that run on machines with a managed identity assigned to them. All possible authentication options are described in the documentation, but the simplest way is to authenticate via the Azure CLI, so you only need to keep the host parameter in the provider block. If you don't have the Azure CLI on that machine, you can use the combination of host + personal access token.
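
For example, a minimal sketch of the host + personal access token combination might look like this (databricks_pat is a hypothetical variable name I'm introducing for illustration, not part of the original code):

# Sketch only: authenticate with the workspace URL plus a personal access token (PAT).
# var.databricks_pat is a placeholder; pass it in via TF_VAR_databricks_pat or a
# *.tfvars file rather than hard-coding the secret.
variable "databricks_pat" {
  type      = string
  sensitive = true
}

provider "databricks" {
  host  = azurerm_databricks_workspace.db-workspace.workspace_url
  token = var.databricks_pat
}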

If you are running the code from a machine that has a managed identity assigned, then you need to make sure this identity is added to the workspace, or that it has Contributor access to it (see the Azure Databricks documentation for more details).
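
As an illustration only, granting that identity Contributor on the workspace from Terraform could look like the sketch below (var.msi_principal_id is a hypothetical variable holding the managed identity's object id):

# Sketch only: give the managed identity Contributor on the workspace so that
# the Databricks provider can authenticate with azure_use_msi = true.
# var.msi_principal_id is a placeholder for the managed identity's object id.
resource "azurerm_role_assignment" "msi_workspace_contributor" {
  scope                = azurerm_databricks_workspace.db-workspace.id
  role_definition_name = "Contributor"
  principal_id         = var.msi_principal_id
}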

As mentioned in the comments, if you are using Azure CLI authentication, i.e. az login with your username and password, then you can use the following code:

terraform {
  required_providers {
    databricks = {
      source = "databrickslabs/databricks"
      version = "0.3.11"
    }
  }
}
provider "azurerm" {
  features {}
}
provider "databricks" {
    host = azurerm_databricks_workspace.example.workspace_url
}

resource "azurerm_databricks_workspace" "example" {
  name                        = "DBW-ansuman"
  resource_group_name         = azurerm_resource_group.example.name
  location                    = azurerm_resource_group.example.location
  sku                         = "premium"
  managed_resource_group_name = "ansuman-DBW-managed-without-lb"

  public_network_access_enabled = true

  custom_parameters {
    no_public_ip        = true
    public_subnet_name  = azurerm_subnet.public.name
    private_subnet_name = azurerm_subnet.private.name
    virtual_network_id  = azurerm_virtual_network.example.id

    public_subnet_network_security_group_association_id  = azurerm_subnet_network_security_group_association.public.id
    private_subnet_network_security_group_association_id = azurerm_subnet_network_security_group_association.private.id
  }

  tags = {
    Environment = "Production"
    Pricing     = "Standard"
  }
}
data "databricks_node_type" "smallest" {
  local_disk = true
    depends_on = [
    azurerm_databricks_workspace.example
  ]
}
data "databricks_spark_version" "latest_lts" {
  long_term_support = true
    depends_on = [
    azurerm_databricks_workspace.example
  ]
}
resource "databricks_cluster" "dbcselfservice" {
  cluster_name            = "Shared Autoscaling"
  spark_version           = data.databricks_spark_version.latest_lts.id
  node_type_id            = data.databricks_node_type.smallest.id
  autotermination_minutes = 20
  autoscale {
    min_workers = 1
    max_workers = 7
  }
  azure_attributes {
    availability       = "SPOT_AZURE"
    first_on_demand    = 1
    spot_bid_max_price = 100
  }
  depends_on = [
    azurerm_databricks_workspace.example
  ]
}
resource "databricks_group" "db-group" {
  display_name               = "adb-users-admin"
  allow_cluster_create       = true
  allow_instance_pool_create = true
  depends_on = [
    resource.azurerm_databricks_workspace.example
  ]
}

resource "databricks_user" "dbuser" {
  display_name     = "Rahul Sharma"
  user_name        = "example@contoso.com"
  workspace_access = true
  depends_on = [
    resource.azurerm_databricks_workspace.example
  ]
}
resource "databricks_group_member" "i-am-admin" {
  group_id  = databricks_group.db-group.id
  member_id = databricks_user.dbuser.id
  depends_on = [
    resource.azurerm_databricks_workspace.example
  ]
}

Output:


If you are using a Service Principal for authentication, then you can use something like the following:

terraform {
  required_providers {
    databricks = {
      source = "databrickslabs/databricks"
      version = "0.3.11"
    }
  }
}
provider "azurerm" {
  subscription_id = "948d4068-xxxx-xxxx-xxxx-e00a844e059b"
  tenant_id = "72f988bf-xxxx-xxxx-xxxx-2d7cd011db47"
  client_id = "f6a2f33d-xxxx-xxxx-xxxx-d713a1bb37c0"
  client_secret = "inl7Q~Gvdxxxx-xxxx-xxxxyaGPF3uSoL"
  features {}
}
provider "databricks" {
    host = azurerm_databricks_workspace.example.workspace_url
    azure_client_id = "f6a2f33d-xxxx-xxxx-xxxx-d713a1bb37c0"
    azure_client_secret = "inl7Q~xxxx-xxxx-xxxxg6ntiyaGPF3uSoL"
    azure_tenant_id = "72f988bf-xxxx-xxxx-xxxx-2d7cd011db47"
}


resource "azurerm_databricks_workspace" "example" {
  name                        = "DBW-ansuman"
  resource_group_name         = azurerm_resource_group.example.name
  location                    = azurerm_resource_group.example.location
  sku                         = "premium"
  managed_resource_group_name = "ansuman-DBW-managed-without-lb"

  public_network_access_enabled = true

  custom_parameters {
    no_public_ip        = true
    public_subnet_name  = azurerm_subnet.public.name
    private_subnet_name = azurerm_subnet.private.name
    virtual_network_id  = azurerm_virtual_network.example.id

    public_subnet_network_security_group_association_id  = azurerm_subnet_network_security_group_association.public.id
    private_subnet_network_security_group_association_id = azurerm_subnet_network_security_group_association.private.id
  }

  tags = {
    Environment = "Production"
    Pricing     = "Standard"
  }
}
data "databricks_node_type" "smallest" {
  local_disk = true
    depends_on = [
    azurerm_databricks_workspace.example
  ]
}
data "databricks_spark_version" "latest_lts" {
  long_term_support = true
    depends_on = [
    azurerm_databricks_workspace.example
  ]
}
resource "databricks_cluster" "dbcselfservice" {
  cluster_name            = "Shared Autoscaling"
  spark_version           = data.databricks_spark_version.latest_lts.id
  node_type_id            = data.databricks_node_type.smallest.id
  autotermination_minutes = 20
  autoscale {
    min_workers = 1
    max_workers = 7
  }
  azure_attributes {
    availability       = "SPOT_AZURE"
    first_on_demand    = 1
    spot_bid_max_price = 100
  }
  depends_on = [
    azurerm_databricks_workspace.example
  ]
}
resource "databricks_group" "db-group" {
  display_name               = "adb-users-admin"
  allow_cluster_create       = true
  allow_instance_pool_create = true
  depends_on = [
    resource.azurerm_databricks_workspace.example
  ]
}

resource "databricks_user" "dbuser" {
  display_name     = "Rahul Sharma"
  user_name        = "example@contoso.com"
  workspace_access = true
  depends_on = [
    resource.azurerm_databricks_workspace.example
  ]
}
resource "databricks_group_member" "i-am-admin" {
  group_id  = databricks_group.db-group.id
  member_id = databricks_user.dbuser.id
  depends_on = [
    resource.azurerm_databricks_workspace.example
  ]
}

And why do we need to provide the databricks provider twice, once under required_providers and again in provider "databricks"?

required_providers is used to download and initialize the required provider from its source, i.e. the Terraform Registry. The provider block is then used to configure the downloaded provider, for example by specifying a client_id, a features block, and so on, which can be used for authentication or other configuration.
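
Putting the two together, a minimal sketch shows the split (var.databricks_token is a placeholder for whichever authentication method you choose): required_providers tells Terraform which plugin to download, while the provider block tells that plugin where the workspace is and how to log in.

terraform {
  # Downloads and pins the plugin itself from the Terraform Registry.
  required_providers {
    databricks = {
      source  = "databrickslabs/databricks"
      version = "0.3.11"
    }
  }
}

# Configures the downloaded plugin: workspace URL and credentials.
# var.databricks_token is a placeholder; any documented auth method works here.
provider "databricks" {
  host  = azurerm_databricks_workspace.example.workspace_url
  token = var.databricks_token
}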