Update Databricks Workspace Repo by Connecting to Databricks CLI with Github Actions

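I'm trying to automatically pull the latest version of a GitHub repo into my Databricks workspace every time a new push is made to the repo. Everything works until the Databricks CLI asks for the host URL, after which it fails with "Error: Process completed with exit code 1". I assume the problem is that my token and host credentials, which are stored as secrets, are not being loaded into the environment correctly. According to Databricks, "CLI 0.8.0 and above supports the following environment variables: DATABRICKS_HOST, DATABRICKS_USERNAME, DATABRICKS_PASSWORD, DATABRICKS_TOKEN". I added DATABRICKS_HOST and DATABRICKS_TOKEN as repository secrets, so I'm not sure what I'm doing wrong.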

on:
 push:

jobs:
 build:
  runs-on: ubuntu-latest

  steps:

    - name: setup python
      uses: actions/setup-python@v2
      with:
        python-version: 3.8 #install the python version needed

    - name: execute py
      env:
        DATABRICKS_HOST: $(DATABRICKS_HOST)
        DATABRICKS_TOKEN: $(DATABRICKS_TOKEN)
      run: |
        python -m pip install --upgrade databricks-cli
        databricks configure --token
        databricks repos update --repo-id REPOID-ENTERED --branch "Development"

Error:

Successfully built databricks-cli
Installing collected packages: tabulate, certifi, urllib3, six, pyjwt, oauthlib, idna, click, charset-normalizer, requests, databricks-cli
Successfully installed certifi-2021.10.8 charset-normalizer-2.0.12 click-8.1.3 databricks-cli-0.16.6 idna-3.3 oauthlib-3.2.0 pyjwt-2.4.0 requests-2.27.1 six-1.16.0 tabulate-0.8.9 urllib3-1.26.9
WARNING: You are using pip version 22.0.4; however, version 22.1 is available.
You should consider upgrading via the '/opt/hostedtoolcache/Python/3.8.12/x64/bin/python -m pip install --upgrade pip' command.
Aborted!
Databricks Host (should begin with https://): 
Error: Process completed with exit code 1.

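Just remove databricks configure --token from your commands; it's not required. Without it, the Databricks CLI picks up the credentials from the environment variables. See a working pipeline for Azure DevOps here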
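Removing the interactive prompt only helps if the variables actually contain the secret values. In GitHub Actions, repository secrets are referenced as ${{ secrets.DATABRICKS_HOST }} and ${{ secrets.DATABRICKS_TOKEN }}; the $(DATABRICKS_HOST) form used in the question is Azure Pipelines syntax, which GitHub Actions passes through as a literal string. A small pre-flight check run before the CLI can surface either problem with a readable error. This is a hypothetical sketch: load_databricks_env and DatabricksConfigError are not part of the Databricks CLI, and the check only mirrors the CLI prompt's "should begin with https://" expectation.

```python
import os


class DatabricksConfigError(RuntimeError):
    """Raised when the CLI-style environment variables are missing or malformed."""


def load_databricks_env(environ=None):
    """Return (host, token) read from DATABRICKS_HOST / DATABRICKS_TOKEN.

    Hypothetical helper: fail fast with a clear message in CI instead of
    letting a tool fall back to an interactive prompt.
    """
    env = os.environ if environ is None else environ
    host = env.get("DATABRICKS_HOST", "").strip()
    token = env.get("DATABRICKS_TOKEN", "").strip()

    if not host or not token:
        raise DatabricksConfigError(
            "DATABRICKS_HOST and DATABRICKS_TOKEN must both be set"
        )
    if not host.startswith("https://"):
        # Catches e.g. a literal "$(DATABRICKS_HOST)" that was never substituted.
        raise DatabricksConfigError(
            f"DATABRICKS_HOST should begin with https://, got {host!r}"
        )
    # Drop a trailing slash so callers can append API paths safely.
    return host.rstrip("/"), token
```

Calling load_databricks_env() at the start of the step (it reads os.environ by default) makes the job fail with the offending value in the log rather than with a bare "Aborted!".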

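I think calling the API directly, without the client, works best. Below is the code from Azure DevOps: an inline Python step that uses ADAL to exchange an Azure AD refresh token for an access token, followed by a bash step that calls the Databricks Repos API. It should also work for GitHub Actions once the Azure-specific $(...) variables are adapted.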

      from adal import AuthenticationContext

      # Service-principal credentials, substituted by Azure DevOps $(...) variables
      user_parameters = {
          "tenant": "$(SP_TENANT_ID)",
          "client_id": "$(SP-CLIENT-ID)",
          "redirect_uri": "http://localhost",
          "client_secret": "$(SP-CLIENT-SECRET)"
      }

      authority_host_url = "https://login.microsoftonline.com/"
      # Azure Databricks resource (application) ID
      azure_databricks_resource_id = "put_here"
      authority_url = authority_host_url + user_parameters['tenant']

      # Exchange the AAD refresh token (default lifetime 90 days or longer) for a new access token
      def refresh_access_token(refresh_token):
        context = AuthenticationContext(authority_url)
        token_response = context.acquire_token_with_refresh_token(
                        refresh_token,
                        user_parameters['client_id'],
                        azure_databricks_resource_id,
                        user_parameters['client_secret'])

        # ADAL returns both a new 'refreshToken' and an 'accessToken'
        return (token_response['refreshToken'], token_response['accessToken'])

      (refresh_token, access_token) = refresh_access_token("$(AAD-REFRESH-TOKEN)")
      # Azure Pipelines logging command: expose the token to later steps as $(ACCESS_TOKEN)
      print('##vso[task.setvariable variable=ACCESS_TOKEN;]%s' % (access_token))
- bash: |
    # Point the Databricks repo at the branch that triggered this build
    echo 'Patching Repo $(DB_WORKSPACE_HOST)/$(REPO_ID)'
    echo 'https://$(DB_WORKSPACE_HOST)/api/2.0/repos/$(REPO_ID) $(Build.SourceBranchName)'

    curl -n -X PATCH -o "/tmp/db_patch-out.json" https://$(DB_WORKSPACE_HOST)/api/2.0/repos/$(REPO_ID) \
        -H 'Authorization: Bearer $(ACCESS_TOKEN)' \
        -d '{"branch": "$(Build.SourceBranchName)"}'
    cat "/tmp/db_patch-out.json"
    # The response is one line of JSON: grep -v selects nothing and exits 1
    # (failing the step) when that line contains error_code
    grep -v error_code "/tmp/db_patch-out.json"
  displayName: 'Update DataBricks Repo'
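If you would rather make the same two calls from a single Python step (for example in GitHub Actions, where the ##vso logging command does not apply), a minimal standard-library sketch could look like this. It assumes the Azure AD v1 token endpoint (/oauth2/token, the endpoint family ADAL uses) with a refresh_token grant, and the Repos API PATCH /api/2.0/repos/{repo_id} from the curl command. The function names are hypothetical; the builders only construct requests, so nothing goes over the network until you pass one to send.

```python
import json
import urllib.parse
import urllib.request

AAD_AUTHORITY = "https://login.microsoftonline.com"


def build_refresh_request(tenant, client_id, client_secret, resource, refresh_token):
    """Build (not send) an AAD v1 refresh_token grant, analogous to the ADAL call."""
    body = urllib.parse.urlencode({
        "grant_type": "refresh_token",
        "client_id": client_id,
        "client_secret": client_secret,
        "resource": resource,  # the Azure Databricks resource ID
        "refresh_token": refresh_token,
    }).encode()
    return urllib.request.Request(
        f"{AAD_AUTHORITY}/{tenant}/oauth2/token",
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


def build_repo_patch_request(host, repo_id, branch, access_token):
    """Build (not send) PATCH /api/2.0/repos/{repo_id}, analogous to the curl call."""
    return urllib.request.Request(
        f"https://{host}/api/2.0/repos/{repo_id}",  # host without the https:// scheme
        data=json.dumps({"branch": branch}).encode(),
        headers={
            "Authorization": f"Bearer {access_token}",
            "Content-Type": "application/json",
        },
        method="PATCH",
    )


def send(request):
    """Send a request and decode its JSON body; HTTP errors raise urllib.error.HTTPError."""
    with urllib.request.urlopen(request) as resp:
        return json.loads(resp.read().decode())
```

Chained together, send(build_refresh_request(...))["access_token"] yields the token to pass to build_repo_patch_request; because urlopen raises on 4xx/5xx responses, the grep check on error_code is no longer needed.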

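This approach works if your Git provider has network connectivity to Databricks. If Azure Data Factory (ADF) is on the same network as Databricks but your Git provider has no connectivity to it, you can either 1) stand up an API gateway to secure and bridge your network calls, or 2) set up an event trigger in ADF and have it call Databricks when you drop a file into Azure Storage (https://docs.microsoft.com/en-us/azure/data-factory/how-to-create-event-trigger?tabs=data-factory), or use an email or some other event as the trigger.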