Using an SSH key on Dataflow workers to pull a private library

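I'm setting up a Dataflow job whose workers need access to a private Bitbucket repository in order to install a library used to process the data. To grant the Dataflow workers access, I created an SSH key pair (public and private) and managed to get the private key onto the workers. When I try to pip install the package via git+ssh, I get the error: Host key verification failed.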

I tried to locate the .ssh/known_hosts file on the Dataflow workers, but that is not as straightforward as it is on a regular VM.

Alternatively, I tried to set it up myself with the following commands, but that did not help:

mkdir -p ~/.ssh
chmod 0700 ~/.ssh
ssh-keyscan bitbucket.org > ~/.ssh/known_hosts

I still get the Host key verification failed error.

Another suggested fix for this problem is to run ssh-keygen -R bitbucket.org, but that fails with the following error: mkstemp: No such file or directory

With the Dataflow Python SDK, you need to package your code using setup.py. Every command executed at worker startup is run through subprocess.Popen. The list of commands is as follows:

CUSTOM_COMMANDS = [
    # decrypt the encrypted key stored in the repository via gcloud kms
    ['gcloud', '-v'],
    ['gcloud', 'kms', 'decrypt', '--location', 'global', '--keyring',
     'bitbucketpackages', '--key', 'package', '--plaintext-file',
     'bb_package_key_decrypted', '--ciphertext-file', 'bb_package_key'],
    ['chmod', '700', 'bb_package_key_decrypted'],
    # install git & ssh
    ['apt-get', 'update'],
    ['apt-get', 'install', '-y', 'openssh-server'],
    ['apt-get', 'install', '-y', 'git'],
    # add bitbucket.org as known host
    ['mkdir', '-p', '~/.ssh'],
    ['chmod', '0700', '~/.ssh'],
    ['ssh-keyscan', 'bitbucket.org', '>', '~/.ssh/known_hosts'],
    # other attempts to fix it
    # ['ssh-keygen', '-R', 'bitbucket.org']
    # pip install
    ['sh', '-c', 'GIT_SSH_COMMAND="ssh -i ./bb_package_key_decrypted" pip install git+ssh://git@bitbucket.org/team/repo.git'],
] 
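
For context, here is a minimal sketch of how a setup.py might run each entry of such a list. It assumes the common pattern of passing the list straight to subprocess.Popen with no shell; the function name run_custom_command and the echo commands are hypothetical and only there for illustration. Under that assumption, characters such as '>' and '~' are handed to the program as literal arguments, which may be relevant to the mkdir, chmod and ssh-keyscan entries above.

```python
import subprocess


def run_custom_command(command_list):
    """Run one command (a list of argv strings) without a shell.

    Hypothetical helper; returns the combined stdout/stderr as text.
    """
    process = subprocess.Popen(
        command_list,
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
    )
    stdout_data, _ = process.communicate()
    if process.returncode != 0:
        raise RuntimeError(
            "Command %s failed with exit code %s"
            % (command_list, process.returncode)
        )
    return stdout_data.decode("utf-8", errors="replace")


# No shell is involved here, so '>' and '~' are passed to echo verbatim
# and nothing is redirected or expanded.
literal = run_custom_command(["echo", "hello", ">", "~/out.txt"])

# Wrapping the command in sh -c lets the shell perform the redirection.
via_shell = run_custom_command(["sh", "-c", "echo hello > /dev/null"])
```

In this sketch, literal contains the '>' and '~/out.txt' arguments as plain text, while via_shell is empty because the shell redirected the output.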

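Try changing ssh-keyscan so that it writes to a temporary path, and then pass the location of that known hosts file as part of GIT_SSH_COMMAND. For example, I would update your script to: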

CUSTOM_COMMANDS = [
    # decrypt the encrypted key stored in the repository via gcloud kms
    ['gcloud', '-v'],
    ['gcloud', 'kms', 'decrypt', '--location', 'global', '--keyring',
     'bitbucketpackages', '--key', 'package', '--plaintext-file',
     'bb_package_key_decrypted', '--ciphertext-file', 'bb_package_key'],
    ['chmod', '700', 'bb_package_key_decrypted'],
    # install git & ssh
    ['apt-get', 'update'],
    ['apt-get', 'install', '-y', 'openssh-server'],
    ['apt-get', 'install', '-y', 'git'],
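    # add bitbucket.org as a known host; '~' and '>' are only interpreted by
    # a shell, so these commands go through sh -c
    ['sh', '-c', 'mkdir -p ~/.ssh'],
    ['sh', '-c', 'chmod 0700 ~/.ssh'],
    ['sh', '-c', 'ssh-keyscan bitbucket.org > /tmp/bit_bucket_known_hosts'],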
    # other attempts to fix it
    # ['ssh-keygen', '-R', 'bitbucket.org']
    # pip install
    ['sh', '-c', 'GIT_SSH_COMMAND="ssh -o UserKnownHostsFile=/tmp/bit_bucket_known_hosts -i ./bb_package_key_decrypted" pip install git+ssh://git@bitbucket.org/team/repo.git'],
]
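
As a variation on the last command, the GIT_SSH_COMMAND value could also be built in Python and passed through the process environment instead of an inline sh -c string. This is only a sketch: build_git_ssh_env is a hypothetical helper, and it assumes the code that launches the commands can be changed to forward an env argument to subprocess.Popen. The key and known hosts paths are the ones used in the script above.

```python
import os
import shlex


def build_git_ssh_env(key_path, known_hosts_path, base_env=None):
    """Return an environment dict with GIT_SSH_COMMAND set.

    Hypothetical helper: points ssh at a specific private key and a
    specific known hosts file, as in the command list above.
    """
    env = dict(os.environ if base_env is None else base_env)
    ssh_args = [
        "ssh",
        "-o", "UserKnownHostsFile=" + known_hosts_path,
        "-i", key_path,
    ]
    # Git runs GIT_SSH_COMMAND through a shell, so quote each argument.
    env["GIT_SSH_COMMAND"] = " ".join(shlex.quote(arg) for arg in ssh_args)
    return env


env = build_git_ssh_env(
    "./bb_package_key_decrypted",
    "/tmp/bit_bucket_known_hosts",
    base_env={},  # empty base only to keep the example output small
)
print(env["GIT_SSH_COMMAND"])
```

In real use, base_env would be left at its default so that PATH and the other variables are inherited, and the resulting dictionary would be passed as env= to subprocess.Popen together with ['pip', 'install', 'git+ssh://git@bitbucket.org/team/repo.git'].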