GCP Dataflow extract JOB_ID

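For a Dataflow job, I need to extract the JOB_ID given the job name. I have the following command and its output. Can you advise how to extract the JOB_ID from the response below?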

$ gcloud dataflow jobs list --region=us-central1 --status=active --filter="name=sample-job"
JOB_ID                                    NAME                        TYPE       CREATION_TIME        STATE    REGION
2020-10-07_10_11_20-15879763245819496196  sample-job  Streaming  2020-10-07 17:11:21  Running  us-central1

A Python script that does this would also be fine.

You can parse the command's output with standard command-line tools, for example:

gcloud dataflow jobs list --region=us-central1 --status=active --filter="name=sample-job" | tail -n 1 | cut -f 1 -d " "
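Since a Python script was requested, the same parsing can be sketched in Python. This is a minimal sketch that assumes the tabular output shown in the question; the function name `extract_job_id` and the `SAMPLE_OUTPUT` constant are illustrative and not part of gcloud.

```python
from typing import Optional

# Sample output copied from the question; in practice this text would come
# from running the gcloud command (for example via subprocess).
SAMPLE_OUTPUT = """\
JOB_ID                                    NAME                        TYPE       CREATION_TIME        STATE    REGION
2020-10-07_10_11_20-15879763245819496196  sample-job  Streaming  2020-10-07 17:11:21  Running  us-central1
"""


def extract_job_id(table_text: str, job_name: str) -> Optional[str]:
    """Return the JOB_ID of the first row whose NAME column equals job_name."""
    lines = [line for line in table_text.splitlines() if line.strip()]
    for row in lines[1:]:  # skip the header row
        columns = row.split()
        if len(columns) >= 2 and columns[1] == job_name:
            return columns[0]
    return None


print(extract_job_id(SAMPLE_OUTPUT, "sample-job"))
```

Matching on the NAME column, rather than taking the last line, keeps the result correct if the filter ever returns more than one row.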

Alternatively, if you are already running this from a Python program, you can call the Dataflow API directly instead of using the gcloud tool.

In Python, you can retrieve the list of jobs by sending a REST request to the Dataflow method:

https://dataflow.googleapis.com/v1b3/projects/{projectId}/jobs

The JSON response can then be parsed, using an if clause to filter for the job name you are looking for:

if job["name"] == 'sample-job':

I tested this approach and it works:

import requests

base_url = 'https://dataflow.googleapis.com/v1b3/projects/'
project_id = '<MY_PROJECT_ID>'
location = '<REGION>'

# <BEARER_TOKEN_HERE> can be retrieved with 'gcloud auth print-access-token',
# run with an account that has access to Dataflow jobs.
# Another authentication mechanism can be found in the link provided by danielm.
response = requests.get(
    f'{base_url}{project_id}/locations/{location}/jobs',
    headers={'Authorization': 'Bearer <BEARER_TOKEN_HERE>'},
)

jobslist = response.json()

# The job entries are under the "jobs" key; other top-level keys
# (such as a page token) are not lists of jobs and are skipped.
for job in jobslist.get("jobs", []):
    if job["name"] == 'beamapp-0907191546-413196':
        print(job["name"], " Found, job ID:", job["id"])
    else:
        print(job["name"], " Not matched")
   
# Output:
# windowedwordcount-0908012420-bd342f98  Not matched
# beamapp-0907200305-106040  Not matched
# beamapp-0907192915-394932  Not matched
# beamapp-0907191546-413196  Found, job ID: 2020-09-07...154989572
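The lookup above prints one line per job, while a script usually wants the ID back as a value. A minimal sketch of that, assuming the parsed response is a dict with a "jobs" list whose entries carry "name" and "id" as in the code above; the function name `find_job_id` and the example data (including the IDs) are made up for illustration:

```python
from typing import Optional


def find_job_id(jobs_response: dict, job_name: str) -> Optional[str]:
    """Return the ID of the first job named job_name, or None if absent."""
    for job in jobs_response.get("jobs", []):
        if job.get("name") == job_name:
            return job.get("id")
    return None


# Hypothetical response shaped like the parsed JSON (IDs are invented).
example = {
    "jobs": [
        {"id": "job-id-1", "name": "windowedwordcount-example"},
        {"id": "job-id-2", "name": "sample-job"},
    ],
    "nextPageToken": "token",
}
print(find_job_id(example, "sample-job"))  # job-id-2
```

Returning None instead of raising leaves it to the caller to decide how a missing job should be handled.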

I created a Gist with a Python script that does this.

gcloud dataflow jobs list --region=us-central1 --status=active --filter="name=sample-job" --format="value(JOB_ID)"
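If the ID is needed inside a Python script, the command above could be wrapped with subprocess. This is only a sketch: it assumes gcloud is installed and authenticated, reuses the flags from the command above unchanged, and the helper names `build_command`, `first_job_id`, and `get_job_id` are hypothetical.

```python
import subprocess
from typing import List, Optional


def build_command(job_name: str, region: str) -> List[str]:
    """Build the gcloud argument list matching the command above."""
    return [
        "gcloud", "dataflow", "jobs", "list",
        f"--region={region}",
        "--status=active",
        f"--filter=name={job_name}",
        "--format=value(JOB_ID)",
    ]


def first_job_id(command_output: str) -> Optional[str]:
    """Return the first non-empty output line, or None if there is none."""
    for line in command_output.splitlines():
        if line.strip():
            return line.strip()
    return None


def get_job_id(job_name: str, region: str) -> Optional[str]:
    """Run gcloud and return the job ID, or None if no job matched."""
    result = subprocess.run(
        build_command(job_name, region),
        capture_output=True, text=True, check=True,
    )
    return first_job_id(result.stdout)
```

For example, `get_job_id("sample-job", "us-central1")` would run the command and return the ID as a string. Passing the arguments as a list avoids shell quoting issues with the filter expression.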