Is there a way to get the parameters that were passed to a GCP Dataflow job from the CLI/API?

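I have tried the describe command listed here, but I don't see the parameters. Is there another command I should use to get this information, or another API that provides it?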

TL;DR - You are missing the --full flag on the gcloud dataflow jobs describe command.

FLAGS

--full

Retrieve the full Job rather than the summary view

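View the full job information

If you use gcloud to look up a GCP Dataflow job, this command shows the full information about the job (quite a lot of it, actually), including any parameters passed to the job: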

gcloud dataflow jobs describe JOB_ID --full

All the options are under the environment.sdkPipelineOptions.options hierarchy.
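Since the question also asks about the API: the same full view should be available from the Dataflow REST API by requesting the job with view=JOB_VIEW_ALL. The following is only a minimal sketch, not a tested recipe. The function names dataflow_job_url and fetch_full_job are made up for illustration, us-central1 is a placeholder region, and it assumes curl is installed and gcloud is already authenticated so that gcloud auth print-access-token returns a token.

```shell
#!/bin/sh
# Hypothetical helpers (the names are illustrative, not part of gcloud).

# Build the v1b3 jobs.get URL for a regional job, asking for the full view.
dataflow_job_url() {
  project=$1 region=$2 job_id=$3
  printf 'https://dataflow.googleapis.com/v1b3/projects/%s/locations/%s/jobs/%s?view=JOB_VIEW_ALL\n' \
    "$project" "$region" "$job_id"
}

# Fetch the job as JSON. The options are expected under
# environment.sdkPipelineOptions.options in the response as well.
fetch_full_job() {
  curl -sS -H "Authorization: Bearer $(gcloud auth print-access-token)" \
    "$(dataflow_job_url "$1" "$2" "$3")"
}

# Usage (placeholders, not real values):
# fetch_full_job PROJECT_NAME us-central1 JOB_ID
```

Keeping the URL construction in its own function makes it easy to check the request you are about to send before any credentials are involved.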

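View all options as JSON

To view all options passed to the job as JSON (this actually prints more than just the command-line arguments, by the way), you can do the following: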

$ gcloud dataflow jobs describe JOB_ID --full --format='json(environment.sdkPipelineOptions.options)'
{
  "environment": {
    "sdkPipelineOptions": {
      "options": {
        "apiRootUrl": "https://dataflow.googleapis.com/",
        "appName": "WordCount",
        "credentialFactoryClass": "com.google.cloud.dataflow.sdk.util.GcpCredentialFactory",
        "dataflowEndpoint": "",
        "enableCloudDebugger": false,
        "enableProfilingAgent": false,
        "firstArg": "foo",
        "inputFile": "gs://dataflow-samples/shakespeare/kinglear.txt",
        "jobName": "wordcount-tuxdude-12345678",
        "numberOfWorkerHarnessThreads": 0,
        "output": "gs://BUCKET_NAME/dataflow/output",
        "pathValidatorClass": "com.google.cloud.dataflow.sdk.util.DataflowPathValidator",
        "project": "PROJECT_NAME",
        "runner": "com.google.cloud.dataflow.sdk.runners.DataflowPipelineRunner",
        "secondArg": "bar",
        "stableUniqueNames": "WARNING",
        "stagerClass": "com.google.cloud.dataflow.sdk.util.GcsStager",
        "stagingLocation": "gs://BUCKET_NAME/dataflow/staging/",
        "streaming": false,
        "tempLocation": "gs://BUCKET_NAME/dataflow/staging/"
      }
    }
  }
}

View all options as a table

$ gcloud dataflow jobs describe JOB_ID --full --format='flattened(environment.sdkPipelineOptions.options)'
environment.sdkPipelineOptions.options.apiRootUrl:                   https://dataflow.googleapis.com/
environment.sdkPipelineOptions.options.appName:                      WordCount
environment.sdkPipelineOptions.options.credentialFactoryClass:       com.google.cloud.dataflow.sdk.util.GcpCredentialFactory
environment.sdkPipelineOptions.options.dataflowEndpoint:
environment.sdkPipelineOptions.options.enableCloudDebugger:          False
environment.sdkPipelineOptions.options.enableProfilingAgent:         False
environment.sdkPipelineOptions.options.firstArg:                     foo
environment.sdkPipelineOptions.options.inputFile:                    gs://dataflow-samples/shakespeare/kinglear.txt
environment.sdkPipelineOptions.options.jobName:                      wordcount-tuxdude-12345678
environment.sdkPipelineOptions.options.numberOfWorkerHarnessThreads: 0
environment.sdkPipelineOptions.options.output:                       gs://BUCKET_NAME/dataflow/output
environment.sdkPipelineOptions.options.pathValidatorClass:           com.google.cloud.dataflow.sdk.util.DataflowPathValidator
environment.sdkPipelineOptions.options.project:                      PROJECT_NAME
environment.sdkPipelineOptions.options.runner:                       com.google.cloud.dataflow.sdk.runners.DataflowPipelineRunner
environment.sdkPipelineOptions.options.secondArg:                    bar
environment.sdkPipelineOptions.options.stableUniqueNames:            WARNING
environment.sdkPipelineOptions.options.stagerClass:                  com.google.cloud.dataflow.sdk.util.GcsStager
environment.sdkPipelineOptions.options.stagingLocation:              gs://BUCKET_NAME/dataflow/staging/
environment.sdkPipelineOptions.options.streaming:                    False
environment.sdkPipelineOptions.options.tempLocation:                 gs://BUCKET_NAME/dataflow/staging/
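
Because the flattened view prints one option per line, it lends itself to comparing two runs of a pipeline. A rough sketch, assuming a POSIX shell with mktemp and diff available; flat_job_options and diff_job_options are hypothetical helper names, and JOB_ID_A and JOB_ID_B are placeholders:

```shell
#!/bin/sh
# Hypothetical helper: print one job's options in the flattened, one-per-line form.
flat_job_options() {
  gcloud dataflow jobs describe "$1" --full \
    --format='flattened(environment.sdkPipelineOptions.options)'
}

# Show only the option lines that differ between two jobs.
# Returns diff's exit status: 0 when identical, 1 when the options differ.
diff_job_options() {
  a=$(mktemp) b=$(mktemp)
  flat_job_options "$1" > "$a"
  flat_job_options "$2" > "$b"
  diff "$a" "$b"
  status=$?
  rm -f "$a" "$b"
  return "$status"
}

# Usage (placeholders):
# diff_job_options JOB_ID_A JOB_ID_B
```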

Get the value of a single option

To get just the value of a single option named --argName (whose value, incidentally, is MY_ARG_VALUE), you can do the following:

$ gcloud dataflow jobs describe JOB_ID --full --format='value(environment.sdkPipelineOptions.options.argName)'
MY_ARG_VALUE
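
If you need more than one option, the value() projection accepts a comma-separated list of keys, so the lookup can be wrapped in a small function. This is a sketch under assumptions: options_projection and get_job_options are hypothetical names, and firstArg and secondArg are simply the example options from the sample output above.

```shell
#!/bin/sh
# Hypothetical helper: build a projection such as
# value(environment.sdkPipelineOptions.options.firstArg,...) from option names.
options_projection() {
  prefix=environment.sdkPipelineOptions.options
  keys=
  for name in "$@"; do
    # Append with a comma separator, except before the first key.
    keys="${keys:+$keys,}$prefix.$name"
  done
  printf 'value(%s)\n' "$keys"
}

# Hypothetical helper: print the requested options of one job.
get_job_options() {
  job_id=$1
  shift
  gcloud dataflow jobs describe "$job_id" --full \
    --format="$(options_projection "$@")"
}

# Usage (placeholders):
# get_job_options JOB_ID firstArg secondArg
```

As far as I know, value() prints multiple keys on a single line separated by tabs, which is convenient for piping into cut or read.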

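gcloud formatting

gcloud in general supports a wide range of formatting options for its output, and they apply to most gcloud commands that pull information from the server. You can read about them here.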