无法参数化 placement.managedCluster.config 下的任何值
Cannot parametrize any value under placement.managedCluster.config
我的目标是从 python 代码创建数据处理工作流模板。同时,我希望能够在模板实例化期间参数化 placement.managedCluster.config.gceClusterConfig.subnetworkUri 字段。
我从 json 文件中读取模板,例如:
{
"id": "bigquery-extractor",
"placement": {
"managed_cluster": {
"config": {
"gce_cluster_config": {
"subnetwork_uri": "some-subnet-name"
},
"software_config" : {
"image_version": "1.5"
}
},
"cluster_name": "some-name"
}
},
"jobs": [
{
"pyspark_job": {
"args": [
"job_argument"
],
"main_python_file_uri": "gs:///path-to-file"
},
"step_id": "extract"
}
],
"parameters": [
{
"name": "CLUSTER_NAME",
"fields": [
"placement.managedCluster.clusterName"
]
},
{
"name": "SUBNETWORK_URI",
"fields": [
"placement.managedCluster.config.gceClusterConfig.subnetworkUri"
]
},
{
"name": "MAIN_PY_FILE",
"fields": [
"jobs['extract'].pysparkJob.mainPythonFileUri"
]
},
{
"name": "JOB_ARGUMENT",
"fields": [
"jobs['extract'].pysparkJob.args[0]"
]
}
]
}
我使用的代码片段:
options = ClientOptions(api_endpoint="{}-dataproc.googleapis.com:443".format(region))
client = dataproc.WorkflowTemplateServiceClient(client_options=options)
template_file = open(path_to_file, "r")
template_dict = eval(template_file.read())
print(template_dict)
template = dataproc.WorkflowTemplate(template_dict)
full_region_id = "projects/{project_id}/regions/{region}".format(project_id=project_id, region=region)
try:
client.create_workflow_template(
parent=full_region_id,
template=template
)
except AlreadyExists as err:
print(err)
pass
当我尝试 运行 此代码时,出现以下错误:
google.api_core.exceptions.InvalidArgument: 400 Invalid field path placement.managed_cluster.configuration.gce_cluster_config.subnetwork_uri: Field gce_cluster_config does not exist.
如果我尝试参数化 placement.managedCluster.config.softwareConfig.imageVersion
,此行为也是相同的,我将得到
google.api_core.exceptions.InvalidArgument: 400 Invalid field path placement.managed_cluster.configuration.software_config.image_version: Field software_config does not exist.
但是如果我从 parameters
映射中排除 placement.managedCluster.config
下的任何字段,则模板创建成功。
我没有发现对这些字段进行参数化有任何限制。有没有?还是我做错了什么?
这 doc 列出了可参数化的字段。似乎只有 managedCluster
的 managedCluster.name
是可参数化的:
Managed cluster name. Dataproc will use the user-supplied name as the name prefix, and append random characters to create a unique cluster name. The cluster is deleted at the end of the workflow.
我没有看到 managedCluster.config
可参数化。
我的目标是从 python 代码创建数据处理工作流模板。同时,我希望能够在模板实例化期间参数化 placement.managedCluster.config.gceClusterConfig.subnetworkUri 字段。
我从 json 文件中读取模板,例如:
{
"id": "bigquery-extractor",
"placement": {
"managed_cluster": {
"config": {
"gce_cluster_config": {
"subnetwork_uri": "some-subnet-name"
},
"software_config" : {
"image_version": "1.5"
}
},
"cluster_name": "some-name"
}
},
"jobs": [
{
"pyspark_job": {
"args": [
"job_argument"
],
"main_python_file_uri": "gs:///path-to-file"
},
"step_id": "extract"
}
],
"parameters": [
{
"name": "CLUSTER_NAME",
"fields": [
"placement.managedCluster.clusterName"
]
},
{
"name": "SUBNETWORK_URI",
"fields": [
"placement.managedCluster.config.gceClusterConfig.subnetworkUri"
]
},
{
"name": "MAIN_PY_FILE",
"fields": [
"jobs['extract'].pysparkJob.mainPythonFileUri"
]
},
{
"name": "JOB_ARGUMENT",
"fields": [
"jobs['extract'].pysparkJob.args[0]"
]
}
]
}
我使用的代码片段:
options = ClientOptions(api_endpoint="{}-dataproc.googleapis.com:443".format(region))
client = dataproc.WorkflowTemplateServiceClient(client_options=options)
template_file = open(path_to_file, "r")
template_dict = eval(template_file.read())
print(template_dict)
template = dataproc.WorkflowTemplate(template_dict)
full_region_id = "projects/{project_id}/regions/{region}".format(project_id=project_id, region=region)
try:
client.create_workflow_template(
parent=full_region_id,
template=template
)
except AlreadyExists as err:
print(err)
pass
当我尝试 运行 此代码时,出现以下错误:
google.api_core.exceptions.InvalidArgument: 400 Invalid field path placement.managed_cluster.configuration.gce_cluster_config.subnetwork_uri: Field gce_cluster_config does not exist.
如果我尝试参数化 placement.managedCluster.config.softwareConfig.imageVersion
,此行为也是相同的,我将得到
google.api_core.exceptions.InvalidArgument: 400 Invalid field path placement.managed_cluster.configuration.software_config.image_version: Field software_config does not exist.
但是如果我从 parameters
映射中排除 placement.managedCluster.config
下的任何字段,则模板创建成功。
我没有发现对这些字段进行参数化有任何限制。有没有?还是我做错了什么?
这 doc 列出了可参数化的字段。似乎只有 managedCluster
的 managedCluster.name
是可参数化的:
Managed cluster name. Dataproc will use the user-supplied name as the name prefix, and append random characters to create a unique cluster name. The cluster is deleted at the end of the workflow.
我没有看到 managedCluster.config
可参数化。