How to create a Databricks job with parameters

Question

I'm creating a new job in Databricks using the databricks-cli:

databricks jobs create --json-file ./deploy/databricks/config/job.config.json

with the following JSON:

{
    "name": "Job Name",
    "new_cluster": {
        "spark_version": "4.1.x-scala2.11",
        "node_type_id": "Standard_D3_v2",
        "num_workers": 3,
        "spark_env_vars": {
            "PYSPARK_PYTHON": "/databricks/python3/bin/python3"
        }
    },
    "libraries": [
        {
            "maven": {
                "coordinates": "com.microsoft.sqlserver:mssql-jdbc:6.5.3.jre8-preview"
            }
        }
    ],
    "timeout_seconds": 3600,
    "max_retries": 3,
    "schedule": {
        "quartz_cron_expression": "0 0 22 ? * *",
        "timezone_id": "Israel"
    },
    "notebook_task": {
        "notebook_path": "/notebooks/python_notebook"
    }
}

I want to add parameters to the job that the notebook can read like this:

dbutils.widgets.text("argument1", "<default value>")
dbutils.widgets.get("argument1")
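A minimal sketch of the notebook side, assuming the code runs inside a Databricks notebook where dbutils is predefined. The helper read_params is hypothetical, not part of the Databricks API. It declares one text widget per expected parameter with a default, then reads back the current values. When the notebook runs as a job, a value supplied by the job replaces the widget's default.

```python
def read_params(widgets, defaults):
    """Declare a text widget for each parameter, then read back its value.

    Hypothetical helper. `widgets` is expected to be `dbutils.widgets` inside a
    Databricks notebook; `defaults` maps parameter names to default strings.
    """
    for name, default in defaults.items():
        # Declare the widget; its default applies when the job passes no value.
        widgets.text(name, default)
    # Widget values come back as strings.
    return {name: widgets.get(name) for name in defaults}


# In the notebook:
# params = read_params(dbutils.widgets, {"argument1": "<default value>", "argument2": ""})
# print(params["argument1"])
```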

Answer 1

Found the answer after a bit of tweaking: you can simply extend the notebook_task property to include base_parameters, like this:

{
    "notebook_task": {
        "notebook_path": "/social/04_batch_trends",
        "base_parameters": {
            "argument1": "value 1",
            "argument2": "value 2"
        }
    }
}
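If the job JSON is kept in a file, as in the question, the parameters can also be merged in with a short script before running databricks jobs create. This is a sketch only: add_base_parameters is a hypothetical helper, and it assumes the file already contains a notebook_task object. Values are stored as strings because notebook widgets return strings.

```python
import json
from pathlib import Path


def add_base_parameters(config_path, params):
    """Merge `params` into notebook_task.base_parameters of a job JSON file.

    Hypothetical helper; assumes the file already has a "notebook_task" object.
    """
    path = Path(config_path)
    config = json.loads(path.read_text())
    base = config["notebook_task"].setdefault("base_parameters", {})
    # Store every key and value as a string, matching what widgets return.
    base.update({str(k): str(v) for k, v in params.items()})
    path.write_text(json.dumps(config, indent=4))
    return config


# add_base_parameters("./deploy/databricks/config/job.config.json",
#                     {"argument1": "value 1", "argument2": "value 2"})
# then: databricks jobs create --json-file ./deploy/databricks/config/job.config.json
```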

This is documented in the Create method of the Jobs API. It lists the notebook_task parameter, which is of type NotebookTask; base_parameters is one of that type's fields.
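Since the Create method is a REST endpoint, the same JSON can also be sent without the CLI. The sketch below assumes Jobs API 2.0 at /api/2.0/jobs/create and personal-access-token authentication; the host and token values are placeholders, and build_create_job_request is a hypothetical helper. It only builds the request so it can be inspected; sending it is left to the commented-out urlopen call.

```python
import json
import urllib.request


def build_create_job_request(host, token, job_spec):
    """Build (but do not send) a POST to the Jobs API 2.0 create endpoint.

    `host` (workspace URL) and `token` (personal access token) are placeholders.
    """
    url = host.rstrip("/") + "/api/2.0/jobs/create"
    return urllib.request.Request(
        url,
        data=json.dumps(job_spec).encode("utf-8"),
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )


# Cluster, library and schedule fields from the question are omitted for brevity;
# a real create call needs a cluster specification as well.
job_spec = {
    "name": "Job Name",
    "notebook_task": {
        "notebook_path": "/notebooks/python_notebook",
        "base_parameters": {"argument1": "value 1", "argument2": "value 2"},
    },
}
req = build_create_job_request("https://<workspace-host>", "<token>", job_spec)
# To actually create the job: urllib.request.urlopen(req); the response JSON contains "job_id".
```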
