Can I control the number of DPUs in the AWS Glue service?

I read in the official documentation that Glue uses six DPUs, but I don't need as many as six. I'm also worried that the cost would be too high.

You can specify the number and type of workers. Quoting the documentation:

Worker type

The following worker types are available:

  • Standard – When you choose this type, you also provide a value for Maximum capacity. Maximum capacity is the number of AWS Glue data processing units (DPUs) that can be allocated when this job runs. A DPU is a relative measure of processing power that consists of 4 vCPUs of compute capacity and 16 GB of memory. The Standard worker type has a 50 GB disk and 2 executors.

  • G.1X – When you choose this type, you also provide a value for Number of workers. Each worker maps to 1 DPU (4 vCPU, 16 GB of memory, 64 GB disk), and provides 1 executor per worker. We recommend this worker type for memory-intensive jobs.

  • G.2X – When you choose this type, you also provide a value for Number of workers. Each worker maps to 2 DPU (8 vCPU, 32 GB of memory, 128 GB disk), and provides 1 executor per worker. We recommend this worker type for memory-intensive jobs and jobs that run ML transforms.

    You are charged an hourly rate based on the number of DPUs used to run your ETL jobs. For more information, see the AWS Glue pricing page.

    When you configure a job using the console and specify a Worker type of Standard, the Maximum capacity is set and the Number of workers becomes the value of Maximum capacity - 1. If you use the AWS Command Line Interface (AWS CLI) or AWS SDK, you can specify the Max capacity parameter, or you can specify both Worker type and the Number of workers. For more information, see Jobs.

Number of workers

The number of workers of a defined workerType that are allocated when a job runs. With G.1X and G.2X Worker types, you must specify the number of workers of that type. The maximum number of workers you can define are 299 for G.1X, and 149 for G.2X.
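As a rough illustration of the quoted rules, the sketch below maps a worker type and worker count to the capacity arguments you might pass when creating a job through the SDK. The helper names (`job_capacity_args`, `total_dpus`, `create_job`) are made up for this example, and the job name, role, and script location are placeholders. It assumes boto3's Glue client, whose `create_job` call accepts `MaxCapacity`, or `WorkerType` together with `NumberOfWorkers`.

```python
# DPUs per worker and worker limits, taken from the documentation quoted above.
DPU_PER_WORKER = {"G.1X": 1, "G.2X": 2}
MAX_WORKERS = {"G.1X": 299, "G.2X": 149}


def job_capacity_args(worker_type, number_of_workers=None, max_capacity=None):
    """Build the capacity-related keyword arguments for a Glue job request."""
    if worker_type == "Standard":
        # Standard jobs are sized with a maximum capacity expressed in DPUs.
        if max_capacity is None:
            raise ValueError("Standard requires max_capacity (in DPUs)")
        return {"MaxCapacity": float(max_capacity)}
    if worker_type not in DPU_PER_WORKER:
        raise ValueError(f"unknown worker type: {worker_type}")
    if number_of_workers is None:
        raise ValueError(f"{worker_type} requires number_of_workers")
    if number_of_workers > MAX_WORKERS[worker_type]:
        raise ValueError(
            f"{worker_type} allows at most {MAX_WORKERS[worker_type]} workers"
        )
    return {"WorkerType": worker_type, "NumberOfWorkers": number_of_workers}


def total_dpus(worker_type, number_of_workers):
    """Number of DPUs that a G.1X or G.2X configuration maps to."""
    return DPU_PER_WORKER[worker_type] * number_of_workers


def create_job(name, role, script_location, **capacity):
    """Hypothetical wrapper: create a Spark ETL job with the given capacity."""
    import boto3  # imported lazily so the helpers above work without AWS access

    glue = boto3.client("glue")
    return glue.create_job(
        Name=name,
        Role=role,
        Command={"Name": "glueetl", "ScriptLocation": script_location},
        **capacity,
    )
```

For example, `job_capacity_args("G.1X", number_of_workers=2)` yields the arguments for a two-worker G.1X job, which `total_dpus` counts as 2 DPUs. Passing those arguments to `create_job` needs valid AWS credentials and a real role and script path.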

The minimum number of DPUs required to run a Glue job is two. You don't always need six DPUs to run a Glue job.
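To get a feel for the cost difference, here is a minimal back-of-the-envelope sketch. It assumes billing is simply DPUs times hours times an hourly rate, and it ignores billing minimums and rounding. The function name `estimate_cost` and the rate used in the example are placeholders, so take the actual rate for your region from the AWS Glue pricing page.

```python
def estimate_cost(dpus, runtime_minutes, price_per_dpu_hour):
    """Rough job cost: DPUs x hours x hourly rate per DPU (no billing minimums)."""
    if dpus <= 0 or runtime_minutes < 0 or price_per_dpu_hour < 0:
        raise ValueError("dpus must be positive; runtime and price non-negative")
    return dpus * (runtime_minutes / 60) * price_per_dpu_hour


# Placeholder rate for illustration only; look up the real one for your region.
EXAMPLE_RATE = 0.44

two_dpu_cost = estimate_cost(2, 30, EXAMPLE_RATE)
six_dpu_cost = estimate_cost(6, 30, EXAMPLE_RATE)
print(f"2 DPUs for 30 min: {two_dpu_cost:.2f}")
print(f"6 DPUs for 30 min: {six_dpu_cost:.2f}")
```

Note that the comparison assumes the same runtime for both configurations. In practice a job with fewer DPUs may run longer, so the saving can be smaller than this suggests.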

To plan your capacity properly, you can refer to this