How to connect AWS Glue to a VPC and access private resources?
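I'm trying to connect from an AWS Glue job to services and databases running inside a VPC (private subnets). The private resources should not be exposed publicly (e.g., by moving them to a public subnet or setting up a public load balancer).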
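Unfortunately, AWS Glue doesn't seem to support running in a user-defined VPC. AWS does provide something called Glue Database Connections which, when used with the Glue SDK, magically set up elastic network interfaces inside the specified VPC for the Glue/Spark worker nodes. The network interfaces then tunnel traffic from Glue to a specific database inside the VPC. However, this requires the location and credentials of specific databases, and it is not clear if and when other traffic (e.g., a REST call to a service) is tunnelled through the VPC.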
Is there a reliable way to set up a Glue -> VPC connection so that all traffic is tunnelled through the VPC?
However, this requires the location and credentials of specific databases, and it is not clear if and when other traffic (e.g., a REST call to a service) is tunnelled through the VPC.
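I agree the documentation is confusing, but according to this paragraph on the page you linked, it does seem that all traffic is tunnelled through the VPC, because once VPC access is configured you need a NAT gateway or a VPC endpoint for Glue to reach anything outside the VPC: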
All JDBC data stores that are accessed by the job must be available from the VPC subnet. To access Amazon S3 from within your VPC, a VPC endpoint is required. If your job needs to access both VPC resources and the public internet, the VPC needs to have a Network Address Translation (NAT) gateway inside the VPC.
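To make the S3 requirement concrete, here is a minimal sketch, assuming boto3 is the client library, that builds the arguments for the EC2 `create_vpc_endpoint` call to add an S3 gateway endpoint. The VPC ID, region, and route-table IDs are hypothetical placeholders, and the helper function names are illustrative, not part of any AWS API. The route tables should be the ones used by the subnet the Glue job runs in; the NAT gateway needed for public internet access is not shown.

```python
# Minimal sketch, assuming boto3's EC2 client; all IDs are hypothetical.


def s3_gateway_endpoint_params(vpc_id, region, route_table_ids):
    """Keyword arguments for ec2.create_vpc_endpoint() for an S3 gateway endpoint.

    A gateway endpoint adds a route to S3 in each listed route table, so
    traffic from the Glue job's subnet reaches S3 without a public route.
    """
    return {
        "VpcEndpointType": "Gateway",
        "VpcId": vpc_id,
        "ServiceName": f"com.amazonaws.{region}.s3",
        "RouteTableIds": list(route_table_ids),
    }


def create_s3_endpoint(ec2_client, vpc_id, region, route_table_ids):
    # ec2_client is expected to be boto3.client("ec2", region_name=region).
    params = s3_gateway_endpoint_params(vpc_id, region, route_table_ids)
    response = ec2_client.create_vpc_endpoint(**params)
    # The new endpoint's ID is returned under VpcEndpoint.VpcEndpointId.
    return response["VpcEndpoint"]["VpcEndpointId"]
```

With a real client this would be called as `create_s3_endpoint(boto3.client("ec2", region_name="eu-west-1"), "vpc-0123456789abcdef0", "eu-west-1", ["rtb-0123456789abcdef0"])`, using the IDs from your own account.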
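You can create a database connection with connection type NETWORK and use that connection in your Glue job. It will allow your job to call REST APIs or reach any other resource in your VPC.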
https://docs.aws.amazon.com/glue/latest/dg/connection-using.html
Network (designates a connection to a data source within an Amazon Virtual Private Cloud environment (Amazon VPC))
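A minimal sketch of the NETWORK connection approach, again assuming boto3: it builds the `ConnectionInput` for the Glue `create_connection` call and the `Connections` argument that `create_job` accepts. The connection name, subnet, security group, and Availability Zone are hypothetical; the subnet would be a private subnet that can reach the target service, and the Availability Zone should be that subnet's own. Passing an empty `ConnectionProperties` map for a NETWORK connection is an assumption here.

```python
# Minimal sketch, assuming boto3's Glue client; all names and IDs are hypothetical.


def network_connection_input(name, subnet_id, security_group_ids, availability_zone):
    """Build the ConnectionInput for glue.create_connection() with type NETWORK.

    A NETWORK connection carries no JDBC URL or credentials, only the VPC
    placement (subnet, security groups, AZ) for the job's network interfaces.
    """
    return {
        "Name": name,
        "ConnectionType": "NETWORK",
        # Assumption: an empty property map is enough for a NETWORK connection.
        "ConnectionProperties": {},
        "PhysicalConnectionRequirements": {
            "SubnetId": subnet_id,
            "SecurityGroupIdList": list(security_group_ids),
            "AvailabilityZone": availability_zone,
        },
    }


def create_network_connection(glue_client, connection_input):
    # glue_client is expected to be boto3.client("glue").
    glue_client.create_connection(ConnectionInput=connection_input)
    return connection_input["Name"]


def job_connections(connection_names):
    # Shape of the Connections argument accepted by glue.create_job().
    return {"Connections": list(connection_names)}
```

The job would then be created with `glue.create_job(..., Connections=job_connections(["my-vpc-network"]))` alongside its usual `Name`, `Role`, and `Command` arguments, so its workers get network interfaces in the chosen subnet.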
https://docs.aws.amazon.com/glue/latest/dg/connection-JDBC-VPC.html
To allow AWS Glue to communicate with its components, specify a security group with a self-referencing inbound rule for all TCP ports. By creating a self-referencing rule, you can restrict the source to the same security group in the VPC and not open it to all networks.
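The self-referencing rule can be sketched the same way with the EC2 `authorize_security_group_ingress` call in boto3: the rule's source is the security group itself (via `UserIdGroupPairs`) rather than a CIDR range. The group ID is hypothetical, and ports 0 to 65535 are assumed to represent "all TCP ports".

```python
# Minimal sketch, assuming boto3's EC2 client; the group ID is hypothetical.


def self_referencing_all_tcp(security_group_id):
    """IpPermissions allowing all TCP ports from members of the same group."""
    return [
        {
            "IpProtocol": "tcp",
            "FromPort": 0,
            "ToPort": 65535,
            # Source is the group itself, not a CIDR such as 0.0.0.0/0.
            "UserIdGroupPairs": [{"GroupId": security_group_id}],
        }
    ]


def add_self_referencing_rule(ec2_client, security_group_id):
    # ec2_client is expected to be boto3.client("ec2").
    ec2_client.authorize_security_group_ingress(
        GroupId=security_group_id,
        IpPermissions=self_referencing_all_tcp(security_group_id),
    )
```

The same group ID is then listed in `SecurityGroupIdList` of the NETWORK connection, so only the Glue workers attached to that group can talk to each other on these ports.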