Connection timed out with JDBC connection from AWS Glue to RDS

I'm trying to connect to my PostgreSQL RDS instance directly from my AWS Glue script. Connecting with the generated code works, but a JDBC-type connection does not. Here is the code:

import sys
from awsglue.transforms import *
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext
from awsglue.context import GlueContext
from awsglue.job import Job
from awsglue.dynamicframe import DynamicFrame
import pyspark.sql.functions as F
from pyspark.sql.functions import *

## Initialize
args = getResolvedOptions(sys.argv, ['JOB_NAME'])

sc = SparkContext()
glueContext = GlueContext(sc)
spark = glueContext.spark_session
job = Job(glueContext)
job.init(args['JOB_NAME'], args)

df = spark \
     .read \
     .format('jdbc') \
     .option('url', 'jdbc:postgresql://host/database_name') \
     .option('dbtable', "(SELECT * FROM table WHERE name = 'abcd') AS t") \
     .option('user', 'username') \
     .option('password', 'password') \
     .load()

job.commit()

Part of the error:

An error occurred while calling o74.load. : java.sql.SQLException: [Amazon](500150) Error setting/closing connection: Connection timed out. at com.amazon.redshift.client.PGClient.connect ....

Additional information:

Thanks in advance, and let me know if you need more information.

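I just found the cause: I hadn't specified the port in the JDBC URL. I don't remember ever having to include the port before. Once I added it, everything worked.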

df = spark \
     .read \
     .format('jdbc') \
     .option('url', 'jdbc:postgresql://host:5432/database_name') \
     .option('dbtable', "(SELECT * FROM table WHERE name = 'abcd') AS t") \
     .option('user', 'username') \
     .option('password', 'password') \
     .load()
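
To avoid leaving the port out again, the URL can be assembled by a small helper that always writes the port explicitly. This is only a sketch: `build_postgres_jdbc_url` is a hypothetical name of my own, not part of Glue or Spark, and 5432 is assumed here only because it is the standard PostgreSQL port and the one that worked above.

```python
def build_postgres_jdbc_url(host, database, port=5432):
    """Return a PostgreSQL JDBC URL that always contains an explicit port.

    Hypothetical helper for illustration; not a Glue or Spark API.
    """
    if not host or not database:
        raise ValueError("host and database must be non-empty")
    port = int(port)
    if not 0 < port < 65536:
        raise ValueError("port must be between 1 and 65535")
    return f"jdbc:postgresql://{host}:{port}/{database}"


# Placeholder host and database names, as in the snippets above.
url = build_postgres_jdbc_url("host", "database_name")
print(url)
```

The result can then be passed to `.option('url', url)` in place of the hard-coded string.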