Local Job unable to find Region

I am trying to run an AWS Glue job locally from a Docker container, but I get the following error:

    File "/glue/script.py", line 19, in <module>
        job.init(args['JOB_NAME'], args)
      File "/glue/aws-glue-libs/PyGlue.zip/awsglue/job.py", line 38, in init
      File "/glue/spark-2.4.3-bin-spark-2.4.3-bin-hadoop2.8/python/lib/py4j-0.10.7-src.zip/py4j/java_gateway.py", line 1257, in __call__
      File "/glue/spark-2.4.3-bin-spark-2.4.3-bin-hadoop2.8/python/lib/pyspark.zip/pyspark/sql/utils.py", line 63, in deco
      File "/glue/spark-2.4.3-bin-spark-2.4.3-bin-hadoop2.8/python/lib/py4j-0.10.7-src.zip/py4j/protocol.py", line 328, in get_return_value
    py4j.protocol.Py4JJavaError: An error occurred while calling z:com.amazonaws.services.glue.util.Job.init.
    : com.amazonaws.SdkClientException: Unable to load region information from any provider in the chain

It doesn't seem to be able to find the region, even though I have put my config and credentials files in the usual path inside the container, so it should be able to pick the region up from there. Or should I try to declare the region from within the script itself?

These are the first few lines of the job; it currently fails on the last line:

    import sys
    from awsglue.transforms import *
    from awsglue.utils import getResolvedOptions
    from pyspark.context import SparkContext
    from awsglue.context import GlueContext
    from awsglue.job import Job
    from pyspark.sql.functions import *
    from awsglue.dynamicframe import DynamicFrame
    ## @type: DataSource
    import datetime
    import boto3

    ## @params: [JOB_NAME]
    args = getResolvedOptions(sys.argv, ['JOB_NAME'])

    sc = SparkContext()
    glueContext = GlueContext(sc)
    spark = glueContext.spark_session
    job = Job(glueContext)
    job.init(args['JOB_NAME'], args)
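
For reference, this is roughly what I had in mind for declaring the region from the script. It is only a sketch: it assumes that environment variables set in the Python process are inherited by the driver JVM (where the AWS SDK that raises the `SdkClientException` runs), and `us-east-1` is just a placeholder for whatever region is actually used:

    import os
    import sys

    from awsglue.utils import getResolvedOptions
    from pyspark.context import SparkContext
    from awsglue.context import GlueContext
    from awsglue.job import Job

    # Assumption: exporting the region before the JVM starts lets the AWS SDK
    # region provider chain resolve it. 'us-east-1' is a placeholder value.
    os.environ.setdefault('AWS_REGION', 'us-east-1')
    os.environ.setdefault('AWS_DEFAULT_REGION', 'us-east-1')

    args = getResolvedOptions(sys.argv, ['JOB_NAME'])

    sc = SparkContext()  # the driver JVM is launched here and inherits the env vars
    glueContext = GlueContext(sc)
    job = Job(glueContext)
    job.init(args['JOB_NAME'], args)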

I have tried running Glue jobs locally from a Docker container, and it works fine for me.

I wrote a blog post about exactly this, and the Docker image is also available on Docker Hub. I'm not sure about this particular error, but if you want to use my image, the links are below:

文章:https://towardsdatascience.com/develop-glue-jobs-locally-using-docker-containers-bffc9d95bd1

Github: https://github.com/jnshubham/aws-glue-local-etl-docker

I haven't run into any region issues using this setup; see if it helps you.
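
If it helps, here is a quick sanity check you can run inside the container to see whether the SDK can resolve a region at all. This is just a diagnostic sketch using boto3 (which your script already imports); it doesn't exercise the Java-side SDK directly, but both read the same environment variables and `~/.aws/config`:

    import boto3

    # Print the region boto3 resolves from env vars / ~/.aws/config.
    # If this prints None, the region really isn't visible inside the container.
    session = boto3.session.Session()
    print('resolved region:', session.region_name)

    # Also check whether any credentials are picked up from the mounted files.
    creds = session.get_credentials()
    print('credentials found:', creds is not None)

If that prints `None`, try passing the region into the container as an environment variable or adding a `region` line under the `[default]` profile in your config file.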