"ImportError: No module named pyRserve" when running AWS Glue job

When I run a Glue job in which I try to import the pyRserve Python module (pure Python), I get this error:

LogType:stdout
Log Upload Time:Sun Jan 21 12:27:32 +0000 2018
LogLength:206
Log Contents:
Traceback (most recent call last):
  File "script_2018-01-21-12-27-05.py", line 8, in <module>
    import pyRserve
ImportError: No module named pyRserve
End of LogType:stdout

Here are the details of my job:

$ aws glue get-job --job-name test_trunc
{
    "Job": {
        "Name": "test_trunc",
        "Role": "arn:aws:iam::#CLIPPED#:role/AWSGlueServiceRoleDefault",
        "CreatedOn": 1516192543.117,
        "LastModifiedOn": 1516537317.889,
        "ExecutionProperty": {
            "MaxConcurrentRuns": 1
        },
        "Command": {
            "Name": "glueetl",
            "ScriptLocation": "s3://#CLIPPED#/gluescripts/test_trunc"
        },
        "DefaultArguments": {
            "--TempDir": "s3://#CLIPPED#/jobs/test_trunc/scripts",
            "--extra-py-files": "s3://#CLIPPED#/jobs/test_trunc/python-libs/pyRserve.zip",
            "--job-bookmark-option": "job-bookmark-disable",
            "--job-language": "python"
        },
        "Connections": {
            "Connections": [
                "redshift"
            ]
        },
        "MaxRetries": 0,
        "AllocatedCapacity": 10
    }
}

Here is the script I am running:

import sys
from awsglue.transforms import *
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext
from awsglue.context import GlueContext
from awsglue.job import Job
import pprint
import pyRserve

Here is the full log:

https://gist.github.com/mattazend/b611d0232d94ade4bc4c16bcb79f73a8

Did you put the library you are trying to import into a zip file in S3, as suggested here?

According to AWS Glue Documentation:

Only pure Python libraries can be used. Libraries that rely on C extensions, such as the pandas Python Data Analysis Library, are not yet supported.

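If the library you are using relies on C extensions, I don't think it will work even if you upload the Python library as a zip file. I tried pandas, holidays, and others the same way you did, and when I contacted AWS Support they said that support for these Python libraries is on their to-do list, but there is no ETA as of now.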
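One way to check a library zip before uploading it is sketched below. It assumes Glue imports from the zip the way Python's standard zip import does, which only finds a package whose directory sits at the root of the archive. The function name inspect_library_zip and the list of compiled-file suffixes are illustrative choices, not part of Glue or pyRserve.

```python
import zipfile

# File suffixes that indicate compiled (non-pure-Python) modules.
COMPILED_SUFFIXES = (".so", ".pyd", ".dll", ".dylib")


def inspect_library_zip(zip_path, package_name):
    """Return (has_top_level_package, compiled_files) for a library zip.

    Hypothetical helper for illustration only.
    """
    with zipfile.ZipFile(zip_path) as archive:
        names = archive.namelist()
    # Zip imports only find a package whose directory is at the archive root,
    # e.g. "pyRserve/__init__.py" and not "pyRserve-0.9.1/pyRserve/__init__.py".
    has_top_level_package = (package_name + "/__init__.py") in names
    # Any compiled file means the library is not pure Python.
    compiled_files = [n for n in names if n.lower().endswith(COMPILED_SUFFIXES)]
    return has_top_level_package, compiled_files
```

Calling inspect_library_zip("pyRserve.zip", "pyRserve") on a local copy of the zip shows whether the package directory is at the root and whether any compiled files are bundled. A False in the first position or a non-empty list in the second would each be a plausible cause of the ImportError.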

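@Ishwr: According to this link, https://pypi.python.org/pypi/pyRserve/0.9.1, pyRserve requires the Numpy package to be installed as a prerequisite. If numpy is not found among the installed libraries, numpy is installed first, before the installation of pyRserve itself proceeds.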

Since Glue does not support numpy, I believe this is why pyRserve is either not installed from the zip file or not recognized for the job, hence the ImportError.
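To test this guess, the imports can be probed one at a time at the top of the job script, so the log shows which module is actually missing. This is a minimal sketch: probe_imports is a hypothetical helper name, and it assumes that a missing dependency surfaces as an ImportError.

```python
import importlib


def probe_imports(module_names):
    """Try each import in order and return {name: error message or None}."""
    results = {}
    for name in module_names:
        try:
            importlib.import_module(name)
            results[name] = None
        except ImportError as exc:
            # Keep the message so the log names the module that failed.
            results[name] = str(exc)
    return results


if __name__ == "__main__":
    # numpy is probed first because pyRserve lists it as a prerequisite.
    for name, error in probe_imports(["numpy", "pyRserve"]).items():
        print("%s: %s" % (name, "OK" if error is None else error))
```

If numpy fails as well, the problem is the missing prerequisite. If only pyRserve fails, the zip file itself is the more likely cause.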

pyRserve has been written by Ralph Heinkel (www.ralph-heinkel.com) and is released under MIT license.

Quick Installation

Make sure that Numpy is installed (version 1.4.x or higher).

Then from your unix/windows command line run:

pip install pyRserve

For manual installation download the tar.gz or zip package. After unpacking, cd into the pyRserve directory and run python setup.py install from the command line.

Actually pip install pyRserve should install numpy if it is missing.

I hope I am right.

Thanks.