AWS SageMaker PyTorch: no module named 'sagemaker'

Question

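I have deployed a PyTorch model on AWS with SageMaker, and I am trying to send a request to test the service. However, I get a very vague error message saying "no module named 'sagemaker'". I searched online but could not find any posts about a similar message.

My client code: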

import numpy as np
from sagemaker.pytorch.model import PyTorchPredictor

ENDPOINT = '<endpoint name>'

predictor = PyTorchPredictor(ENDPOINT)
predictor.predict(np.random.random_sample([1, 3, 224, 224]).tobytes())

Detailed error message:

Traceback (most recent call last):
  File "client.py", line 7, in <module>
    predictor.predict(np.random.random_sample([1, 3, 224, 224]).tobytes())
  File "/Users/jiashenc/Env/py3/lib/python3.7/site-packages/sagemaker/predictor.py", line 110, in predict
    response = self.sagemaker_session.sagemaker_runtime_client.invoke_endpoint(**request_args)
  File "/Users/jiashenc/Env/py3/lib/python3.7/site-packages/botocore/client.py", line 276, in _api_call
    return self._make_api_call(operation_name, kwargs)
  File "/Users/jiashenc/Env/py3/lib/python3.7/site-packages/botocore/client.py", line 586, in _make_api_call
    raise error_class(parsed_response, operation_name)
botocore.errorfactory.ModelError: An error occurred (ModelError) when calling the InvokeEndpoint operation: Received server error (500) from model with message "No module named 'sagemaker'". See https://us-east-2.console.aws.amazon.com/cloudwatch/home?region=us-east-2#logEventViewer:group=/aws/sagemaker/Endpoints/<endpoint name> in account xxxxxxxxxxxxxx for more information.

This error occurs because I merged the serving script and my deployment script into a single file, shown below:

import os
import torch
import numpy as np
from sagemaker.pytorch.model import PyTorchModel
from torch import cuda
from torchvision.models import resnet50


def model_fn(model_dir):
    device = torch.device('cuda' if cuda.is_available() else 'cpu')
    model = resnet50()
    with open(os.path.join(model_dir, 'model.pth'), 'rb') as f:
        model.load_state_dict(torch.load(f, map_location=device))
    return model.to(device)

def predict_fn(input_data, model):
    device = torch.device('cuda' if cuda.is_available() else 'cpu')
    model.eval()
    with torch.no_grad():
        return model(input_data.to(device))


if __name__ == '__main__':
    pytorch_model = PyTorchModel(model_data='s3://<bucket name>/resnet50/model.tar.gz',
                                    entry_point='serve.py', role='jiashenC-sagemaker',
                                    py_version='py3', framework_version='1.3.1')
    predictor = pytorch_model.deploy(instance_type='ml.t2.medium', initial_instance_count=1)
    print(predictor.predict(np.random.random_sample([1, 3, 224, 224]).astype(np.float32)))

The root cause is line 4 of my code. It tries to import sagemaker, a library that is not available inside the serving container.
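One way to catch this before deploying is to scan the entry point script for imports of orchestration-only packages. The following is a minimal sketch, not part of the SageMaker SDK: the helper name find_forbidden_imports and the default list of forbidden packages are assumptions made for illustration. It only parses the source with the standard library, so neither torch nor sagemaker has to be installed to run it.

```python
import ast


def find_forbidden_imports(source, forbidden=("sagemaker",)):
    """Return the sorted top-level packages from `forbidden` that `source` imports.

    Hypothetical helper: it inspects every import statement in the file,
    including ones nested inside functions or `if` blocks.
    """
    found = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom):
            # Relative imports (level > 0) cannot refer to an installed package.
            names = [node.module] if node.module and node.level == 0 else []
        else:
            continue
        for name in names:
            root = name.split(".")[0]
            if root in forbidden:
                found.add(root)
    return sorted(found)


# Imports taken from the merged script above, and the same without the SDK.
merged_script = """
import os
import torch
from sagemaker.pytorch.model import PyTorchModel
"""
clean_script = """
import os
import torch
"""

print(find_forbidden_imports(merged_script))  # ['sagemaker']
print(find_forbidden_imports(clean_script))   # []
```

Running a check like this on serve.py before calling deploy would flag the import on line 4 locally, instead of surfacing it as a 500 error from the endpoint.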

Answer 1

(Edit 2/9/2020: added extra code snippets)

Your serving code tries to use the sagemaker module internally. The sagemaker module (also known as the SageMaker Python SDK, one of the numerous orchestration SDKs for SageMaker) is not designed to be used inside model containers, but rather outside of them, to orchestrate their activity (training, deployment, Bayesian tuning, etc.). In your specific example, you shouldn't include the deployment and model invocation code in the server code, as those are actions conducted from outside the server to orchestrate its lifecycle and interact with it. For model deployment with the SageMaker PyTorch container, your entry point script just needs to contain the required model_fn function for model deserialization, and optionally an input_fn, predict_fn and output_fn, for pre-processing, inference and post-processing respectively (detailed in the documentation). This logic is beautiful :) : you don't need anything else to deploy a production-ready deep learning server! (MMS in the case of PyTorch and MXNet, Flask+Gunicorn in the case of sklearn.)
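To make the optional handlers more concrete, here is a hedged sketch of an input_fn and output_fn pair that exchange JSON. The function names and two-argument signatures follow the handler convention described above; the choice of JSON, the JSON_CONTENT_TYPE constant and the error handling are assumptions for illustration. It uses only the standard library, so input_fn returns nested Python lists and a matching predict_fn would still have to convert them to a torch.Tensor before calling the model.

```python
import json

JSON_CONTENT_TYPE = "application/json"  # assumed content type for this sketch


def input_fn(request_body, request_content_type):
    """Pre-processing: deserialize the request payload into nested lists."""
    if request_content_type != JSON_CONTENT_TYPE:
        raise ValueError(f"Unsupported content type: {request_content_type}")
    if isinstance(request_body, (bytes, bytearray)):
        request_body = request_body.decode("utf-8")
    return json.loads(request_body)


def output_fn(prediction, response_content_type):
    """Post-processing: serialize the prediction into a JSON string."""
    if response_content_type != JSON_CONTENT_TYPE:
        raise ValueError(f"Unsupported content type: {response_content_type}")
    # Tensors and NumPy arrays expose tolist(); plain lists pass through as-is.
    if hasattr(prediction, "tolist"):
        prediction = prediction.tolist()
    return json.dumps(prediction)


# Round trip without a model in between, just to show the data flow.
payload = input_fn(b"[[1.0, 2.0], [3.0, 4.0]]", JSON_CONTENT_TYPE)
print(output_fn(payload, JSON_CONTENT_TYPE))  # [[1.0, 2.0], [3.0, 4.0]]
```

Note that none of these handlers needs the sagemaker package: they only describe how one request is decoded, scored and encoded inside the container.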

To summarize, your code should be split as follows:

An entry_point script serve.py that contains the model serving code, like this:

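import os

import torch
from torch import cuda
from torchvision.models import resnet50


def model_fn(model_dir):
    # Instantiate the model from its artifact stored in model_dir
    device = torch.device('cuda' if cuda.is_available() else 'cpu')
    model = resnet50()
    with open(os.path.join(model_dir, 'model.pth'), 'rb') as f:
        model.load_state_dict(torch.load(f, map_location=device))
    return model.to(device)


def predict_fn(input_data, model):
    # Apply the model to input_data and return the result of interest
    device = torch.device('cuda' if cuda.is_available() else 'cpu')
    model.eval()
    with torch.no_grad():
        return model(input_data.to(device))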

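and some orchestration code to instantiate a SageMaker Model object, deploy it to a server and query it. This is run from the orchestration runtime of your choice, which could be a SageMaker Notebook, your laptop, an AWS Lambda function, an Apache Airflow operator, etc., and with the SDK of your choice; there is no need to use Python for this.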

import numpy as np
from sagemaker.pytorch.model import PyTorchModel

pytorch_model = PyTorchModel(
    model_data='s3://<bucket name>/resnet50/model.tar.gz',
    entry_point='serve.py',
    role='jiashenC-sagemaker',
    py_version='py3',
    framework_version='1.3.1')

predictor = pytorch_model.deploy(instance_type='ml.t2.medium', initial_instance_count=1)

print(predictor.predict(np.random.random_sample([1, 3, 224, 224]).astype(np.float32)))
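Since the orchestration side can use any SDK, the endpoint can also be queried with the low-level runtime client that appears in the traceback above (invoke_endpoint), without the SageMaker Python SDK. This is a sketch under assumptions: the helper names build_invoke_args and invoke are hypothetical, the endpoint is assumed to accept and return application/json (which depends on the input and output handlers the container ends up using), and AWS credentials and a region are assumed to be configured.

```python
import json


def build_invoke_args(endpoint_name, array_like):
    """Build the keyword arguments for the runtime client's invoke_endpoint."""
    return {
        "EndpointName": endpoint_name,
        "ContentType": "application/json",  # assumed to be accepted by the endpoint
        "Accept": "application/json",
        "Body": json.dumps(array_like).encode("utf-8"),
    }


def invoke(endpoint_name, array_like):
    """Send one request to a deployed endpoint and decode the JSON response."""
    import boto3  # imported lazily so build_invoke_args works without boto3

    runtime = boto3.client("sagemaker-runtime")
    response = runtime.invoke_endpoint(**build_invoke_args(endpoint_name, array_like))
    return json.loads(response["Body"].read())


# Building the request does not contact AWS; calling invoke() would.
args = build_invoke_args("<endpoint name>", [[0.0, 1.0]])
print(args["ContentType"], len(args["Body"]))
```

The same request could be issued from any other AWS SDK or the CLI; only serve.py runs inside the container.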
