Are there any limits on saving results to S3 from SageMaker Processing?
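※ I'm using Google Translate, so please message me if anything is unclear!
I'm trying to use SageMaker Processing to run a Python script on 4 large datasets. My current situation is as follows:
- The script runs successfully with 3 of the datasets.
- The script cannot run with just 1 dataset (the largest one, which has the same structure as the others).
- With all 4 datasets, the script itself runs to completion (so I suspect the error occurs on the S3 side, that is, when the SageMaker results are copied to S3).
The error I get is this InternalServerError: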
Traceback (most recent call last):
File "sagemaker_train_and_predict.py", line 56, in <module>
outputs=outputs
File "{xxx}/sagemaker_constructor.py", line 39, in run
outputs=outputs
File "{masked}/.pyenv/versions/3.6.8/lib/python3.6/site-packages/sagemaker/processing.py", line 408, in run
self.latest_job.wait(logs=logs)
File "{masked}/.pyenv/versions/3.6.8/lib/python3.6/site-packages/sagemaker/processing.py", line 723, in wait
self.sagemaker_session.logs_for_processing_job(self.job_name, wait=True)
File "{masked}/.pyenv/versions/3.6.8/lib/python3.6/site-packages/sagemaker/session.py", line 3111, in logs_for_processing_job
self._check_job_status(job_name, description, "ProcessingJobStatus")
File "{masked}/.pyenv/versions/3.6.8/lib/python3.6/site-packages/sagemaker/session.py", line 2615, in _check_job_status
actual_status=status,
sagemaker.exceptions.UnexpectedStatusException: Error for Processing job sagemaker-vm-train-and-predict-2020-04-12-04-15-40-655: Failed. Reason: InternalServerError: We encountered an internal error. Please try again.
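If the output is generated at a high rate and is very large, there may be problems transferring the output data to S3.
You can 1) try slowing down the rate at which you write output a little, or 2) call S3 from your algorithm container with the boto client (https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/s3.html) to upload the output directly.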
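A minimal sketch of option 2, combined with an optional pause between files as one way to approach option 1. The helper name `upload_outputs`, its parameters, and the bucket, prefix, and directory names in the usage note are hypothetical and not from the original post. The only boto3 call used is the S3 client's `upload_file`. This has not been verified against the failing job, so treat it as a starting point rather than a confirmed fix.

```python
import os
import time


def upload_outputs(local_dir, bucket, prefix, s3_client=None, pause_seconds=0.0):
    """Upload every file under local_dir to s3://<bucket>/<prefix>/ and return the keys.

    Hypothetical helper: the name and parameters are assumptions for illustration.
    """
    if s3_client is None:
        # Imported lazily so the helper can also be exercised with a stub client.
        import boto3
        s3_client = boto3.client("s3")

    uploaded = []
    for root, _dirs, files in os.walk(local_dir):
        for name in sorted(files):
            path = os.path.join(root, name)
            # Build the object key from the path relative to local_dir.
            rel = os.path.relpath(path, local_dir).replace(os.sep, "/")
            key = f"{prefix.rstrip('/')}/{rel}"
            # upload_file switches to multipart upload for large files.
            s3_client.upload_file(path, bucket, key)
            uploaded.append(key)
            if pause_seconds:
                # Optional throttle between files (assumption: pacing helps here).
                time.sleep(pause_seconds)
    return uploaded
```

At the end of the processing script this could be called as, for example, `upload_outputs("/opt/ml/processing/results", "my-bucket", "vm-train-and-predict")`, where all three values are placeholders. The role the job runs under would need permission to write to that bucket, and the directory should probably not also be registered as a Processing output, otherwise the same data would still go through the built-in copy step.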