How long does converting to an Amazon protobuf record using .record_set() take to complete?

I am trying to convert a numpy array into Amazon protobuf records using sagemaker.amazon.amazon_estimator.AmazonAlgorithmEstimatorBase.record_set(), but it takes a very long time.

I would like to know how this function actually works and how long it should take.

from sagemaker import LinearLearner, get_execution_role
import numpy as np

# train_features is the full training feature matrix, defined elsewhere in the notebook
model = LinearLearner(role=get_execution_role(),
                      train_instance_count=len(train_features),
                      train_instance_type='ml.t2.medium',
                      predictor_type='binary_classifier')

# A 2 x 30 feature matrix (commas added so the literal is valid Python)
numpy_array = np.array([
    [7.4727994e-01, 9.5506465e-01, 7.6940370e-01, 8.2015032e-01, 1.8113719e-01,
     7.8720862e-01, 2.9677063e-01, 2.6711187e-01, 7.9498607e-01, 4.4924998e-01,
     4.9533784e-01, 2.6846960e-01, 7.0506859e-01, 4.1573554e-01, 6.5843487e-01,
     3.2448095e-01, 4.3870610e-01, 7.2739214e-01, 6.0914969e-01, 5.5108833e-01,
     5.8835250e-01, 5.5872935e-01, 4.4392920e-01, 6.8353373e-01, 4.7664520e-01,
     5.6887656e-01, 4.7034043e-01, 4.1631639e-01, 3.1357434e-01, 5.5933639e-04],
    [5.7815754e-01, 9.5828843e-01, 7.7824914e-01, 8.3188844e-01, 2.3287645e-01,
     7.7196079e-01, 2.5512937e-01, 2.7032304e-01, 7.8349811e-01, 5.0130588e-01,
     4.8345023e-01, 3.8397798e-01, 5.9922373e-01, 4.7720599e-01, 6.7832541e-01,
     2.7788603e-01, 4.6435007e-01, 7.6100332e-01, 7.7771670e-01, 5.1536995e-01,
     5.8536130e-01, 5.6407303e-01, 5.0898582e-01, 6.7815554e-01, 3.0614817e-01,
     5.7353836e-01, 3.8981739e-01, 4.1474316e-01, 3.1389123e-01, 3.5031504e-04]])

record = model.record_set(numpy_array)

Expected output

I expect the variable record to contain a record that is ready to be used for training the Linear Learner model.

I think this is the problem:

train_instance_count=len(train_features)

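This parameter is about infrastructure (how many SageMaker instances you want to train on), not about features. record_set() splits the array into train_instance_count shards and uploads each shard to S3 as a separate protobuf file, so setting it to the number of training rows means one upload per row. You should set it to 1.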
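To see why the instance count matters, here is a small local sketch of the sharding step. It assumes the SageMaker Python SDK v1 behaviour described above (one shard per training instance, one S3 object per shard). `build_shards` is a hypothetical helper modelled on that behaviour, not a public SDK function, and nothing is uploaded anywhere.

```python
import numpy as np


def build_shards(num_shards, array):
    """Hypothetical sketch of splitting rows into num_shards slices.

    Rows are divided into equal slices; the last slice also takes the
    remainder. In the SDK (assumption), each slice becomes one S3 object.
    """
    if num_shards < 1:
        raise ValueError("num_shards must be >= 1")
    shard_size = array.shape[0] // num_shards
    if shard_size == 0:
        raise ValueError("Array length is less than num shards")
    shards = [array[i * shard_size:(i + 1) * shard_size]
              for i in range(num_shards - 1)]
    shards.append(array[(num_shards - 1) * shard_size:])
    return shards


# 1000 rows of 30 features, standing in for train_features
features = np.zeros((1000, 30), dtype=np.float32)

# train_instance_count=1: a single shard, i.e. a single upload
print(len(build_shards(1, features)))  # 1

# train_instance_count=len(features): one shard (and one upload) per row
print(len(build_shards(len(features), features)))  # 1000
```

Under this assumption, a dataset of N rows with train_instance_count=N turns into N separate S3 uploads, which plausibly accounts for the slowdown; with train_instance_count=1 it is a single upload.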

import sagemaker
from sagemaker import LinearLearner
import numpy as np

model = LinearLearner(role=sagemaker.get_execution_role(),
                      train_instance_count=1,
                      train_instance_type='ml.t2.medium',
                      predictor_type='binary_classifier')

# the same 2 x 30 array as in the question
numpy_array = np.array(...)

record=model.record_set(numpy_array)
# This takes <100 ms on my t3 notebook instance

print(record)

(<class 'sagemaker.amazon.amazon_estimator.RecordSet'>, {'s3_data':
's3://sagemaker-eu-west-1-123456789012/sagemaker-record-sets/LinearLearner-
2019-07-18-09-48-21-639/.amazon.manifest', 'feature_dim': 30, 'num_records': 2,
's3_data_type': 'ManifestFile', 'channel': 'train'})

The manifest file lists the protobuf-encoded files:

[{"prefix": "s3://sagemaker-eu-west-1-123456789012/sagemaker-record-sets/LinearLearner-2019-07-18-09-48-21-639/"}, "matrix_0.pbr"]
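The manifest format (a prefix object followed by file names relative to that prefix) is simple enough to inspect locally. This plain-Python sketch makes no AWS calls: it expands the manifest above into full S3 URIs and builds a manifest with one entry per shard. Both helper names, and the zero-padded file naming used when there is more than one shard, are assumptions for illustration.

```python
import json

# The manifest shown above, as a string
manifest_text = (
    '[{"prefix": "s3://sagemaker-eu-west-1-123456789012/sagemaker-record-sets/'
    'LinearLearner-2019-07-18-09-48-21-639/"}, "matrix_0.pbr"]'
)


def manifest_to_uris(text):
    """Hypothetical helper: expand a manifest into full S3 URIs."""
    entries = json.loads(text)
    prefix = entries[0]["prefix"]
    return [prefix + name for name in entries[1:]]


def build_manifest(prefix, num_shards):
    """Hypothetical helper: one .pbr entry per shard, zero-padded names."""
    width = len(str(num_shards))
    names = ["matrix_{}.pbr".format(str(i).zfill(width)) for i in range(num_shards)]
    return json.dumps([{"prefix": prefix}] + names)


for uri in manifest_to_uris(manifest_text):
    print(uri)
```

With one shard, build_manifest reproduces the single-file manifest above; with train_instance_count equal to the number of rows it would list one file per row.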

You can now use it as the training channel when calling fit(); see: https://docs.aws.amazon.com/sagemaker/latest/dg/API_S3DataSource.html