Changing input type for linear learner to csv

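I am trying to run the linear learner on a simple dataset. My data is a CSV file that I have uploaded to a bucket. The problem is that when I run it, I get the following error: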

UnexpectedStatusException: Error for Training job linear-learner-2020-05-23-22-31-40-894: Failed. Reason: ClientError: Unable to read data channel 'train'. Requested content-type is 'application/x-recordio-protobuf'. Please verify the data matches the requested content-type. (caused by MXNetError)

Caused by: [22:34:37] /opt/brazil-pkg-cache/packages/AIAlgorithmsCppLibs/AIAlgorithmsCppLibs-2.0.2746.0/AL2012/generic-flavor/src/src/aialgs/io/iterator_base.cpp:100: (Input Error) The header of the MXNet RecordIO record at position 0 in the dataset does not start with a valid magic number.

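I googled it, and the advice is to change the content_type to 'text/csv'. My question is, how do I do that? Or does anyone know how to get this working? Thanks! Here is my linear learner code: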

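import boto3
import sagemaker
from sagemaker.amazon.amazon_estimator import get_image_uri

# role, output_location, and sess are defined elsewhere (not shown)

# Look up the built-in Linear Learner image for the current region
container = get_image_uri(boto3.Session().region_name, 'linear-learner')

linear = sagemaker.estimator.Estimator(container,
                                       role,
                                       train_instance_count=1,
                                       train_instance_type='ml.c4.xlarge',
                                       output_path=output_location,
                                       sagemaker_session=sess)

linear.set_hyperparameters(predictor_type='regressor',
                           mini_batch_size=200)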

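You can use SageMaker input channels and set content_type on each one. Note that TrainingInput is the name used in version 2 of the SageMaker Python SDK; in version 1, which the code in the question appears to use, the equivalent class is sagemaker.session.s3_input.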


train_data = sagemaker.inputs.TrainingInput(
    "s3://my-bucket/path/to/train",
    distribution="FullyReplicated",
    content_type="text/csv",
    s3_data_type="S3Prefix",
    record_wrapping=None,
    compression=None
)

validation_data = sagemaker.inputs.TrainingInput(
    "s3://my-bucket/path/to/validation",
    distribution="FullyReplicated",
    content_type="text/csv",
    s3_data_type="S3Prefix",
    record_wrapping=None,
    compression=None
)

linear.fit({"train": train_data, "validation": validation_data})
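Setting content_type to text/csv only tells the algorithm how to parse the files; the files themselves still need the layout the built-in algorithms expect for CSV training data: no header row, with the target value in the first column. Below is a minimal sketch of that reshaping using only the standard library. The function name `to_sagemaker_csv` and the column names `x1`, `x2`, `y` are made up for illustration, and it assumes your original file has a header row.

```python
import csv
import io


def to_sagemaker_csv(src_text, label_column):
    """Return CSV text with the header dropped and the label column moved first.

    Hypothetical helper for illustration; assumes src_text has a header row.
    """
    reader = csv.reader(io.StringIO(src_text))
    header = next(reader)
    label_idx = header.index(label_column)

    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for row in reader:
        if not row:
            continue  # skip blank lines
        # Label first, then the remaining features in their original order
        writer.writerow([row[label_idx]] + row[:label_idx] + row[label_idx + 1:])
    return out.getvalue()


# Made-up sample data: two features and a target column named "y"
sample = "x1,x2,y\n1.0,2.0,3.5\n4.0,5.0,9.5\n"
converted = to_sagemaker_csv(sample, "y")
print(converted)
```

You would then upload the converted text to the S3 prefix that the train channel points at.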

See this example
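For context on the error itself: when no content type is given, the channel is read as application/x-recordio-protobuf, and the reader expects each record to start with a 4-byte magic number. A CSV file starts with plain text, hence the "does not start with a valid magic number" message. Here is a rough sketch of a local check. It assumes the magic value 0xCED7230A (the constant used by the SageMaker Python SDK's RecordIO writer, as far as I know) and little-endian byte order; the function name is hypothetical.

```python
import struct

# Assumed RecordIO magic number; treat as an assumption, not a spec reference
RECORDIO_MAGIC = 0xCED7230A


def starts_with_recordio_magic(data: bytes) -> bool:
    """Return True if the first 4 bytes decode to the assumed RecordIO magic."""
    if len(data) < 4:
        return False
    (value,) = struct.unpack("<I", data[:4])  # little-endian unsigned 32-bit int
    return value == RECORDIO_MAGIC


# Made-up inputs: a CSV line and a fake record header carrying the magic number
csv_bytes = b"3.5,1.0,2.0\n"
recordio_bytes = struct.pack("<I", RECORDIO_MAGIC) + b"\x00" * 4

print(starts_with_recordio_magic(csv_bytes))
print(starts_with_recordio_magic(recordio_bytes))
```

If the check fails on the first bytes of your training file, the data is not RecordIO, and the channel needs an explicit content_type such as text/csv.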