AWS SageMaker SparkML Schema Error: 'member.environment' failed to satisfy constraint

I am deploying a model to AWS via SageMaker.

I set up my JSON schema as follows:

import json
schema = {
    "input": [
        {
            "name": "V1",
            "type": "double"
        }, 
        {
            "name": "V2",
            "type": "double"
        }, 
        {
            "name": "V3",
            "type": "double"
        }, 
        {
            "name": "V4",
            "type": "double"
        }, 
        {
            "name": "V5",
            "type": "double"
        }, 
        {
            "name": "V6",
            "type": "double"
        },
        {
            "name": "V7",
            "type": "double"
        }, 
        {
            "name": "V8",
            "type": "double"
        }, 
        {
            "name": "V9",
            "type": "double"
        }, 
        {
            "name": "V10",
            "type": "double"
        }, 
        {
            "name": "V11",
            "type": "double"
        }, 
        {
            "name": "V12",
            "type": "double"
        }, 
        {
            "name": "V13",
            "type": "double"
        }, 
        {
            "name": "V14",
            "type": "double"
        },
        {
            "name": "V15",
            "type": "double"
        }, 
        {
            "name": "V16",
            "type": "double"
        }, 
        {
            "name": "V17",
            "type": "double"
        }, 
        {
            "name": "V18",
            "type": "double"
        }, 
        {
            "name": "V19",
            "type": "double"
        }, 
        {
            "name": "V20",
            "type": "double"
        }, 
        {
            "name": "V21",
            "type": "double"
        }, 
        {
            "name": "V22",
            "type": "double"
        },
        {
            "name": "V23",
            "type": "double"
        }, 
        {
            "name": "V24",
            "type": "double"
        }, 
        {
            "name": "V25",
            "type": "double"
        }, 
        {
            "name": "V26",
            "type": "double"
        }, 
        {
            "name": "V27",
            "type": "double"
        },
        {
            "name": "V28",
            "type": "double"
        },
        {
            "name": "Amount",
            "type": "double"
        },         
    ],
    "output": 
        {
            "name": "features",
            "type": "double",
            "struct": "vector"
        }
}
schema_json = json.dumps(schema)
print(schema_json)

and deployed it as:

from sagemaker.model import Model
from sagemaker.pipeline import PipelineModel
from sagemaker.sparkml.model import SparkMLModel

sparkml_data = 's3://{}/{}/{}'.format(s3_model_bucket, s3_model_key_prefix, 'model.tar.gz')
# passing the schema defined above by using an environment variable that sagemaker-sparkml-serving understands
sparkml_model = SparkMLModel(model_data=sparkml_data, env={'SAGEMAKER_SPARKML_SCHEMA' : schema_json})
xgb_model = Model(model_data=xgb_model.model_data, image=training_image)

model_name = 'inference-pipeline-' + timestamp_prefix
sm_model = PipelineModel(name=model_name, role=role, models=[sparkml_model, xgb_model])

endpoint_name = 'inference-pipeline-ep-' + timestamp_prefix
sm_model.deploy(initial_instance_count=1, instance_type='ml.c4.xlarge', endpoint_name=endpoint_name)

I get the following error:

ClientError: An error occurred (ValidationException) when calling the CreateModel operation: 1 validation error detected: Value '{SAGEMAKER_SPARKML_SCHEMA={"input": [{"type": "double", "name": "V1"}, {"type": "double", "name": "V2"}, {"type": "double", "name": "V3"}, {"type": "double", "name": "V4"}, {"type": "double", "name": "V5"}, {"type": "double", "name": "V6"}, {"type": "double", "name": "V7"}, {"type": "double", "name": "V8"}, {"type": "double", "name": "V9"}, {"type": "double", "name": "V10"}, {"type": "double", "name": "V11"}, {"type": "double", "name": "V12"}, {"type": "double", "name": "V13"}, {"type": "double", "name": "V14"}, {"type": "double", "name": "V15"}, {"type": "double", "name": "V16"}, {"type": "double", "name": "V17"}, {"type": "double", "name": "V18"}, {"type": "double", "name": "V19"}, {"type": "double", "name": "V20"}, {"type": "double", "name": "V21"}, {"type": "double", "name": "V22"}, {"type": "double", "name": "V23"}, {"type": "double", "name": "V24"}, {"type": "double", "name": "V25"}, {"type": "double", "name": "V26"}, {"type": "double", "name": "V27"}, {"type": "double", "name": "V28"}, {"type": "double", "name": "Amount"}], "output": {"type": "double", "name": "features", "struct": "vector"}}}' at 'containers.1.member.environment' failed to satisfy constraint: Map value must satisfy constraint: [Member must have length less than or equal to 1024, Member must have length greater than or equal to 0, Member must satisfy regular expression pattern: [\S\s]*]

I tried reducing my features to 20 and the model deployed successfully. How can I pass a schema with 29 attributes?
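The constraint in the error is on the length of the environment variable's value, so it may help to measure the serialized schema before calling deploy. The sketch below rebuilds the same 29-column schema and compares the default `json.dumps` output with a compact encoding that drops the spaces after commas and colons. It assumes the 1024 limit applies to the value string alone, as the error message suggests, and that the serving container parses whitespace-free JSON the same way; `ENV_VALUE_LIMIT` is just a name chosen here for illustration.

```python
import json

# Rebuild the schema from the question: V1..V28 plus Amount, all doubles.
names = ["V{}".format(i) for i in range(1, 29)] + ["Amount"]
schema = {
    "input": [{"name": n, "type": "double"} for n in names],
    "output": {"name": "features", "type": "double", "struct": "vector"},
}

# Limit quoted in the ValidationException (assumed to apply to the value only).
ENV_VALUE_LIMIT = 1024

# Default separators are ", " and ": "; the compact form removes the spaces.
default_json = json.dumps(schema)
compact_json = json.dumps(schema, separators=(",", ":"))

for label, text in [("default", default_json), ("compact", compact_json)]:
    fits = len(text) <= ENV_VALUE_LIMIT
    print("{}: {} characters, fits under limit: {}".format(label, len(text), fits))
```

For this particular schema the compact form comes in under 1024 characters while the default form does not, so passing `compact_json` as `SAGEMAKER_SPARKML_SCHEMA` might be enough. A schema with more or longer column names would still hit the limit.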

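I don't think the 1024-character limit on environment values will be raised any time soon. To work around it, you could try rebuilding the Spark ML container with the SAGEMAKER_SPARKML_SCHEMA env var set inside the image: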

https://github.com/aws/sagemaker-sparkml-serving-container/blob/master/README.md#running-the-image-locally
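As far as I can tell from the same README, the container also accepts the schema inside each JSON request body, alongside the data, which would sidestep the environment variable entirely. Here is a minimal sketch of building such a payload, assuming the `schema` and `data` keys described there and a request sent with content type `application/json`. The zero-filled row is placeholder data, not real input.

```python
import json

# Same 29 columns as in the question: V1..V28 plus Amount.
names = ["V{}".format(i) for i in range(1, 29)] + ["Amount"]
schema = {
    "input": [{"name": n, "type": "double"} for n in names],
    "output": {"name": "features", "type": "double", "struct": "vector"},
}

# Placeholder feature values, one per input column (hypothetical data).
row = [0.0] * len(names)

# The schema travels with the request instead of in the container environment.
payload = json.dumps({"schema": schema, "data": row})
print(payload[:80] + "...")
```

With this approach the `env` argument to `SparkMLModel` could presumably be dropped, at the cost of a larger request body on every call.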