Issue with AWS Kinesis SQL - Random Cut Forest algorithm
I have this code in my AWS Kinesis Analytics application:
CREATE OR REPLACE STREAM "OUT_FILE" (
    "fechaTS" timestamp,
    "celda" varchar(25),
    "Field1" DOUBLE,
    "Field2" DOUBLE,
    "ANOMALY_SCORE" DOUBLE,
    "ANOMALY_EXPLANATION" varchar(1024)
);

CREATE OR REPLACE PUMP "PMP_OUT" AS
INSERT INTO "OUT_FILE"
SELECT STREAM
    "fechaTS",
    "celda",
    "Field1",
    "Field2",
    "ANOMALY_SCORE",
    "ANOMALY_EXPLANATION"
FROM TABLE(RANDOM_CUT_FOREST_WITH_EXPLANATION(
    CURSOR(SELECT STREAM * FROM "SOURCE_SQL_STREAM_001"), 300, 512, 8064, 4, true))
WHERE "celda" = 'CELLNUMBER';
I expected only the usual output: an anomaly score computed for each input record.
Instead, I get this error message:
Number of numeric attributes should be less than or equal to 30 (Please check the documentation to know the supported numeric SQL types)
The number of numeric attributes I feed into the model is only 2. On the other hand, according to the documentation, the supported numeric SQL types are these: DOUBLE, INTEGER, FLOAT, TINYINT, SMALLINT, REAL, and BIGINT. (I have also tried FLOAT.)
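For reference, a record on "SOURCE_SQL_STREAM_001" has roughly this shape (the values here are purely illustrative):
{"fechaTS": "2019-05-20 10:15:00", "celda": "CELLNUMBER", "Field1": 12.5, "Field2": 3.7}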
What am I doing wrong?
The solution is to define the variables as DOUBLE (or another accepted type) at the input-schema level: defining them as DOUBLE in the SQL alone is not enough.
I tried a JSON like this, and it worked:
{
    "ApplicationName": "<myAppName>",
    "Inputs": [{
        "InputSchema": {
            "RecordColumns": [
                {"Mapping": "fechaTS", "Name": "fechaTS", "SqlType": "timestamp"},
                {"Mapping": "celda", "Name": "celda", "SqlType": "varchar(25)"},
                {"Mapping": "Field1", "Name": "Field1", "SqlType": "DOUBLE"},
                {"Mapping": "Field2", "Name": "Field2", "SqlType": "DOUBLE"},
                {"Mapping": "Field3", "Name": "Field3", "SqlType": "DOUBLE"}
            ],
            "RecordFormat": {
                "MappingParameters": {"JSONMappingParameters": {"RecordRowPath": "$"}},
                "RecordFormatType": "JSON"
            }
        },
        "KinesisStreamsInput": {"ResourceARN": "<myInputARN>", "RoleARN": "<myRoleARN>"},
        "NamePrefix": "<myNamePrefix>"
    }]
}
Additional information: if you save this JSON as myJson.json, you then issue this command:
aws kinesisanalytics create-application --cli-input-json file://myJson.json
The AWS Command Line Interface (CLI) must be installed and configured beforehand.
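As a minimal follow-up sketch (the application name, stream name, input Id "1.1", and record values below are placeholders, not taken from the question), you can then check the input Id that Kinesis Analytics assigned, start the application, and push a test record into the source stream:

# Inspect the application and note the generated input Id
aws kinesisanalytics describe-application --application-name <myAppName>

# Start reading from the input stream; replace 1.1 with the Id returned above
aws kinesisanalytics start-application \
    --application-name <myAppName> \
    --input-configurations "Id=1.1,InputStartingPositionConfiguration={InputStartingPosition=NOW}"

# Send a test record matching the input schema (illustrative values;
# AWS CLI v2 needs --cli-binary-format raw-in-base64-out to pass raw JSON)
aws kinesis put-record \
    --stream-name <myInputStream> \
    --partition-key test \
    --cli-binary-format raw-in-base64-out \
    --data '{"fechaTS": "2019-05-20 10:15:00", "celda": "CELLNUMBER", "Field1": 12.5, "Field2": 3.7, "Field3": 0.0}'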