Random Cut Forest 模型评估策略(混淆矩阵,准确度和 precision_recall_fscore
Random Cut Forest model evaluation strategies (Confusion matrix,Accuracy and precision_recall_fscore
我正在使用 AWS sagemker 随机砍伐森林算法来检测异常。
import boto3
import sagemaker
containers = {
'us-west-2': '174872318107.dkr.ecr.us-west-2.amazonaws.com/randomcutforest:latest',
'us-east-1': '382416733822.dkr.ecr.us-east-1.amazonaws.com/randomcutforest:latest',
'us-east-2': '404615174143.dkr.ecr.us-east-2.amazonaws.com/randomcutforest:latest',
'eu-west-1': '438346466558.dkr.ecr.eu-west-1.amazonaws.com/randomcutforest:latest',
'ap-southeast-1':'475088953585.dkr.ecr.ap-southeast-1.amazonaws.com/randomcutforest:latest'
}
region_name = boto3.Session().region_name
container = containers[region_name]
session = sagemaker.Session()
rcf = sagemaker.estimator.Estimator(
container,
sagemaker.get_execution_role(),
output_path='s3://{}/{}/output'.format(bucket, prefix),
train_instance_count=1,
train_instance_type='ml.c5.xlarge',
sagemaker_session=session)
rcf.set_hyperparameters(
num_samples_per_tree=200,
num_trees=250,
feature_dim=1,
eval_metrics =["accuracy", "precision_recall_fscore"])
s3_train_input = sagemaker.session.s3_input(
s3_train_data,
distribution='ShardedByS3Key',
content_type='application/x-recordio-protobuf')
rcf.fit({'train': s3_train_input})
用上面的代码训练模型,没找到评估模型的方法。
如何在部署模型后获得准确度和 F 分数。
为了获得评估指标,您需要在训练期间提供一个名为“测试”的额外通道。测试通道必须包含标记数据。官方文档中有说明,https://docs.aws.amazon.com/sagemaker/latest/dg/randomcutforest.html :
Amazon SageMaker Random Cut Forest supports the train and test data channels. The optional test channel is used to compute accuracy, precision, recall, and F1-score metrics on labeled data. Train and test data content types can be either application/x-recordio-protobuf or text/csv formats. For the test data, when using text/csv format, the content must be specified as text/csv;label_size=1 where the first column of each row represents the anomaly label: "1" for an anomalous data point and "0" for a normal data point. You can use either File mode or Pipe mode to train RCF models on data that is formatted as recordIO-wrapped-protobuf or as CSV
Also note ... the test channel only supports S3DataDistributionType=FullyReplicated
谢谢,
胡里奥
我正在使用 AWS sagemker 随机砍伐森林算法来检测异常。
import boto3
import sagemaker
containers = {
'us-west-2': '174872318107.dkr.ecr.us-west-2.amazonaws.com/randomcutforest:latest',
'us-east-1': '382416733822.dkr.ecr.us-east-1.amazonaws.com/randomcutforest:latest',
'us-east-2': '404615174143.dkr.ecr.us-east-2.amazonaws.com/randomcutforest:latest',
'eu-west-1': '438346466558.dkr.ecr.eu-west-1.amazonaws.com/randomcutforest:latest',
'ap-southeast-1':'475088953585.dkr.ecr.ap-southeast-1.amazonaws.com/randomcutforest:latest'
}
region_name = boto3.Session().region_name
container = containers[region_name]
session = sagemaker.Session()
rcf = sagemaker.estimator.Estimator(
container,
sagemaker.get_execution_role(),
output_path='s3://{}/{}/output'.format(bucket, prefix),
train_instance_count=1,
train_instance_type='ml.c5.xlarge',
sagemaker_session=session)
rcf.set_hyperparameters(
num_samples_per_tree=200,
num_trees=250,
feature_dim=1,
eval_metrics =["accuracy", "precision_recall_fscore"])
s3_train_input = sagemaker.session.s3_input(
s3_train_data,
distribution='ShardedByS3Key',
content_type='application/x-recordio-protobuf')
rcf.fit({'train': s3_train_input})
用上面的代码训练模型,没找到评估模型的方法。 如何在部署模型后获得准确度和 F 分数。
为了获得评估指标,您需要在训练期间提供一个名为“测试”的额外通道。测试通道必须包含标记数据。官方文档中有说明,https://docs.aws.amazon.com/sagemaker/latest/dg/randomcutforest.html :
Amazon SageMaker Random Cut Forest supports the train and test data channels. The optional test channel is used to compute accuracy, precision, recall, and F1-score metrics on labeled data. Train and test data content types can be either application/x-recordio-protobuf or text/csv formats. For the test data, when using text/csv format, the content must be specified as text/csv;label_size=1 where the first column of each row represents the anomaly label: "1" for an anomalous data point and "0" for a normal data point. You can use either File mode or Pipe mode to train RCF models on data that is formatted as recordIO-wrapped-protobuf or as CSV
Also note ... the test channel only supports S3DataDistributionType=FullyReplicated
谢谢,
胡里奥