Azure ML Experiment Batch Webservice Call Fails with Invalid Output Extension
I have an Azure WebJob that calls an ML training experiment through HttpRequests, using the code generated in the ML web portal:
var request = new BatchExecutionRequest()
{
    Inputs = new Dictionary<string, AzureBlobDataReference>() {
        {
            "input1",
            new AzureBlobDataReference()
            {
                ConnectionString = _connectionString,
                RelativeLocation = $"{_containerName}/{experimentId}/{tenantId}/{trainingDataFileName}"
            }
        },
    },
    Outputs = new Dictionary<string, AzureBlobDataReference>() {
        {
            "output1",
            new AzureBlobDataReference()
            {
                ConnectionString = "azureStorageConnectionString",
                RelativeLocation = $"{_containerName}/{experimentId}/{tenantId}/Model_2018421.ilearner"
            }
        },
    },
    GlobalParameters = new Dictionary<string, string>()
    {
    }
};
However, the request fails with the following message:
The blob reference: experiments/experimentId/TenantId/Model_2018421.ilearner has an invalid or missing file extension. Supported file extensions for this output type are: \".csv, .tsv, .arff\"
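This confuses me, because everywhere in the documentation `.ilearner` is given as the file extension to use when you want the trained model as an output.
I have seen this question asking about the same error when using Data Factory, and also this question on datascience.stackexchange. Neither has any clues, answers, or other follow-up.
Any insight into what I'm missing would be greatly appreciated!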
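The error message amounts to a simple rule: the output port it checked accepts only `.csv`, `.tsv`, or `.arff` blobs, so an `.ilearner` path is rejected there regardless of what the documentation says about trained models. The sketch below approximates that rule on the client side so a bad path can be caught before the job is submitted. It is a hypothetical helper, not the service's actual validation: `HasSupportedExtension` and the `Scored.csv` path are made up for illustration, and the case-insensitive comparison is an assumption.

```csharp
using System;
using System.IO;
using System.Linq;

// Hypothetical client-side approximation of the extension check described by the
// error message. It is NOT the service's implementation; case-insensitivity is assumed.
static bool HasSupportedExtension(string relativeLocation, params string[] supportedExtensions)
{
    // Path.GetExtension returns the last ".xxx" segment, or "" when there is none.
    var extension = Path.GetExtension(relativeLocation);
    return supportedExtensions.Contains(extension, StringComparer.OrdinalIgnoreCase);
}

// Extensions listed in the error message for the port that rejected the blob.
string[] datasetExtensions = { ".csv", ".tsv", ".arff" };

string modelPath = "experiments/experimentId/TenantId/Model_2018421.ilearner";
string csvPath = "experiments/experimentId/TenantId/Scored.csv"; // hypothetical file name

Console.WriteLine(HasSupportedExtension(modelPath, datasetExtensions)); // False
Console.WriteLine(HasSupportedExtension(csvPath, datasetExtensions));   // True
```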
For anyone looking for their own "Don't Overthink It" moment:
I needed to provide two output blob file references:
var request = new BatchExecutionRequest()
{
    Inputs = new Dictionary<string, AzureBlobDataReference>() {
        {
            "input1",
            new AzureBlobDataReference()
            {
                ConnectionString = _connectionString,
                RelativeLocation = $"{_containerName}/{experimentId}/{tenantId}/{trainingDataFileName}.csv"
            }
        },
    },
    Outputs = new Dictionary<string, AzureBlobDataReference>() {
        {
            "output1",
            new AzureBlobDataReference()
            {
                ConnectionString = _connectionString,
                RelativeLocation = $"{_containerName}/{experimentId}/{tenantId}/{outputFileNameCsv}.csv"
            }
        },
        {
            "output2",
            new AzureBlobDataReference()
            {
                ConnectionString = _connectionString,
                RelativeLocation = $"{_containerName}/{experimentId}/{tenantId}/{outputFileNameIlearner}.ilearner"
            }
        },
    },
    GlobalParameters = new Dictionary<string, string>()
    {
    }
};
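There's an old saying in American English about what happens when you assume. I assumed the second output was an optional parameter of the batch operation, and since I wasn't actually looking for more than one result from each call, I figured it was safe to remove the second output.
TL;DR: Keep all of the parameters generated by the web service portal's "Consume" tab, and make sure the first output is a .csv file reference.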
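The TL;DR can be turned into a pre-flight check run before the job is submitted. This is a minimal sketch, assuming the port names and extensions seen in this thread (`output1` for the dataset, accepting `.csv`, `.tsv`, or `.arff`; `output2` for the trained model, accepting `.ilearner`). `ValidateOutputs`, the `expected` table, and the file names are hypothetical; for a different experiment, take the port names from the web service's "Consume" tab. It works on plain `RelativeLocation` strings rather than `AzureBlobDataReference` objects so it does not depend on the generated client types.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

// Hypothetical pre-flight check: every expected output port must be present, and each
// blob path must end in an extension that port accepts. Port names and extension lists
// are ASSUMPTIONS taken from this thread, not from the service's API.
static List<string> ValidateOutputs(
    IReadOnlyDictionary<string, string> outputs,              // port name -> RelativeLocation
    IReadOnlyDictionary<string, string[]> expectedExtensions) // port name -> allowed extensions
{
    var problems = new List<string>();
    foreach (var (port, allowed) in expectedExtensions)
    {
        if (!outputs.TryGetValue(port, out var location))
        {
            problems.Add($"missing output '{port}'");
            continue;
        }
        var extension = Path.GetExtension(location);
        if (!allowed.Contains(extension, StringComparer.OrdinalIgnoreCase))
        {
            problems.Add($"'{port}' -> {location}: expected one of {string.Join(", ", allowed)}");
        }
    }
    return problems;
}

var expected = new Dictionary<string, string[]>
{
    ["output1"] = new[] { ".csv", ".tsv", ".arff" }, // dataset output (extensions from the error message)
    ["output2"] = new[] { ".ilearner" },             // trained-model output
};

// The failing request: a single output pointing at an .ilearner blob.
var failing = new Dictionary<string, string>
{
    ["output1"] = "experiments/experimentId/TenantId/Model_2018421.ilearner",
};

// The working shape: both outputs, dataset first, model second (hypothetical file names).
var working = new Dictionary<string, string>
{
    ["output1"] = "experiments/experimentId/TenantId/results.csv",
    ["output2"] = "experiments/experimentId/TenantId/model.ilearner",
};

foreach (var problem in ValidateOutputs(failing, expected))
    Console.WriteLine(problem);
Console.WriteLine(ValidateOutputs(working, expected).Count); // 0
```

Run against the failing request, it reports both the wrong extension on `output1` and the missing `output2`, while the two-output request passes.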