Error when multiclass classification label is type string

I have just started using ML.NET and find myself confused by how quickly the API is evolving and by examples written against different API versions.

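My goal is to read in several numeric feature columns and one text column that specifies the label ("Brand"), but I get an error on the last line of this snippet: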

var trainingDataView = mlContext.Data.ReadFromTextFile<PurchaseData>
    (path: trainDataPath, hasHeader: true, separatorChar: ',');

var dataProcessPipeline = mlContext.Transforms
    .Concatenate(DefaultColumnNames.Features,
                 nameof(PurchaseData.AgeBracket),
                 nameof(PurchaseData.Gender),
                 nameof(PurchaseData.IncomeBracket))
    .Append(mlContext.Transforms.CopyColumns("Label", nameof(PurchaseData.Brand)))
    .AppendCacheCheckpoint(mlContext);

var trainer = mlContext.MulticlassClassification.Trainers
    .StochasticDualCoordinateAscent(featureColumn: DefaultColumnNames.Features);
var trainingPipeline = dataProcessPipeline.Append(trainer);

var trainedModel = trainingPipeline.Fit(trainingDataView);

'Schema mismatch for label column 'Label': expected float, double or KeyType, got Text'

Why is text not an expected/allowed type for the label, and how do I fix this?

You need to convert the label to a key type, because the algorithm needs numeric input. Instead of:

.Append(mlContext.Transforms.CopyColumns("Label", nameof(PurchaseData.Brand)))

use:

.Append(mlContext.Transforms.Conversion.MapValueToKey(
    outputColumnName: DefaultColumnNames.Label,
    inputColumnName: nameof(PurchaseData.Brand)))

For an example, see: https://github.com/dotnet/machinelearning-samples/blob/master/samples/csharp/end-to-end-apps/MulticlassClassification-GitHubLabeler/GitHubLabeler/GitHubLabelerConsoleApp/Program.cs
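To make "convert the label to a key type" more concrete, here is a rough conceptual sketch in plain C# of what a value-to-key mapping does: each distinct text label gets a small integer, and the trainer sees those integers instead of the strings. This is not ML.NET code. The class name ValueToKeySketch, the method MapValuesToKeys, and the 1-based numbering are all assumptions made for illustration and may not match how the library builds its key table internally.

```csharp
using System;
using System.Collections.Generic;

// Conceptual sketch only: NOT part of ML.NET. All names here are hypothetical.
public static class ValueToKeySketch
{
    // Assigns each distinct label an integer key in order of first appearance.
    // Keys start at 1 here (an assumption for illustration).
    public static uint[] MapValuesToKeys(
        IEnumerable<string> labels,
        out Dictionary<string, uint> keyTable)
    {
        keyTable = new Dictionary<string, uint>();
        var keys = new List<uint>();

        foreach (var label in labels)
        {
            // Reuse the existing key if this label was seen before.
            if (!keyTable.TryGetValue(label, out uint key))
            {
                key = (uint)keyTable.Count + 1;
                keyTable[label] = key;
            }
            keys.Add(key);
        }

        return keys.ToArray();
    }
}
```

For instance, calling MapValuesToKeys on the made-up brand labels "Acme", "Globex", "Acme" would yield the keys 1, 2, 1 and a table with two entries. The table is what lets a predicted key be mapped back to the original brand text afterwards.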