ML.NET: How to solve "Column with role MatrixColumnIndex should be a known cardinality U4 key, but is instead 'UInt32'"

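I am trying to adapt the following ML.NET F# Product Recommender sample to my own use case: https://github.com/dotnet/machinelearning-samples/tree/master/samples/fsharp/getting-started/MatrixFactorization_ProductRecommendation

However, my dataset does not have two numeric IDs. Instead, I have a UserId (numeric) and a ProductId (string). Since keys apparently have to be numeric, I tried mapping ProductId with the MapValueToKey function. However, I still get the following error: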

Unhandled Exception: System.InvalidOperationException: Column 'UserId' with role MatrixColumnIndex should be a known cardinality U4 key, but is instead 'UInt32'
   at Microsoft.ML.Recommender.RecommenderUtils.CheckRowColumnType(RoleMappedData data, ColumnRole role, Column& col, Boolean isDecode)
   at Microsoft.ML.Recommender.RecommenderUtils.CheckAndGetMatrixIndexColumns(RoleMappedData data, Column& matrixColumnIndexColumn, Column& matrixRowIndexColumn, Boolean isDecode)
   at Microsoft.ML.Trainers.MatrixFactorizationTrainer.TrainCore(IChannel ch, RoleMappedData data, RoleMappedData validData)
   at Microsoft.ML.Trainers.MatrixFactorizationTrainer.Fit(IDataView trainData, IDataView validationData)
   at Microsoft.ML.Trainers.MatrixFactorizationTrainer.Fit(IDataView input)
   at <StartupCode$Recommender>.$Program.main@() in /Users/nat/Projects/Recommender/Recommender/Program.fs:line 75

My data schema looks like this:

UserId,ProductId
1,test-product-id

Here is the failing code, adapted from the linked sample:

open Microsoft.ML
open Microsoft.ML.Data
open System
open Microsoft.ML.Trainers

[<CLIMutable>]
type ProductEntry = 
    {
        [<LoadColumn(0); KeyType(count=6248UL)>]
        UserId : uint32
        [<LoadColumn(1)>]
        ProductId : string
    }

[<CLIMutable>]
type Prediction = {Score : float32}

let trainDataPath = "/path/to/user_product_prediction.csv"

let mlContext = MLContext()

let pipeline = 
    mlContext.Transforms.Conversion.MapValueToKey(inputColumnName="ProductId",outputColumnName="ProductIdEncoded")

let traindata = mlContext.Data.LoadFromTextFile<ProductEntry>(trainDataPath, hasHeader=true, separatorChar=',')

let mappedDataView = pipeline.Fit(traindata).Transform(traindata)

let options = MatrixFactorizationTrainer.Options(MatrixColumnIndexColumnName = "UserId", 
                                                 MatrixRowIndexColumnName = "ProductIdEncoded",
                                                 LossFunction = MatrixFactorizationTrainer.LossFunctionType.SquareLossOneClass,
                                                 LabelColumnName = "ProductId",
                                                 Alpha = 0.01,
                                                 Lambda = 0.025)

let est = mlContext.Recommendation().Trainers.MatrixFactorization(options)

let model = est.Fit(mappedDataView)

let predictionengine = mlContext.Model.CreatePredictionEngine<ProductEntry, Prediction>(model)
let prediction = predictionengine.Predict {ProductId = "test-product-id"; UserId = 13854u}

printfn ""
printfn "For ProductId = 'test-product-id' and UserId = 13854 the predicted score is %f" prediction.Score
printf "=============== End of process, hit any key to finish ==============="
Console.ReadKey() |> ignore

Another link I have been using as a guide is https://medium.com/machinelearningadvantage/build-a-product-recommender-using-c-and-ml-net-machine-learning-ab890b802d25

I have been struggling to get this to work for hours. What exactly am I doing wrong?


Update

By making my program more like the official .NET sample, I managed to get a bit further. This is what I have now:

open Microsoft.ML
open Microsoft.ML.Data
open System
open Microsoft.ML.Trainers

[<CLIMutable>]
type ProductEntry = 
    {
        [<LoadColumn(0); KeyType(count=6248UL)>]
        UserId : uint32
        [<LoadColumn(1)>]
        ProductId : string
        [<NoColumn>]
        Label : float32
    }

[<CLIMutable>]
type Prediction = {Score : float32}

let trainDataPath = "/Users/nat/Downloads/user_product_prediction.csv"

let mlContext = MLContext()

let pipeline = 
    EstimatorChain().Append(
        mlContext.Transforms.Conversion
            .MapValueToKey(inputColumnName="UserId",outputColumnName="UserIdEncoded"))
        .Append(
            mlContext.Transforms.Conversion
                .MapValueToKey(inputColumnName="ProductId",outputColumnName="ProductIdEncoded"))


let traindata =
    let columns = 
        [|
            TextLoader.Column("Label", DataKind.Single, 0)
            TextLoader.Column("UserId", DataKind.UInt32, source = [|TextLoader.Range(0)|], keyCount = KeyCount 6248UL) 
            TextLoader.Column("ProductId", DataKind.String, source = [|TextLoader.Range(1)|]) 
        |]
    mlContext.Data.LoadFromTextFile(trainDataPath, columns, hasHeader=true, separatorChar=',')

let mappedDataView = pipeline.Fit(traindata).Transform(traindata)

let options = MatrixFactorizationTrainer.Options(MatrixColumnIndexColumnName = "UserIdEncoded", 
                                                 MatrixRowIndexColumnName = "ProductIdEncoded",
                                                 LossFunction = MatrixFactorizationTrainer.LossFunctionType.SquareLossOneClass,
                                                 LabelColumnName = "Label",
                                                 Alpha = 0.01,
                                                 Lambda = 0.025)

let est = mlContext.Recommendation().Trainers.MatrixFactorization(options)

let model = est.Fit(mappedDataView)

let predictionengine = mlContext.Model.CreatePredictionEngine<ProductEntry, Prediction>(model)
let prediction = predictionengine.Predict {ProductId = "farfetch-13470673"; UserId = (uint32 13854); Label = 0.f}

printfn ""
printfn "For ProductId = 'farfetch-13470673' and UserId = 13854 the predicted score is %f" prediction.Score
printf "=============== End of process, hit any key to finish ==============="
Console.ReadKey() |> ignore

It now fails on this line:

let predictionengine = mlContext.Model.CreatePredictionEngine<ProductEntry, Prediction>(model)

with the error:

Unhandled Exception: System.ArgumentOutOfRangeException: UserIdEncoded column 'MatrixColumnIndex' not found
Parameter name: schema
   at Microsoft.ML.Data.RoleMappedSchema.MapFromNames(DataViewSchema schema, IEnumerable`1 roles, Boolean opt)
   at Microsoft.ML.Data.RoleMappedSchema..ctor(DataViewSchema schema, IEnumerable`1 roles, Boolean opt)
   at Microsoft.ML.Data.GenericScorer.Bindings.Create(IHostEnvironment env, ISchemaBindableMapper bindable, DataViewSchema input, IEnumerable`1 roles, String suffix, Boolean user)
   at Microsoft.ML.Data.GenericScorer.Bindings.ApplyToSchema(IHostEnvironment env, DataViewSchema input)
   at Microsoft.ML.Data.GenericScorer..ctor(IHostEnvironment env, GenericScorer transform, IDataView data)
   at Microsoft.ML.Data.GenericScorer.ApplyToDataCore(IHostEnvironment env, IDataView newSource)
   at Microsoft.ML.Data.RowToRowScorerBase.ApplyToData(IHostEnvironment env, IDataView newSource)
   at Microsoft.ML.Data.PredictionTransformerBase`1.Microsoft.ML.ITransformer.GetRowToRowMapper(DataViewSchema inputSchema)
   at Microsoft.ML.PredictionEngineBase`2..ctor(IHostEnvironment env, ITransformer transformer, Boolean ignoreMissingColumns, SchemaDefinition inputSchemaDefinition, SchemaDefinition outputSchemaDefinition)
   at Microsoft.ML.PredictionEngine`2..ctor(IHostEnvironment env, ITransformer transformer, Boolean ignoreMissingColumns, SchemaDefinition inputSchemaDefinition, SchemaDefinition outputSchemaDefinition)
   at Microsoft.ML.PredictionEngineExtensions.CreatePredictionEngine[TSrc,TDst](ITransformer transformer, IHostEnvironment env, Boolean ignoreMissingColumns, SchemaDefinition inputSchemaDefinition, SchemaDefinition outputSchemaDefinition)
   at Microsoft.ML.ModelOperationsCatalog.CreatePredictionEngine[TSrc,TDst](ITransformer transformer, Boolean ignoreMissingColumns, SchemaDefinition inputSchemaDefinition, SchemaDefinition outputSchemaDefinition)

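I believe you have cleared the initial hurdle: you have successfully trained the model, and now you need to assemble all the trained assets into the prediction engine.

Note that you have two 'trained' transformers: the preprocessing pipeline (the result of calling pipeline.Fit(traindata)) and the recommender itself (the result of calling est.Fit(mappedDataView)).

However, the prediction engine you are creating takes only the second transformer, so it will work only if we feed it the output of the first transformer.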
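To make that concrete, here is a minimal sketch of keeping the two fitted transformers separate and chaining them before building the prediction engine. It is an illustration under assumptions, not tested against your data: the Interaction record and the in-memory rows are hypothetical stand-ins for your CSV file, and it assumes the Append method on the fitted TransformerChain behaves as I remember it.

```fsharp
// Requires the Microsoft.ML and Microsoft.ML.Recommender NuGet packages.
open Microsoft.ML
open Microsoft.ML.Data
open Microsoft.ML.Trainers

// Hypothetical input type; the real one would mirror your CSV columns.
[<CLIMutable>]
type Interaction = { UserId : uint32; ProductId : string; Label : float32 }

[<CLIMutable>]
type Prediction = { Score : float32 }

let mlContext = MLContext()

// Hypothetical in-memory rows standing in for the CSV file.
let rows =
    [ { UserId = 1u; ProductId = "product-a"; Label = 1.f }
      { UserId = 1u; ProductId = "product-b"; Label = 1.f }
      { UserId = 2u; ProductId = "product-a"; Label = 1.f }
      { UserId = 3u; ProductId = "product-c"; Label = 1.f } ]
let traindata = mlContext.Data.LoadFromEnumerable(rows)

// Transformer 1: the fitted preprocessing pipeline (value-to-key mappings).
let preprocessing =
    EstimatorChain().Append(
        mlContext.Transforms.Conversion
            .MapValueToKey(inputColumnName="UserId",outputColumnName="UserIdEncoded"))
        .Append(
            mlContext.Transforms.Conversion
                .MapValueToKey(inputColumnName="ProductId",outputColumnName="ProductIdEncoded"))
let preprocessor = preprocessing.Fit(traindata)
let mappedDataView = preprocessor.Transform(traindata)

// Transformer 2: the recommender, trained on the already-encoded columns.
let options = MatrixFactorizationTrainer.Options(MatrixColumnIndexColumnName = "UserIdEncoded",
                                                 MatrixRowIndexColumnName = "ProductIdEncoded",
                                                 LossFunction = MatrixFactorizationTrainer.LossFunctionType.SquareLossOneClass,
                                                 LabelColumnName = "Label",
                                                 Alpha = 0.01,
                                                 Lambda = 0.025)
let recommender = mlContext.Recommendation().Trainers.MatrixFactorization(options).Fit(mappedDataView)

// Chain both fitted transformers so raw UserId/ProductId values are encoded
// before they reach the recommender.
let fullModel = preprocessor.Append(recommender)

let engine = mlContext.Model.CreatePredictionEngine<Interaction, Prediction>(fullModel)
let prediction = engine.Predict { UserId = 1u; ProductId = "product-c"; Label = 0.f }
printfn "Predicted score: %f" prediction.Score
```

This works, but it leaves you with two objects to keep in sync by hand.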

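A better approach is to form a single estimator from the preprocessing and the recommender (apologies for any mistakes; F# is not my native language):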

let pipeline = 
    EstimatorChain().Append(
        mlContext.Transforms.Conversion
            .MapValueToKey(inputColumnName="UserId",outputColumnName="UserIdEncoded"))
        .Append(
            mlContext.Transforms.Conversion
                .MapValueToKey(inputColumnName="ProductId",outputColumnName="ProductIdEncoded"))


let traindata =
    let columns = 
        [|
            TextLoader.Column("Label", DataKind.Single, 0)
            TextLoader.Column("UserId", DataKind.UInt32, source = [|TextLoader.Range(0)|], keyCount = KeyCount 6248UL) 
            TextLoader.Column("ProductId", DataKind.String, source = [|TextLoader.Range(1)|]) 
        |]
    mlContext.Data.LoadFromTextFile(trainDataPath, columns, hasHeader=true, separatorChar=',')

// No need to do this:
// let mappedDataView = pipeline.Fit(traindata).Transform(traindata)

let options = MatrixFactorizationTrainer.Options(MatrixColumnIndexColumnName = "UserIdEncoded", 
                                                 MatrixRowIndexColumnName = "ProductIdEncoded",
                                                 LossFunction = MatrixFactorizationTrainer.LossFunctionType.SquareLossOneClass,
                                                 LabelColumnName = "Label",
                                                 Alpha = 0.01,
                                                 Lambda = 0.025)

// Rather than this:
// let est = mlContext.Recommendation().Trainers.MatrixFactorization(options)
// Do this:
let est = pipeline.Append(mlContext.Recommendation().Trainers.MatrixFactorization(options))

// Now train the whole pipeline.
let model = est.Fit(traindata)

// The rest should now work.
let predictionengine = mlContext.Model.CreatePredictionEngine<ProductEntry, Prediction>(model)
let prediction = predictionengine.Predict {ProductId = "farfetch-13470673"; UserId = (uint32 13854); Label = 0.f}
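
As background on the error in the title: the trainer wants its row and column index columns to be key-typed, meaning each value is an index into a set whose size is known up front, and a plain UInt32 column carries no such size. As far as I understand it, MapValueToKey produces such a column by assigning an index to every distinct value it sees while fitting. The plain F# sketch below only illustrates that idea. It is not how ML.NET implements it, and buildKeyMap and encode are made-up helper names.

```fsharp
// Conceptual illustration only: NOT ML.NET's implementation of MapValueToKey.

/// Assigns each distinct value a 1-based index in order of first occurrence.
let buildKeyMap (values : seq<string>) : Map<string, uint32> =
    values
    |> Seq.distinct
    |> Seq.mapi (fun i v -> v, uint32 (i + 1))
    |> Map.ofSeq

/// Looks a value up; 0u stands in for "missing / not seen during fitting".
let encode (keyMap : Map<string, uint32>) (value : string) : uint32 =
    keyMap |> Map.tryFind value |> Option.defaultValue 0u

let productIds = [ "product-a"; "product-b"; "product-a"; "product-c" ]
let keyMap = buildKeyMap productIds

// The number of distinct values is the "known cardinality" of the key.
let cardinality = keyMap.Count

// Every product ID becomes a small contiguous index.
let encoded = productIds |> List.map (encode keyMap)

// A value that was never seen during fitting has no valid index.
let unseen = encode keyMap "product-z"

printfn "cardinality = %d, encoded = %A, unseen = %d" cardinality encoded unseen
```

The practical consequence is that a user or product that was absent from the training data has no valid key, so I would not expect a meaningful score for it.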