Adding "data" from new TextLoader into existing Pipeline (ver. 0.6)

I'm trying to upgrade my model from ML.NET 0.5 to 0.6, but I have a question.

I copied and pasted this example from the ML.NET Cookbook:

// Create a new environment for ML.NET operations. It can be used for
// exception tracking and logging, as well as the source of randomness.
var env = new LocalEnvironment();

// Create the reader: define the data columns and where to find them in the
// text file.
var reader = TextLoader.CreateReader(env, ctx => (
    // We read the first 11 values as a single float vector.
    FeatureVector: ctx.LoadFloat(0, 10),
    // Separately, read the target variable.
    Target: ctx.LoadFloat(11)
    ),
    // Default separator is tab, but we need a comma.
    separator: ',');


// Now read the file (remember though, readers are lazy, so the actual
// reading will happen when the data is accessed).
var data = reader.Read(new MultiFileSource(dataPath));

So I started applying it to my model:

using System;
using Microsoft.ML.Legacy;
using Microsoft.ML.Legacy.Data;
using Microsoft.ML.Legacy.Transforms;
using Microsoft.ML.Legacy.Trainers;
using Microsoft.ML.Legacy.Models;
using Microsoft.ML.Runtime.Data;

public static PredictionModel<CancerData, CancerPrediction> Train()
    {
        var pipeline = new LearningPipeline();
        //0.6 way to upload data into model
        var env = new LocalEnvironment();
            var reader = Microsoft.ML.Runtime.Data.TextLoader.CreateReader(env, ctx => (
            FeatureVector: ctx.LoadFloat(0, 30),
            Target: ctx.LoadText(31)
                ),
            separator: ';');

        var data = reader.Read(new MultiFileSource("Cancer-Train.csv"));

        //pipeline.Add(new TextLoader("Cancer-Train.csv").CreateFrom<CancerData>(useHeader: true, separator: ';'));
        pipeline.Add(new Dictionarizer(("Diagnosis", "Label")));
        pipeline.Add(data); // doesn't work; I only wrote it to show what I want to do

        //below the 0.5 way to load data into pipeline!
        //pipeline.Add(new ColumnConcatenator(outputColumn: "Features",
        //    "RadiusMean",
        //    "TextureMean",
        // .. and so on...
        //    "SymmetryWorst",
        //    "FractalDimensionWorst"));
        pipeline.Add(new StochasticDualCoordinateAscentBinaryClassifier());
        pipeline.Add(new PredictedLabelColumnOriginalValueConverter() { PredictedLabelColumn = "PredictedLabel" });
        PredictionModel<CancerData, CancerPrediction> model = pipeline.Train<CancerData, CancerPrediction>();

        model.WriteAsync(modelPath);
        return model;

    }

The question is: how do I add var data to my existing pipeline? What do I need to do so that var data from version 0.6 works with the 0.5 pipeline?

I don't think the LearningPipeline API is compatible with the new statically typed API (e.g. TextLoader.CreateReader). The cookbook shows the new APIs for training, as well as other scenarios such as using the model for predictions. This test may also be helpful for binary classification.

For your specific code, I believe the training code should look something like this:

var env = new LocalEnvironment();

// Columns 0-30 are read as a single float vector; the label in column 31
// is read as a Boolean (LoadBool) rather than as text.
var reader = Microsoft.ML.Runtime.Data.TextLoader.CreateReader(env, ctx => (
        FeatureVector: ctx.LoadFloat(0, 30),
        Target: ctx.LoadBool(31)
    ),
    separator: ';');

var data = reader.Read(new MultiFileSource("Cancer-Train.csv"));

BinaryClassificationContext bcc = new BinaryClassificationContext(env);

// Build the pipeline: normalize the features, train with SDCA,
// then expose the predicted label.
var estimator = reader.MakeNewEstimator()
    .Append(row => (
        label: row.Target,
        features: row.FeatureVector.Normalize()))
    .Append(row => (
        row.label,
        score: bcc.Trainers.Sdca(row.label, row.features)))
    .Append(row => (
        row.label,
        row.score,
        predictedLabel: row.score.predictedLabel));

// Train the model on the loaded data.
var model = estimator.Fit(data);