Adding "data" from the new TextLoader into an existing Pipeline (ver. 0.6)
I'm trying to upgrade my model from ML.NET 0.5 to 0.6, but I've run into a problem.
I copied and pasted this example from the ML.NET Cookbook:
// Create a new environment for ML.NET operations. It can be used for
// exception tracking and logging, as well as the source of randomness.
var env = new LocalEnvironment();
// Create the reader: define the data columns and where to find them in the
// text file.
var reader = TextLoader.CreateReader(env, ctx => (
        // We read the first 11 values as a single float vector.
        FeatureVector: ctx.LoadFloat(0, 10),
        // Separately, read the target variable.
        Target: ctx.LoadFloat(11)
    ),
    // Default separator is tab, but we need a comma.
    separator: ',');
// Now read the file (remember though, readers are lazy, so the actual
// reading will happen when the data is accessed).
var data = reader.Read(new MultiFileSource(dataPath));
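To make the reader's schema concrete: per the cookbook comment, `ctx.LoadFloat(0, 10)` covers columns 0 through 10 inclusive ("the first 11 values"), and `ctx.LoadFloat(11)` is the twelfth column. The sketch below is plain C# with no ML.NET dependency; it only illustrates how one comma-separated line maps onto that schema and is not how `TextLoader` is implemented. The helper `ParseRow` and the sample line are hypothetical.

```csharp
using System;
using System.Globalization;
using System.Linq;

// Hypothetical helper (not an ML.NET API): splits one text line and maps it
// onto a schema made of an inclusive feature-column range plus one target column.
(float[] FeatureVector, float Target) ParseRow(
    string line, char separator, int firstFeature, int lastFeature, int targetColumn)
{
    string[] cells = line.Split(separator);
    // Inclusive range, mirroring LoadFloat(first, last): last - first + 1 values.
    float[] features = cells
        .Skip(firstFeature)
        .Take(lastFeature - firstFeature + 1)
        .Select(c => float.Parse(c, CultureInfo.InvariantCulture))
        .ToArray();
    float target = float.Parse(cells[targetColumn], CultureInfo.InvariantCulture);
    return (features, target);
}

// Made-up sample line: 11 feature values (columns 0-10), then the target (column 11).
var row = ParseRow("1,2,3,4,5,6,7,8,9,10,11,0.5", ',', 0, 10, 11);
Console.WriteLine($"{row.FeatureVector.Length} features, target = {row.Target}");
```

By the same reading, `LoadFloat(0, 30)` in the question below declares 31 feature columns, and the target sits in column 31 (the 32nd).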
So I started applying it to my model:
using System;
using Microsoft.ML.Legacy;
using Microsoft.ML.Legacy.Data;
using Microsoft.ML.Legacy.Transforms;
using Microsoft.ML.Legacy.Trainers;
using Microsoft.ML.Legacy.Models;
using Microsoft.ML.Runtime.Data;
public static PredictionModel<CancerData, CancerPrediction> Train()
{
    var pipeline = new LearningPipeline();
    // 0.6 way to load data into the model
    var env = new LocalEnvironment();
    var reader = Microsoft.ML.Runtime.Data.TextLoader.CreateReader(env, ctx => (
            FeatureVector: ctx.LoadFloat(0, 30),
            Target: ctx.LoadText(31)
        ),
        separator: ';');
    var data = reader.Read(new MultiFileSource("Cancer-Train.csv"));
    //pipeline.Add(new TextLoader("Cancer-Train.csv").CreateFrom<CancerData>(useHeader: true, separator: ';'));
    pipeline.Add(new Dictionarizer(("Diagnosis", "Label")));
    pipeline.Add(data); // Doesn't work; I only wrote it to show what I want to do.
    // Below: the 0.5 way to load data into the pipeline.
    //pipeline.Add(new ColumnConcatenator(outputColumn: "Features",
    //    "RadiusMean",
    //    "TextureMean",
    //    .. and so on...
    //    "SymmetryWorst",
    //    "FractalDimensionWorst"));
    pipeline.Add(new StochasticDualCoordinateAscentBinaryClassifier());
    pipeline.Add(new PredictedLabelColumnOriginalValueConverter() { PredictedLabelColumn = "PredictedLabel" });
    PredictionModel<CancerData, CancerPrediction> model = pipeline.Train<CancerData, CancerPrediction>();
    model.WriteAsync(modelPath);
    return model;
}
The question is: how do I add `var data` to my existing `pipeline`? What do I need to do so that the `var data` from version 0.6 works with the 0.5 `pipeline`?
I don't think the `LearningPipeline` API is compatible with the new static-typed API (e.g. `TextLoader.CreateReader`). The cookbook helps to show the new APIs for training, as well as other scenarios such as using the model for predictions. This test might also help with binary classification.
For your specific code, I believe the training code should look something like this:
var env = new LocalEnvironment();
var reader = Microsoft.ML.Runtime.Data.TextLoader.CreateReader(env, ctx => (
        FeatureVector: ctx.LoadFloat(0, 30),
        Target: ctx.LoadBool(31)
    ),
    separator: ';');
var data = reader.Read(new MultiFileSource("Cancer-Train.csv"));
BinaryClassificationContext bcc = new BinaryClassificationContext(env);
var estimator = reader.MakeNewEstimator()
    .Append(row => (
        label: row.Target,
        features: row.FeatureVector.Normalize()))
    .Append(row => (
        row.label,
        score: bcc.Trainers.Sdca(row.label, row.features)))
    .Append(row => (
        row.label,
        row.score,
        predictedLabel: row.score.predictedLabel));
var model = estimator.Fit(data);
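In the estimator chain above, `row.FeatureVector.Normalize()` rescales the features before they reach the SDCA trainer. As a conceptual illustration only, the plain C# below applies textbook per-column min-max scaling into [0, 1]. ML.NET's normalizer has its own defaults (for example, it may preserve zero rather than map each column's minimum to 0), so this is a sketch of the idea under that assumption, not a reimplementation. `MinMaxScale` and the sample rows are hypothetical.

```csharp
using System;
using System.Linq;

// Hypothetical helper: textbook per-column min-max scaling of feature vectors
// into [0, 1]. Conceptual only; not ML.NET's normalizer implementation.
float[][] MinMaxScale(float[][] rows)
{
    int width = rows[0].Length;
    var result = rows.Select(_ => new float[width]).ToArray();
    for (int j = 0; j < width; j++)
    {
        float min = rows.Min(r => r[j]);
        float max = rows.Max(r => r[j]);
        float range = max - min;
        for (int i = 0; i < rows.Length; i++)
        {
            // A constant column has no spread; map it to 0 instead of dividing by zero.
            result[i][j] = range == 0 ? 0f : (rows[i][j] - min) / range;
        }
    }
    return result;
}

// Made-up data: three rows, two feature columns (the second one is constant).
var scaled = MinMaxScale(new[]
{
    new float[] { 10f, 1f },
    new float[] { 20f, 1f },
    new float[] { 30f, 1f },
});
Console.WriteLine(string.Join(" | ", scaled.Select(r => string.Join(", ", r))));
```

Scaling puts features with very different magnitudes on a comparable range, which generally helps linear learners such as SDCA converge.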