Accord.NET multiclass SVM classification kernel: how to solve an out-of-memory exception
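I want to use the nursery data to train an SVM (8 attributes and 5 classes), using the same logic as for the C45Learning class shown in the example:
For example, the data is loaded from the nursery data, which contains 8 attributes: "parents", "has_nurs", "form", "children", "housing", "finance", "social", "health"
A combination of these attributes results in one of 5 classes: "not_recom", "recommend", "very_recom", "priority", "spec_prior"
However, I don't know which kernel is best suited to this kind of data for an SVM. By definition, the polynomial kernel is a kernel function that represents the similarity of vectors (training samples) in a feature space over polynomials of the original variables, which allows nonlinear models to be learned.
I tried this kernel, but I ran into a problem when training the machine with the data.
So far I am using the code shown in the example to train the SVM, with the following SVM code: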
    // Same code as the C45 example to get the input and output data
    string nurseryData = Resources.nursery;
    string[] inputColumns =
    {
        "parents", "has_nurs", "form", "children",
        "housing", "finance", "social", "health"
    };
    string outputColumn = "output";
    DataTable table = new DataTable("Nursery");
    table.Columns.Add(inputColumns);
    table.Columns.Add(outputColumn);
    string[] lines = nurseryData.Split(
        new[] { Environment.NewLine }, StringSplitOptions.None);
    foreach (var line in lines)
        table.Rows.Add(line.Split(','));
    // Convert the string values into integer symbols
    Codification codebook = new Codification(table);
    DataTable symbols = codebook.Apply(table);
    double[][] inputs = symbols.ToArray(inputColumns);
    int[] outputs = symbols.ToArray<int>(outputColumn);
    int inputDimension = 8;
    int outputClasses = 5;
    // SVM
    IKernel kernel = new Polynomial(2, 5);
    // Create the Multi-class Support Vector Machine using the selected Kernel
    var ksvm = new MulticlassSupportVectorMachine(inputDimension, kernel, outputClasses);
    // Create the learning algorithm using the machine and the training data
    var ml = new MulticlassSupportVectorLearning(ksvm, inputs, outputs);
    ml.Algorithm = (svm, classInputs, classOutputs, i, j) =>
        new SequentialMinimalOptimization(svm, classInputs, classOutputs);
    double SVMerror = ml.Run();
But I get an error when training the machine. What am I missing?
Edit
I now have a different problem. When I try Cesar's code, I get this:
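The framework automatically builds a kernel function cache to help speed up computations during SVM learning. However, in some cases this cache can take up too much memory and cause OutOfMemoryExceptions.
To strike a balance between memory consumption and CPU speed, set the CacheSize property to a lower value. The default is to store all input vectors in the cache; setting it to a lower value (such as 1/20 of the number of training samples) should be enough.
If you set CacheSize to zero, you disable the cache completely. Training may be a bit slower, but you won't have any memory problems. See the code below. The resulting error I obtained was about 0.09.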
    // Same code to get the input and output data
    string nurseryData = Properties.Resources.nursery;
    string[] inputColumns =
    {
        "parents", "has_nurs", "form", "children",
        "housing", "finance", "social", "health"
    };
    string outputColumn = "output";
    DataTable table = new DataTable("Nursery");
    table.Columns.Add(inputColumns);
    table.Columns.Add(outputColumn);
    string[] lines = nurseryData.Split(
        new[] { Environment.NewLine }, StringSplitOptions.None);
    foreach (var line in lines)
        table.Rows.Add(line.Split(','));
    Codification codebook = new Codification(table);
    DataTable symbols = codebook.Apply(table);
    double[][] inputs = symbols.ToArray(inputColumns);
    int[] outputs = Matrix.ToArray<int>(symbols, outputColumn);
    // SVM
    IKernel kernel = new Linear();
    // Create the Multi-class Support Vector Machine using the selected Kernel
    int inputDimension = inputs[0].Length;
    int outputClasses = codebook[outputColumn].Symbols;
    var ksvm = new MulticlassSupportVectorMachine(inputDimension, kernel, outputClasses);
    // Create the learning algorithm using the machine and the training data
    var ml = new MulticlassSupportVectorLearning(ksvm, inputs, outputs)
    {
        Algorithm = (svm, classInputs, classOutputs, i, j) =>
        {
            return new SequentialMinimalOptimization(svm, classInputs, classOutputs)
            {
                CacheSize = 0 // disable the kernel function cache entirely
            };
        }
    };
    double SVMerror = ml.Run(); // should be around 0.09
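However, I agree that this may not be very obvious. I will add a better way to handle this situation in a bug-fix release. Thanks for raising the issue!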
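For a rough sense of why a full cache can exhaust memory on this data, the sketch below estimates the size of a dense kernel matrix. It is only a back-of-the-envelope illustration under stated assumptions: that the resource holds the full UCI Nursery data set (12,960 rows), that the worst case is one 8-byte double per pair of training samples, and that training is one-vs-one (one binary machine per pair of classes). It does not describe how Accord.NET actually lays out its cache, and the class and method names are hypothetical.

```csharp
using System;

public static class KernelCacheEstimate
{
    // Bytes needed for a dense n-by-n matrix of doubles.
    // Assumed worst case only; not Accord.NET's actual cache layout.
    public static long DenseKernelMatrixBytes(long samples)
    {
        return samples * samples * sizeof(double);
    }

    // Number of binary machines in a one-vs-one scheme over the given classes.
    public static int OneVsOneMachines(int classes)
    {
        return classes * (classes - 1) / 2;
    }

    public static void Main()
    {
        long samples = 12960; // assumption: full UCI Nursery data set
        int classes = 5;      // number of classes stated in the question

        double mib = DenseKernelMatrixBytes(samples) / (1024.0 * 1024.0);
        Console.WriteLine("Dense kernel matrix upper bound: {0:F1} MiB", mib);
        Console.WriteLine("One-vs-one binary machines: {0}", OneVsOneMachines(classes));
    }
}
```

Under these assumptions the upper bound is 12,960 × 12,960 × 8 bytes, about 1.25 GiB, which may well be more than a 32-bit process can allocate in one piece. Each one-vs-one machine only sees the samples of its two classes, so the real per-machine figure would be smaller; the estimate is meant only to show why lowering CacheSize (or setting it to zero) helps.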