Simple Accord.NET machine learning example
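I am new to machine learning and also new to Accord.NET (I code in C#).
I want to create a simple project that looks at a simple time series of oscillating data, and then have Accord.NET learn it and predict the next value.
This is what the data (time series) looks like: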
X - Y
1 - 1
2 - 2
3 - 3
4 - 2
5 - 1
6 - 2
7 - 3
8 - 2
9 - 1
I would then like it to predict the following:
X - Y
10 - 2
11 - 3
12 - 2
13 - 1
14 - 2
15 - 3
Could you help me out with some examples of how to solve this?
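A simple way to do this is to use an Accord ID3 decision tree.
The trick is to work out which inputs to use. You can't just train on X, because the tree won't learn anything about future values of X from that. You can, however, build some features derived from X (or from previous values of Y) that will be useful.
Normally, for problems like this, you would base each prediction on features derived from previous values of Y (the thing being predicted) rather than on X. However, that assumes you can observe Y sequentially between predictions (you could not then predict for an arbitrary X), so I'll stick with the question as presented.
I had a go at building an Accord ID3 decision tree to solve this problem below. I used a few different values of x % n as the features, hoping the tree could work out the answer from them. In fact, if I had added (x-1) % 4 as a feature, the tree could do it in a single level with just that attribute, but I guess the point is more to let the tree find the pattern.
Here is the code: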
using System;
using System.Linq;
using Accord.MachineLearning.DecisionTrees;
using Accord.MachineLearning.DecisionTrees.Learning;
using Microsoft.VisualStudio.TestTools.UnitTesting;

[TestClass]
public class AccordID3Tests // class name is illustrative; the original shows only the members
{
    // this is the sequence y follows
    int[] ysequence = new int[] { 1, 2, 3, 2 };

    // this generates the correct Y for a given X
    int CalcY(int x) => ysequence[(x - 1) % 4];

    // this generates some inputs - just a few different mods of x
    int[] CalcInputs(int x) => new int[] { x % 2, x % 3, x % 4, x % 5, x % 6 };

    [TestMethod]
    public void AccordID3TestWhosebugQuestion2()
    {
        // build the training data set
        int numtrainingcases = 12;
        int[][] inputs = new int[numtrainingcases][];
        int[] outputs = new int[numtrainingcases];

        Console.WriteLine("\t\t\t\t x \t y");
        for (int x = 1; x <= numtrainingcases; x++)
        {
            int y = CalcY(x);
            inputs[x - 1] = CalcInputs(x);
            outputs[x - 1] = y;
            Console.WriteLine("TrainingData \t " + x + "\t " + y);
        }

        // define how many values each input can have
        DecisionVariable[] attributes =
        {
            new DecisionVariable("Mod2", 2),
            new DecisionVariable("Mod3", 3),
            new DecisionVariable("Mod4", 4),
            new DecisionVariable("Mod5", 5),
            new DecisionVariable("Mod6", 6)
        };

        // define how many outputs (+1 only because y doesn't use zero)
        int classCount = outputs.Max() + 1;

        // create the tree
        DecisionTree tree = new DecisionTree(attributes, classCount);

        // create a new instance of the ID3 algorithm
        ID3Learning id3learning = new ID3Learning(tree);

        // learn the training instances; this populates the tree
        id3learning.Learn(inputs, outputs);

        Console.WriteLine();

        // now try to predict some cases that weren't in the training data
        for (int x = numtrainingcases + 1; x <= 2 * numtrainingcases; x++)
        {
            int[] query = CalcInputs(x);
            int answer = tree.Decide(query); // makes the prediction
            Assert.AreEqual(CalcY(x), answer); // check the tree got it right
            Console.WriteLine("Prediction \t\t " + x + "\t " + answer);
        }
    }
}
This is the output it produces:
x y
TrainingData 1 1
TrainingData 2 2
TrainingData 3 3
TrainingData 4 2
TrainingData 5 1
TrainingData 6 2
TrainingData 7 3
TrainingData 8 2
TrainingData 9 1
TrainingData 10 2
TrainingData 11 3
TrainingData 12 2
Prediction 13 1
Prediction 14 2
Prediction 15 3
Prediction 16 2
Prediction 17 1
Prediction 18 2
Prediction 19 3
Prediction 20 2
Prediction 21 1
Prediction 22 2
Prediction 23 3
Prediction 24 2
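The earlier remark that (x - 1) % 4 would be enough on its own can be checked without Accord at all. The following is only a minimal sketch and not part of the original answer: the class `SingleFeatureCheck` and its methods are hypothetical names, and a plain lookup table stands in for a one-level tree. It records which y goes with each feature value over x = 1..12, then applies that table to unseen x.

```csharp
using System.Collections.Generic;

// Hypothetical helper, not from the original answer: a lookup table that
// plays the role of a one-level decision tree on the feature (x - 1) % 4.
public static class SingleFeatureCheck
{
    static readonly int[] YSequence = { 1, 2, 3, 2 };

    // Ground truth for the series in the question (valid for x >= 1).
    public static int CalcY(int x) => YSequence[(x - 1) % 4];

    // The single derived feature discussed in the text.
    public static int Feature(int x) => (x - 1) % 4;

    // "Train": record which y was seen for each feature value in x = 1..count.
    public static Dictionary<int, int> Learn(int count)
    {
        var table = new Dictionary<int, int>();
        for (int x = 1; x <= count; x++)
        {
            table[Feature(x)] = CalcY(x);
        }
        return table;
    }

    // "Predict": look up the y recorded for this x's feature value.
    public static int Predict(Dictionary<int, int> table, int x) => table[Feature(x)];
}
```

With `var table = SingleFeatureCheck.Learn(12);`, the call `SingleFeatureCheck.Predict(table, 13)` gives 1 and `SingleFeatureCheck.Predict(table, 15)` gives 3, matching the output above.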
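Hope that helps.
EDIT: Following the comments, the example below is modified to train on previous values of the target (Y) rather than on features derived from the time index (X). This means you can't start training at the beginning of the series, because each row needs a look-back history of previous Y values. In this example I start at x=9 simply because that keeps the same sequence.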
using System;
using System.Linq;
using Accord.MachineLearning.DecisionTrees;
using Accord.MachineLearning.DecisionTrees.Learning;
using Microsoft.VisualStudio.TestTools.UnitTesting;

[TestClass]
public class AccordID3LaggedTests // class name is illustrative; the original shows only the members
{
    // this is the sequence y follows
    int[] ysequence = new int[] { 1, 2, 3, 2 };

    // this generates the correct Y for a given X
    int CalcY(int x) => ysequence[(x - 1) % 4];

    // this generates the inputs: the five previous values of y (y-1 ... y-5);
    // x must be at least 6, otherwise (x - 1) % 4 goes negative inside CalcY
    int[] CalcInputs(int x) => new int[] { CalcY(x - 1), CalcY(x - 2), CalcY(x - 3), CalcY(x - 4), CalcY(x - 5) };
    // previous version, features derived from x:
    //int[] CalcInputs(int x) => new int[] { x % 2, x % 3, x % 4, x % 5, x % 6 };

    [TestMethod]
    public void AccordID3TestTestWhosebugQuestion2()
    {
        // build the training data set
        int numtrainingcases = 12;
        int starttrainingat = 9;
        int[][] inputs = new int[numtrainingcases][];
        int[] outputs = new int[numtrainingcases];

        Console.WriteLine("\t\t\t\t x \t y");
        for (int x = starttrainingat; x < numtrainingcases + starttrainingat; x++)
        {
            int y = CalcY(x);
            inputs[x - starttrainingat] = CalcInputs(x);
            outputs[x - starttrainingat] = y;
            Console.WriteLine("TrainingData \t " + x + "\t " + y);
        }

        // define how many values each input can have
        // (4 symbols, 0..3, because y takes the values 1..3 and doesn't use zero)
        DecisionVariable[] attributes =
        {
            new DecisionVariable("y-1", 4),
            new DecisionVariable("y-2", 4),
            new DecisionVariable("y-3", 4),
            new DecisionVariable("y-4", 4),
            new DecisionVariable("y-5", 4)
        };

        // define how many outputs (+1 only because y doesn't use zero)
        int classCount = outputs.Max() + 1;

        // create the tree
        DecisionTree tree = new DecisionTree(attributes, classCount);

        // create a new instance of the ID3 algorithm
        ID3Learning id3learning = new ID3Learning(tree);

        // learn the training instances; this populates the tree
        id3learning.Learn(inputs, outputs);

        Console.WriteLine();

        // now try to predict some cases that weren't in the training data
        for (int x = starttrainingat + numtrainingcases; x <= starttrainingat + 2 * numtrainingcases; x++)
        {
            int[] query = CalcInputs(x);
            int answer = tree.Decide(query); // makes the prediction
            Assert.AreEqual(CalcY(x), answer); // check the tree got it right
            Console.WriteLine("Prediction \t\t " + x + "\t " + answer);
        }
    }
}
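The look-back requirement can be made concrete with a small sketch that builds the lagged inputs from an observed array instead of from CalcY. This is an illustration under assumptions, not the answer's code: `LagFeatures`, `Build`, and the parameter `lags` are hypothetical names. With `lags` previous values per row, the first `lags` observations can only serve as history, so a series of length n yields n - lags training rows.

```csharp
// Hypothetical helper, not from the original answer.
public static class LagFeatures
{
    // Builds (inputs, outputs) from an observed series y[0..n-1].
    // Row i predicts y[i + lags] from the `lags` values before it,
    // most recent first (y-1, y-2, ...), matching the attribute order above.
    public static (int[][] Inputs, int[] Outputs) Build(int[] y, int lags)
    {
        int rows = y.Length - lags;
        if (lags < 1 || rows < 1)
        {
            throw new System.ArgumentException("Need at least lags + 1 observations.");
        }

        var inputs = new int[rows][];
        var outputs = new int[rows];
        for (int i = 0; i < rows; i++)
        {
            int t = i + lags;                  // index of the value being predicted
            inputs[i] = new int[lags];
            for (int k = 1; k <= lags; k++)
            {
                inputs[i][k - 1] = y[t - k];   // the value k steps back
            }
            outputs[i] = y[t];
        }
        return (inputs, outputs);
    }
}
```

For the nine observations in the question, `LagFeatures.Build(new[] { 1, 2, 3, 2, 1, 2, 3, 2, 1 }, 5)` yields only four rows; the first has inputs { 1, 2, 3, 2, 1 } and output 2. The two arrays have the same shape as the `inputs` and `outputs` passed to `Learn` above.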
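You could also consider training on the differences between previous values of Y, which works better when the absolute value of Y matters less than the relative change.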
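One way to read that suggestion is sketched below. It is an assumption about what is meant, not code from the answer, and `DifferenceFeatures`, `Build`, `Apply`, and `offset` are hypothetical names. Each feature is the change between two consecutive values of Y, shifted by a fixed offset so that it becomes a non-negative symbol index of the kind the discrete attributes above use.

```csharp
using System;

// Hypothetical helper, not from the original answer.
public static class DifferenceFeatures
{
    // Row i predicts the next (shifted) change from the `lags` changes before it,
    // most recent first. `offset` must be at least the largest drop between
    // neighbouring values so that every shifted change is >= 0.
    public static (int[][] Inputs, int[] Outputs) Build(int[] y, int lags, int offset)
    {
        int rows = y.Length - 1 - lags;
        if (lags < 1 || rows < 1)
        {
            throw new ArgumentException("Need at least lags + 2 observations.");
        }

        // d[j] = y[j + 1] - y[j] + offset
        var d = new int[y.Length - 1];
        for (int j = 0; j < d.Length; j++)
        {
            d[j] = y[j + 1] - y[j] + offset;
            if (d[j] < 0)
            {
                throw new ArgumentException("offset is too small for this series.");
            }
        }

        var inputs = new int[rows][];
        var outputs = new int[rows];
        for (int i = 0; i < rows; i++)
        {
            int t = i + lags;                  // index of the change being predicted
            inputs[i] = new int[lags];
            for (int k = 1; k <= lags; k++)
            {
                inputs[i][k - 1] = d[t - k];   // the change k steps back
            }
            outputs[i] = d[t];
        }
        return (inputs, outputs);
    }

    // Turns a predicted (shifted) change back into a value of Y.
    public static int Apply(int lastY, int shiftedChange, int offset) => lastY + shiftedChange - offset;
}
```

For the series 1, 2, 3, 2, 1, 2, 3, 2, 1 with `lags` = 2 and `offset` = 1, the shifted changes are 2, 2, 0, 0, 2, 2, 0, 0, so the first row has inputs { 2, 2 } and output 0 (two rises followed by a fall), and `DifferenceFeatures.Apply(3, 0, 1)` recovers the next value, 2.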