Rule learning algorithms of RWeka, problems with finding rules concerning dates
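I have some problems with the RWeka package for R, more precisely with its rule learning algorithms. I created an .arff file myself, which you can see below. I ran the JRip and J48 algorithms from the RWeka package on the data in this .arff file and got the following rules: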
> JRip(Failure ~., data=date)
JRIP rules:
===========
=> Failure=no (35.0/11.0)
Number of Rules : 1
> J48(Failure ~., data=date)
J48 pruned tree
------------------
: no (35.0/11.0)
Number of Leaves : 1
Size of the tree : 1
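So my question is: why can't the algorithms find a rule based on the production date? It is obvious that every product produced on 2013-04-01 is faulty.
What is my mistake here?
Thanks in advance!
titus24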
@RELATION dataset
@ATTRIBUTE Date-of-Production DATE "yyyy-MM-dd HH:mm:ss"
@ATTRIBUTE Location {Frankfurt, Cologne, Hamburg, Munich, Berlin}
@ATTRIBUTE Failure {yes, no}
@DATA
"2013-04-01 00:00:00",Frankfurt,yes
"2013-04-01 00:00:00",Cologne,yes
"2013-04-01 00:00:00",Munich,yes
"2013-04-01 00:00:00",Hamburg,yes
"2013-04-01 00:00:00",Berlin,yes
"2013-04-01 00:00:00",Frankfurt,yes
"2013-04-01 00:00:00",Cologne,yes
"2013-04-01 00:00:00",Munich,yes
"2013-04-01 00:00:00",Hamburg,yes
"2013-04-01 00:00:00",Berlin,yes
"2013-04-01 00:00:00",Frankfurt,yes
"2012-05-01 00:00:00",Cologne,no
"2012-05-02 00:00:00",Munich,no
"2012-05-03 00:00:00",Hamburg,no
"2012-05-04 00:00:00",Berlin,no
"2012-05-05 00:00:00",Frankfurt,no
"2012-05-06 00:00:00",Cologne,no
"2012-05-07 00:00:00",Munich,no
"2012-05-08 00:00:00",Hamburg,no
"2012-05-09 00:00:00",Berlin,no
"2012-05-10 00:00:00",Frankfurt,no
"2012-05-11 00:00:00",Cologne,no
"2012-05-12 00:00:00",Munich,no
"2012-05-13 00:00:00",Hamburg,no
"2012-05-14 00:00:00",Berlin,no
"2012-05-15 00:00:00",Frankfurt,no
"2012-05-16 00:00:00",Cologne,no
"2012-05-17 00:00:00",Munich,no
"2012-05-18 00:00:00",Hamburg,no
"2012-05-19 00:00:00",Berlin,no
"2012-05-20 00:00:00",Frankfurt,no
"2012-05-21 00:00:00",Cologne,no
"2012-05-22 00:00:00",Munich,no
"2012-05-23 00:00:00",Hamburg,no
"2012-05-24 00:00:00",Berlin,no
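Explanation
Internally, WEKA represents a date attribute as a floating-point number that stores the milliseconds since January 1, 1970, 00:00:00 GMT, as described in the documentation of weka.core.Attribute. In RWeka, something goes wrong when the POSIXct/POSIXt dates are converted into this floating-point value, so the learners never see a usable date attribute.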
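Once the conversion works, the problem becomes trivial. The arithmetic can be illustrated with a minimal Python sketch (Python is used only for illustration; the thread itself uses R). It assumes the timestamps are read as UTC, and the helper `to_epoch_millis` is hypothetical, not part of Weka or RWeka. As milliseconds since the epoch, every 2013-04-01 value is strictly larger than every 2012-05 value, so a single threshold separates the two classes.

```python
from datetime import datetime, timezone

# Assumption: timestamps are interpreted as UTC. Weka parses DATE values with a
# Java SimpleDateFormat pattern, which uses the JVM's default time zone unless
# told otherwise, so real values may be shifted by the local UTC offset.
ARFF_DATE_FORMAT = "%Y-%m-%d %H:%M:%S"  # Python equivalent of "yyyy-MM-dd HH:mm:ss"


def to_epoch_millis(value: str) -> float:
    """Hypothetical helper: milliseconds since 1970-01-01 00:00:00 UTC, as a float."""
    dt = datetime.strptime(value, ARFF_DATE_FORMAT).replace(tzinfo=timezone.utc)
    return dt.timestamp() * 1000.0


faulty = to_epoch_millis("2013-04-01 00:00:00")     # date of all "yes" rows
latest_ok = to_epoch_millis("2012-05-24 00:00:00")  # latest "no" row
print(faulty, latest_ok, faulty > latest_ok)  # 1364774400000.0 1337817600000.0 True
```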
Solution
Convert the dates to numbers manually, then run the classification:
library(RWeka)
dataset <- read.arff("date.arff")
# Replace the POSIXct dates with their internal numeric value
# (seconds since 1970-01-01 00:00:00 UTC)
dataset[, 1] <- unclass(dataset[, 1])
J48(Failure ~ ., data = dataset)
The output is the same as in WEKA Explorer 3.7.12:
Date-of-Production <= 1337810400: no (24.0)
Date-of-Production > 1337810400: yes (11.0)
Number of Leaves : 2
Size of the tree : 3
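The threshold in this tree can be reproduced with a short Python sketch. Two assumptions are labeled in the code. First, the R session presumably ran in a UTC+2 time zone (the locations suggest Germany in summer): 1337810400 is exactly 2012-05-24 00:00:00 at UTC+2, i.e. the latest "no" date, counted in seconds (R's POSIXct unit) rather than milliseconds. Second, `best_threshold` is a hypothetical, simplified C4.5-style search using plain information gain. It is not Weka's J48 code, which uses gain ratio plus further safeguards. Like C4.5, it reports the largest observed value on the left of the cut as the threshold instead of a midpoint.

```python
import math
from datetime import datetime, timedelta, timezone

# Assumption: the R session ran in a UTC+2 zone (e.g. Central European Summer
# Time). R's POSIXct stores seconds since 1970-01-01 00:00:00 UTC.
UTC_PLUS_2 = timezone(timedelta(hours=2))


def epoch_seconds(value: str) -> float:
    """Seconds since the epoch for a 'yyyy-MM-dd HH:mm:ss' string read as UTC+2."""
    dt = datetime.strptime(value, "%Y-%m-%d %H:%M:%S").replace(tzinfo=UTC_PLUS_2)
    return dt.timestamp()


def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    n = len(labels)
    return -sum(
        (labels.count(c) / n) * math.log2(labels.count(c) / n) for c in set(labels)
    )


def best_threshold(xs, ys):
    """Simplified C4.5-style search for one numeric attribute (hypothetical helper).

    Tries a cut between every pair of adjacent distinct values, keeps the cut
    with the highest information gain and, like C4.5, reports the largest
    observed value on the left-hand side as the threshold.
    """
    pairs = sorted(zip(xs, ys))
    n = len(pairs)
    base = entropy([y for _, y in pairs])
    best_value, best_gain = None, -1.0
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # cannot cut between identical values
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        gain = base - len(left) / n * entropy(left) - len(right) / n * entropy(right)
        if gain > best_gain:
            best_value, best_gain = pairs[i - 1][0], gain
    return best_value, best_gain


# The 35 rows of the ARFF file: 11 faulty products, then 24 good ones.
dates = ["2013-04-01 00:00:00"] * 11 + [f"2012-05-{d:02d} 00:00:00" for d in range(1, 25)]
labels = ["yes"] * 11 + ["no"] * 24

threshold, gain = best_threshold([epoch_seconds(d) for d in dates], labels)
print(int(threshold))  # 1337810400
```

With numeric dates, the best cut puts the 24 "no" rows on the left and the 11 "yes" rows on the right, which matches the two leaves of the tree.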