R - 先验函数错误
R - Apriori function error
我目前在使用先验函数时遇到问题。问题是我有一个包含如下数据的 csv:
Desc,Cantidad,Valor,Fecha,Lugar,UUID
DESCUENTO,1,-3405,2014-10-04T14:02:57,53100,7F74AFC0-FC28-4105-89A5-CD99416B50C7
DESCUENTO,1,-3405,2014-10-04T14:02:57,53100,7F74AFC0-FC28-4105-89A5-CD99416B50C7
DESCUENTO,1,-170,2014-09-05T15:10:24,83000,7F0C7F0B-BCFC-4FCA-8740-B36AE9932869
Descuento de TYK Dia,1,-156,2014-06-19T16:52:27,86280,1E08E51E-213A-4EE0-8FE9-492E677FF0C9
Descuento de TYK Dia,1,-139,2014-04-25T10:52:44,86280,AB802E63-2D0D-4B47-AB70-DDE007929F9F
DESCUENTO,1,-63,2014-07-04T13:53:10,83000,5B1F12BB-71DE-4734-A774-8D377757A880
REDONDEO,1,-1,2014-03-29T10:50:59,0,5B241EFA-6654-46EA-B47A-3CB76C5EA923
DESCUENTO,1,-1,2014-10-04T14:02:57,53100,7F74AFC0-FC28-4105-89A5-CD99416B50C7
DESCUENTO,1,-1,2014-10-04T14:02:57,53100,7F74AFC0-FC28-4105-89A5-CD99416B50C7
LAVADO,1,0,2014-05-27T18:18:11,44500,e5d540d6-0f98-4993-ec09-56887cd4a27d
TUA,1,0,2014-09-29T10:20:31,6500,1d8ada06-a8a1-4bd8-9356-851b5da28108
Transportación Aerea,1,0,2014-10-03T10:41:09,6500,5fc3925a-d08a-4cdc-be7e-ca02bd488d5b
OBSEQUIO LAVADO DE CARROCERIA,1,0,2014-04-07T13:45:55,91800,8148ab07-5804-4b2b-b37c-5323b394907a
Arroz Al Azafran Combos A,1,0,2014-08-19T11:50:34,11520,f09c23e6-dc60-4aaf-a1b8-1506d38f3585
Frijoles Charros A,1,0,2014-08-19T11:50:34,11520,f09c23e6-dc60-4aaf-a1b8-1506d38f3585
Pepsi Ch A,1,0,2014-08-19T11:50:34,11520,f09c23e6-dc60-4aaf-a1b8-1506d38f3585
FECHA DE CONSUMO 18/07/2014,1,0,2014-07-19T18:01:45,6060,0f0465aa-a75b-4f95-8e3b-43c13452cafb
CAMBIO DE ACEITE DE MOTOR,1,0,2014-02-01T11:18:53,39890,5BDF0742-CDF5-4F6B-9937-DF1CB00274ED
CAMBIO DE FILTRO DE ACEITE,1,0,2014-02-01T11:18:53,39890,5BDF0742-CDF5-4F6B-9937-DF1CB00274ED
整个 CSV (https://github.com/antonio1695/BaseX/blob/master/facturas1.csv)
要下载文件,只需单击查找文件,然后您就会看到该文件。
所以我所做的是:
> df1 <- read.csv("facturas1.csv")
> rules <- apriori(df1,parameter=list(support=0.01,confidence=0.5))
Error in asMethod(object) :
column(s) 3 not logical or a factor. Discretize the columns first.
然而,问题是列已经是离散的,如果我更改数据以使其第 3 列代替第 2 列,反之亦然。它仍然说第 3 列不合逻辑,或者当它应该说第 2 列时它不是一个因素。谢谢!
library(arules)
df1 <- read.csv("https://raw.githubusercontent.com/antonio1695/BaseX/master/facturas1.csv")
trans <- as(df1, "transactions")
Error in asMethod(object) :
column(s) 3 not logical or a factor. Discretize the columns first.
我们来看一下数据框:
str(df1)
'data.frame': 10510 obs. of 6 variables:
$ Desc : Factor w/ 3927 levels "0","00000215R0 - LIQUIDO DE FRENOS",..: 1490 1490 1490 1491 1491 1490 3209 1490 1490 2238 ...
$ Cantidad: Factor w/ 85 levels "","1","-1","10",..: 2 2 2 2 2 2 2 2 2 2 ...
$ Valor : int -3405 -3405 -170 -156 -139 -63 -1 -1 -1 0 ...
$ Fecha : Factor w/ 4054 levels "1294","2014-01-06T11:10:21",..: 4041 4041 3443 1794 596 2125 241 4041 4041 1215 ...
$ Lugar : Factor w/ 982 levels "","0","1000",..: 487 487 802 848 848 802 2 487 487 373 ...
$ UUID : Factor w/ 4056 levels "0019A60D-78F8-E341-8D3E-9786201FE017",..: 1988 1988 1979 456 2711 1423 1424 1988 1988 3658 ...
Valor是一个数字(int),需要离散化!例如 discretize():
df1$Valor <- discretize(df1$Valor)
head(df1$Valor)
[1] [-3405, 2400) [-3405, 2400) [-3405, 2400) [-3405, 2400) [-3405, 2400)
[6] [-3405, 2400)
Levels: [-3405, 2400) [ 2400, 8204) [ 8204,14009]
现在您可以创建交易并应用 APRIORI:
trans <- as(df1, "transactions")
rules <- apriori(trans,parameter=list(support=0.01,confidence=0.5))
rules
set of 84 rules
经过一番研究,我发现 apriori 函数必须采用间隔才能正常工作,因此当您使用 discretize您必须将参数 "categories" 添加到 select 您想要的间隔数。它不可能不采取间隔。我将 post 代码放在这里:
我决定采用 20 个间隔,这完全取决于间隔中的值重复的频率。
df$Valor <- discretize(df$Valor, method="frequency",categories = 20)
希望对大家有所帮助。
我目前在使用先验函数时遇到问题。问题是我有一个包含如下数据的 csv:
Desc,Cantidad,Valor,Fecha,Lugar,UUID
DESCUENTO,1,-3405,2014-10-04T14:02:57,53100,7F74AFC0-FC28-4105-89A5-CD99416B50C7
DESCUENTO,1,-3405,2014-10-04T14:02:57,53100,7F74AFC0-FC28-4105-89A5-CD99416B50C7
DESCUENTO,1,-170,2014-09-05T15:10:24,83000,7F0C7F0B-BCFC-4FCA-8740-B36AE9932869
Descuento de TYK Dia,1,-156,2014-06-19T16:52:27,86280,1E08E51E-213A-4EE0-8FE9-492E677FF0C9
Descuento de TYK Dia,1,-139,2014-04-25T10:52:44,86280,AB802E63-2D0D-4B47-AB70-DDE007929F9F
DESCUENTO,1,-63,2014-07-04T13:53:10,83000,5B1F12BB-71DE-4734-A774-8D377757A880
REDONDEO,1,-1,2014-03-29T10:50:59,0,5B241EFA-6654-46EA-B47A-3CB76C5EA923
DESCUENTO,1,-1,2014-10-04T14:02:57,53100,7F74AFC0-FC28-4105-89A5-CD99416B50C7
DESCUENTO,1,-1,2014-10-04T14:02:57,53100,7F74AFC0-FC28-4105-89A5-CD99416B50C7
LAVADO,1,0,2014-05-27T18:18:11,44500,e5d540d6-0f98-4993-ec09-56887cd4a27d
TUA,1,0,2014-09-29T10:20:31,6500,1d8ada06-a8a1-4bd8-9356-851b5da28108
Transportación Aerea,1,0,2014-10-03T10:41:09,6500,5fc3925a-d08a-4cdc-be7e-ca02bd488d5b
OBSEQUIO LAVADO DE CARROCERIA,1,0,2014-04-07T13:45:55,91800,8148ab07-5804-4b2b-b37c-5323b394907a
Arroz Al Azafran Combos A,1,0,2014-08-19T11:50:34,11520,f09c23e6-dc60-4aaf-a1b8-1506d38f3585
Frijoles Charros A,1,0,2014-08-19T11:50:34,11520,f09c23e6-dc60-4aaf-a1b8-1506d38f3585
Pepsi Ch A,1,0,2014-08-19T11:50:34,11520,f09c23e6-dc60-4aaf-a1b8-1506d38f3585
FECHA DE CONSUMO 18/07/2014,1,0,2014-07-19T18:01:45,6060,0f0465aa-a75b-4f95-8e3b-43c13452cafb
CAMBIO DE ACEITE DE MOTOR,1,0,2014-02-01T11:18:53,39890,5BDF0742-CDF5-4F6B-9937-DF1CB00274ED
CAMBIO DE FILTRO DE ACEITE,1,0,2014-02-01T11:18:53,39890,5BDF0742-CDF5-4F6B-9937-DF1CB00274ED
整个 CSV (https://github.com/antonio1695/BaseX/blob/master/facturas1.csv) 要下载文件,只需单击查找文件,然后您就会看到该文件。 所以我所做的是:
> df1 <- read.csv("facturas1.csv")
> rules <- apriori(df1,parameter=list(support=0.01,confidence=0.5))
Error in asMethod(object) :
column(s) 3 not logical or a factor. Discretize the columns first.
然而,问题是列已经是离散的,如果我更改数据以使其第 3 列代替第 2 列,反之亦然。它仍然说第 3 列不合逻辑,或者当它应该说第 2 列时它不是一个因素。谢谢!
library(arules)
df1 <- read.csv("https://raw.githubusercontent.com/antonio1695/BaseX/master/facturas1.csv")
trans <- as(df1, "transactions")
Error in asMethod(object) :
column(s) 3 not logical or a factor. Discretize the columns first.
我们来看一下数据框:
str(df1)
'data.frame': 10510 obs. of 6 variables:
$ Desc : Factor w/ 3927 levels "0","00000215R0 - LIQUIDO DE FRENOS",..: 1490 1490 1490 1491 1491 1490 3209 1490 1490 2238 ...
$ Cantidad: Factor w/ 85 levels "","1","-1","10",..: 2 2 2 2 2 2 2 2 2 2 ...
$ Valor : int -3405 -3405 -170 -156 -139 -63 -1 -1 -1 0 ...
$ Fecha : Factor w/ 4054 levels "1294","2014-01-06T11:10:21",..: 4041 4041 3443 1794 596 2125 241 4041 4041 1215 ...
$ Lugar : Factor w/ 982 levels "","0","1000",..: 487 487 802 848 848 802 2 487 487 373 ...
$ UUID : Factor w/ 4056 levels "0019A60D-78F8-E341-8D3E-9786201FE017",..: 1988 1988 1979 456 2711 1423 1424 1988 1988 3658 ...
Valor是一个数字(int),需要离散化!例如 discretize():
df1$Valor <- discretize(df1$Valor)
head(df1$Valor)
[1] [-3405, 2400) [-3405, 2400) [-3405, 2400) [-3405, 2400) [-3405, 2400)
[6] [-3405, 2400)
Levels: [-3405, 2400) [ 2400, 8204) [ 8204,14009]
现在您可以创建交易并应用 APRIORI:
trans <- as(df1, "transactions")
rules <- apriori(trans,parameter=list(support=0.01,confidence=0.5))
rules
set of 84 rules
经过一番研究,我发现 apriori 函数必须采用间隔才能正常工作,因此当您使用 discretize您必须将参数 "categories" 添加到 select 您想要的间隔数。它不可能不采取间隔。我将 post 代码放在这里:
我决定采用 20 个间隔,这完全取决于间隔中的值重复的频率。
df$Valor <- discretize(df$Valor, method="frequency",categories = 20)
希望对大家有所帮助。