R: how to incorporate item categories into transactions data
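In R, I want to create transactions data from the following data frame so that I can run apriori from the arules package. It has a transaction ID, an item ID, and a category ID, where the category is the parent of the item.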
Transaction_ID Item_ID Category_ID
T01 A001 A01
T01 A002 A01
T02 A001 A01
T02 A003 A02
T02 A002 A01
T03 A005 A03
T05 A004 A03
T05 A002 A01
T05 A005 A03
T04 A001 A01
T04 A003 A02
I want to incorporate the category ID into the transactions data as a level above the labels (items), as in the Groceries data.
str(Groceries)
Formal class 'transactions' [package "arules"] with 3 slots
..@ data :Formal class 'ngCMatrix' [package "Matrix"] with 5 slots
.. .. ..@ i : int [1:43367] 13 60 69 78 14 29 98 24 15 29 ...
.. .. ..@ p : int [1:9836] 0 4 7 8 12 16 21 22 27 28 ...
.. .. ..@ Dim : int [1:2] 169 9835
.. .. ..@ Dimnames:List of 2
.. .. .. ..$ : NULL
.. .. .. ..$ : NULL
.. .. ..@ factors : list()
..@ itemInfo :'data.frame': 169 obs. of 3 variables:
.. ..$ labels: chr [1:169] "frankfurter" "sausage" "liver loaf" "ham" ...
.. ..$ level2: Factor w/ 55 levels "baby food","bags",..: 44 44 44 44 44 44 44 42 42 41 ...
.. ..$ level1: Factor w/ 10 levels "canned food",..: 6 6 6 6 6 6 6 6 6 6 ...
..@ itemsetInfo:'data.frame': 0 obs. of 0 variables
However, read.transactions only lets you import the transaction ID and the item ID through the cols argument. I also tried this:
transaction_by_item<-split(df[,c("Item_ID","Category_ID")],df$Transaction_ID)
basket <- as(transaction_by_item, "transactions")
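It gives an error:
Error in asMethod(object) : can coerce list with atomic components only
If I split only the item IDs by transaction, it works:
transaction_by_item<-split(df$Item_ID,df$Transaction_ID)
Does anyone know how to incorporate both the item ID (labels) and the category ID (levels) when creating transactions data? Thanks.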
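A note on the error above, as a minimal base R sketch: splitting a two-column data frame returns a list of data frames, while splitting a single column returns a list of atomic vectors, and the error message says the coercion only accepts the latter. The toy data frame below only mirrors the question's columns; by_df and by_vec are names made up for illustration.

```r
# Toy data with the same columns as the question's data frame
df <- data.frame(
  Transaction_ID = c("T01", "T01", "T02", "T02", "T02"),
  Item_ID        = c("A001", "A002", "A001", "A003", "A002"),
  Category_ID    = c("A01", "A01", "A01", "A02", "A01")
)

# Splitting two columns: every list element is itself a data frame
by_df <- split(df[, c("Item_ID", "Category_ID")], df$Transaction_ID)
sapply(by_df, is.atomic)

# Splitting one column: every list element is an atomic vector
by_vec <- split(df$Item_ID, df$Transaction_ID)
sapply(by_vec, is.atomic)
```

This suggests building the transactions from the item IDs alone and attaching the category afterwards as item metadata.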
Maybe this helps. First, let me introduce the arules function itemInfo():
library(arules)
data("Groceries")  # load the example data set shipped with arules
head(itemInfo(Groceries))
labels level2 level1
1 frankfurter sausage meat and sausage
2 sausage sausage meat and sausage
3 liver loaf sausage meat and sausage
4 ham sausage meat and sausage
5 meat sausage meat and sausage
6 finished products sausage meat and sausage
Now, as you said, Groceries has several levels. Your data, on the other hand, has only labels once it is converted:
trans4 <- as(split(dats[,"Item_ID"], dats[,"Transaction_ID"]), "transactions")
str(trans4)
itemInfo(trans4)
labels
1 A001
2 A002
3 A003
4 A004
5 A005
Now you have to add the category to the transactions object. Build an item-to-category lookup and attach it as a level1 column, matching on the item label. The match matters because distinct() returns rows in order of first appearance (A001, A002, A003, A005, A004 here), which differs from the item order inside trans4; assigning the lookup by position would mislabel A004 and A005.
library(dplyr)
labels_ <- dats %>% select(Item_ID, Category_ID) %>% distinct()
# align the lookup to the item order of trans4 before attaching it
itemInfo(trans4)$level1 <- labels_$Category_ID[match(itemLabels(trans4), labels_$Item_ID)]
Now:
itemInfo(trans4)
labels level1
1 A001 A01
2 A002 A01
3 A003 A02
4 A004 A03
5 A005 A03
And:
str(itemInfo(trans4)$level1)
Factor w/ 3 levels "A01","A02","A03": 1 1 2 3 3 # here we go!!!
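One thing worth checking at this step is that the lookup rows line up with the item order of the transactions object. The base R sketch below uses the Item_ID and Category_ID values from the question; first_seen, item_order, idx and aligned_categories are names invented for illustration, and item_order assumes the A001 to A005 order shown in the first itemInfo(trans4) output.

```r
item_ids     <- c("A001", "A002", "A001", "A003", "A002", "A005",
                  "A004", "A002", "A005", "A001", "A003")
category_ids <- c("A01", "A01", "A01", "A02", "A01", "A03",
                  "A03", "A01", "A03", "A01", "A02")

# Order of first appearance in the data, as unique()/distinct() return it
first_seen <- unique(item_ids)

# Assumed item order of the transactions object (A001 ... A005)
item_order <- sort(first_seen)

# For each item in item_order, the row of the lookup that holds it
idx <- match(item_order, first_seen)

# Categories reordered so that element k belongs to item_order[k]
aligned_categories <- category_ids[!duplicated(item_ids)][idx]
data.frame(labels = item_order, level1 = aligned_categories)
```

Because A005 appears in the data before A004, idx is not simply 1 to 5, so a positional assignment would swap the metadata of those two items.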
The data used above:
dats <- structure(list(Transaction_ID = structure(c(1L, 1L, 2L, 2L, 2L,
3L, 5L, 5L, 5L, 4L, 4L), .Label = c("T01", "T02", "T03", "T04",
"T05"), class = "factor"), Item_ID = structure(c(1L, 2L, 1L,
3L, 2L, 5L, 4L, 2L, 5L, 1L, 3L), .Label = c("A001", "A002", "A003",
"A004", "A005"), class = "factor"), Category_ID = structure(c(1L,
1L, 1L, 2L, 1L, 3L, 3L, 1L, 3L, 1L, 2L), .Label = c("A01", "A02",
"A03"), class = "factor")), class = "data.frame", row.names = c(NA,
-11L))
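Once a level1 column is in place, the hierarchy can be used for mining at the category level. The following is a rough sketch rather than a definitive recipe: it rebuilds the example from plain vectors, collapses items to their categories with arules' aggregate(), and runs apriori on the result. The object names (trans, lookup, trans_cat, rules) and the support and confidence thresholds are arbitrary choices for illustration, and the grouping vector is passed explicitly on the assumption that this form is accepted by the installed arules version.

```r
library(arules)

dats <- data.frame(
  Transaction_ID = c("T01", "T01", "T02", "T02", "T02", "T03",
                     "T05", "T05", "T05", "T04", "T04"),
  Item_ID        = c("A001", "A002", "A001", "A003", "A002", "A005",
                     "A004", "A002", "A005", "A001", "A003"),
  Category_ID    = c("A01", "A01", "A01", "A02", "A01", "A03",
                     "A03", "A01", "A03", "A01", "A02")
)

# Transactions are built from the item IDs only
trans <- as(split(dats$Item_ID, dats$Transaction_ID), "transactions")

# Item-to-category lookup, aligned to the item order inside `trans`
lookup <- unique(dats[, c("Item_ID", "Category_ID")])
itemInfo(trans)$level1 <- factor(
  lookup$Category_ID[match(itemLabels(trans), lookup$Item_ID)]
)

# Collapse items into their categories (one column per category)
trans_cat <- aggregate(trans, by = itemInfo(trans)[["level1"]])
itemLabels(trans_cat)

# Mine rules at the category level; thresholds are arbitrary
rules <- apriori(trans_cat,
                 parameter = list(support = 0.2, confidence = 0.5),
                 control = list(verbose = FALSE))
inspect(rules)
```

Running apriori on trans instead of trans_cat gives item-level rules; the level1 column stays attached either way.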