R: how to incorporate item categories into transactions data

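In R, I want to create transactions data from the following data frame so that I can run apriori from arules. It has a transaction ID, an item ID, and a category ID, where the category is the parent of the item.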

Transaction_ID  Item_ID Category_ID
T01 A001    A01
T01 A002    A01
T02 A001    A01
T02 A003    A02
T02 A002    A01
T03 A005    A03
T05 A004    A03
T05 A002    A01
T05 A005    A03
T04 A001    A01
T04 A003    A02

I want to incorporate the category ID into the transactions data as a level above the labels (items), the way the Groceries data does.

str(Groceries)
Formal class 'transactions' [package "arules"] with 3 slots
  ..@ data       :Formal class 'ngCMatrix' [package "Matrix"] with 5 slots
  .. .. ..@ i       : int [1:43367] 13 60 69 78 14 29 98 24 15 29 ...
  .. .. ..@ p       : int [1:9836] 0 4 7 8 12 16 21 22 27 28 ...
  .. .. ..@ Dim     : int [1:2] 169 9835
  .. .. ..@ Dimnames:List of 2
  .. .. .. ..$ : NULL
  .. .. .. ..$ : NULL
  .. .. ..@ factors : list()
  ..@ itemInfo   :'data.frame': 169 obs. of  3 variables:
  .. ..$ labels: chr [1:169] "frankfurter" "sausage" "liver loaf" "ham" ...
  .. ..$ level2: Factor w/ 55 levels "baby food","bags",..: 44 44 44 44 44 44 44 42 42 41 ...
  .. ..$ level1: Factor w/ 10 levels "canned food",..: 6 6 6 6 6 6 6 6 6 6 ...
  ..@ itemsetInfo:'data.frame': 0 obs. of  0 variables

However, read.transactions only lets you import the transaction ID and the item ID through the cols argument. I also tried this:

transaction_by_item<-split(df[,c("Item_ID","Category_ID")],df$Transaction_ID)
basket <- as(transaction_by_item, "transactions")

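It gives an error: Error in asMethod(object) : can coerce list with atomic components only

Splitting only the item IDs by transaction does work:

transaction_by_item <- split(df$Item_ID, df$Transaction_ID)

Does anyone know how to include both the item ID (labels) and the category ID (level) when creating transactions data? Thanks.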

Maybe this can help. First, let's look at the arules function itemInfo():

library(arules)
data("Groceries")
head(itemInfo(Groceries))
             labels  level2           level1
1       frankfurter sausage meat and sausage
2           sausage sausage meat and sausage
3        liver loaf sausage meat and sausage
4               ham sausage meat and sausage
5              meat sausage meat and sausage
6 finished products sausage meat and sausage

As you said, Groceries has several levels. Your data, on the other hand, only has labels after the basic conversion:

trans4 <- as(split(dats[,"Item_ID"], dats[,"Transaction_ID"]), "transactions")
str(trans4)
itemInfo(trans4)
  labels
1   A001
2   A002
3   A003
4   A004
5   A005
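
As a side note on the error in the question: split() applied to a data frame returns a list of data frames, while the list-to-transactions coercion expects a list of atomic vectors. A minimal base R sketch (the small data frame d is made up for illustration and only mirrors the column layout of the question) shows the difference:

```r
# Toy data (hypothetical), same column layout as in the question
d <- data.frame(
  Transaction_ID = c("T01", "T01", "T02"),
  Item_ID        = c("A001", "A002", "A001"),
  Category_ID    = c("A01", "A01", "A01")
)

# Splitting two columns yields a list of data frames, which are not atomic
by_df <- split(d[, c("Item_ID", "Category_ID")], d$Transaction_ID)
is_atomic_df <- vapply(by_df, is.atomic, logical(1))

# Splitting a single column yields a list of atomic vectors
by_vec <- split(d$Item_ID, d$Transaction_ID)
is_atomic_vec <- vapply(by_vec, is.atomic, logical(1))
```

This is why the category cannot travel through split(); it has to be attached afterwards as item information.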

Now you need to add the categories to the item information. The rows of itemInfo must be in the same order as the items in trans4, so match the categories to the existing item labels instead of relying on the row order returned by distinct():

library(dplyr)
# one row per item, with its category
labels_ <- dats %>% select(Item_ID, Category_ID) %>% distinct()
# reorder to the item order used inside trans4
idx <- match(itemLabels(trans4), labels_$Item_ID)
itemInfo(trans4) <- data.frame(labels = labels_$Item_ID[idx], level1 = labels_$Category_ID[idx])

Now:

itemInfo(trans4)
  labels level1
1   A001    A01
2   A002    A01
3   A003    A02
4   A004    A03
5   A005    A03

And:

str(trans4)
Formal class 'transactions' [package "arules"] with 3 slots
  ..@ data       :Formal class 'ngCMatrix' [package "Matrix"] with 5 slots
  .. .. ..@ i       : int [1:11] 0 1 0 1 2 4 0 2 1 3 ...
  .. .. ..@ p       : int [1:6] 0 2 5 6 8 11
  .. .. ..@ Dim     : int [1:2] 5 5
  .. .. ..@ Dimnames:List of 2
  .. .. .. ..$ : NULL
  .. .. .. ..$ : NULL
  .. .. ..@ factors : list()
  ..@ itemInfo   :'data.frame': 5 obs. of  2 variables:
  .. ..$ labels: Factor w/ 5 levels "A001","A002",..: 1 2 3 4 5
  .. ..$ level1: Factor w/ 3 levels "A01","A02","A03": 1 1 2 3 3    # here we go!!!
  ..@ itemsetInfo:'data.frame': 5 obs. of  1 variable:
  .. ..$ transactionID: chr [1:5] "T01" "T02" "T03" "T04" ...
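
Once the category is stored as a level in itemInfo, it can be used to roll the items up to their categories before mining. The following is only a sketch under some assumptions: it rebuilds the question's data with plain character columns, uses the made-up names trans, lookup, trans_cat and rules, and picks arbitrary support and confidence thresholds.

```r
library(arules)

# Same rows as in the question, rebuilt so the sketch is self-contained
dats <- data.frame(
  Transaction_ID = c("T01", "T01", "T02", "T02", "T02", "T03",
                     "T05", "T05", "T05", "T04", "T04"),
  Item_ID        = c("A001", "A002", "A001", "A003", "A002", "A005",
                     "A004", "A002", "A005", "A001", "A003"),
  Category_ID    = c("A01", "A01", "A01", "A02", "A01", "A03",
                     "A03", "A01", "A03", "A01", "A02")
)

trans <- as(split(dats$Item_ID, dats$Transaction_ID), "transactions")

# Attach the category as a hierarchy level, matched by item label
lookup <- unique(dats[, c("Item_ID", "Category_ID")])
idx <- match(itemLabels(trans), lookup$Item_ID)
itemInfo(trans)$level1 <- factor(lookup$Category_ID[idx])

# Roll the items up to their categories (one grouping value per item)
trans_cat <- aggregate(trans, by = itemInfo(trans)$level1)

# Mine rules at category level; the thresholds are arbitrary assumptions
rules <- apriori(trans_cat,
                 parameter = list(support = 0.2, confidence = 0.5),
                 control = list(verbose = FALSE))
```

inspect(rules) then lists rules between categories rather than between individual items.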

With the data:

dats <- structure(list(Transaction_ID = structure(c(1L, 1L, 2L, 2L, 2L, 
3L, 5L, 5L, 5L, 4L, 4L), .Label = c("T01", "T02", "T03", "T04", 
"T05"), class = "factor"), Item_ID = structure(c(1L, 2L, 1L, 
3L, 2L, 5L, 4L, 2L, 5L, 1L, 3L), .Label = c("A001", "A002", "A003", 
"A004", "A005"), class = "factor"), Category_ID = structure(c(1L, 
1L, 1L, 2L, 1L, 3L, 3L, 1L, 3L, 1L, 2L), .Label = c("A01", "A02", 
"A03"), class = "factor")), class = "data.frame", row.names = c(NA, 
-11L))