Association Rules using Arules and ArulesViz from Data

Question

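I have an R data frame with customer_id and product_name columns. A customer can have multiple products, so the same customer_id appears on several rows, one per product.

I am trying to run a basic apriori analysis and find association rules for products that are bought together. I want to use the arules and arulesViz packages in R to do this.

When I run it, I usually get either 0 rules or rules of the form lhs product => rhs customer_id. So I think I am not loading the data in a way that lets apriori see the multiple products per customer and derive associations from them.

Any help would be appreciated!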

Basic data frame example

library(arules)

df <- data.frame(
  cust_id = as.factor(c('1aa2j', '1aa2j', '2b345', '2b345', 'g78a8', 'y67r3')),
  product = as.factor(c("Bat", "Sock", "Hat", "Shirt", "Ball", "Shorts"))
)

rules <- apriori(df)
inspect(rules)

  lhs                  rhs               support confidence lift
1 {product=Bat}     => {cust_id=1aa2j}   0.167   1          3
2 {product=Sock}    => {cust_id=1aa2j}   0.167   1          3
3 {product=Hat}     => {cust_id=2b345}   0.167   1          3
4 {product=Shirt}   => {cust_id=2b345}   0.167   1          3
5 {cust_id=g78a8}   => {product=Ball}    0.167   1          6
6 {product=Ball}    => {cust_id=g78a8}   0.167   1          6
7 {cust_id=y67r3}   => {product=Shorts}  0.167   1          6
8 {product=Shorts}  => {cust_id=y67r3}   0.167   1          6

Answer 1

This is taken from the examples for transactions (slightly modified):

library(arules)
df <- data.frame(
  cust_id = as.factor(c('1aa2j', '1aa2j', '2b345', '2b345', 'g78a8', 'y67r3')),
  product = as.factor(c("Bat", "Sock", "Hat", "Shirt", "Ball", "Shorts"))
)

trans <- as(split(df[,"product"], df[,"cust_id"]), "transactions")
inspect(trans)

    items       transactionID
[1] {Bat,Sock}  1aa2j        
[2] {Hat,Shirt} 2b345        
[3] {Ball}      g78a8        
[4] {Shorts}    y67r3
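
To see what the conversion is built from, here is a small sketch in base R only (no arules needed) of the grouping step. The name `baskets` is just an illustrative variable, not part of any package. `split()` turns the long customer/product table into one vector of products per customer, which is the per-basket shape that `as(..., "transactions")` was given above.

```r
# Same example data as in the question
df <- data.frame(
  cust_id = as.factor(c("1aa2j", "1aa2j", "2b345", "2b345", "g78a8", "y67r3")),
  product = as.factor(c("Bat", "Sock", "Hat", "Shirt", "Ball", "Shorts"))
)

# Group the product column by customer: one list element per customer
baskets <- split(df[, "product"], df[, "cust_id"])

# Convert the factors to plain character vectors so the result is easier to read
baskets <- lapply(baskets, as.character)

# Show the structure: a named list with one basket per customer
str(baskets)
```

By contrast, passing the data frame straight to apriori treats every row as its own transaction with two items (a cust_id value and a product value), which is why the rules in the question link products to customer IDs.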

Now you can use apriori on trans.
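
As a rough sketch of that last step, assuming the arules package is installed: the threshold values below are illustrative choices for this tiny four-customer example, not recommendations for real data. `minlen = 2` is set so that rules with an empty left-hand side are skipped.

```r
library(arules)

# Same example data as above
df <- data.frame(
  cust_id = as.factor(c("1aa2j", "1aa2j", "2b345", "2b345", "g78a8", "y67r3")),
  product = as.factor(c("Bat", "Sock", "Hat", "Shirt", "Ball", "Shorts"))
)

# One transaction per customer, containing all of that customer's products
trans <- as(split(df[, "product"], df[, "cust_id"]), "transactions")

# Each product pair occurs in 1 of 4 transactions (support 0.25),
# so the support threshold is set just below that (illustrative value)
rules <- apriori(
  trans,
  parameter = list(support = 0.2, confidence = 0.5, minlen = 2)
)

# Rules now relate products to products, not products to customer IDs
inspect(sort(rules, by = "lift"))
```

On real data the thresholds usually need tuning: a support that is too high for the number of customers is a common reason for getting 0 rules. The resulting rules object can then be passed to arulesViz, for example with `plot(rules)`.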

