Extracting special rules in association analysis

How can I extract the rules whose lhs consists of only one specific item?

   lhs                              rhs      support    confidence lift
1  {231050}                      => {231051} 0.06063479 1.0000000  16.492183
2  {231050,231051}               => {275001} 0.05490568 0.9055145   6.576661
3  {231050,275001}               => {231051} 0.05490568 1.0000000  16.492183

I only want to extract the first row, where the lhs contains nothing but 231050.

Try this (assuming the rules were generated with apriori):

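# Convert the rules object to a data frame; the 'rules' column holds strings like "{a,b} => {c}"
df <- as(rules, 'data.frame')
df$rules <- as.character(df$rules)
# Take the part before '=>' and strip the surrounding braces and padding
lhs <- sapply(strsplit(df$rules, split = '=>', fixed = TRUE), `[`, 1)
lhs <- gsub('^\\s*\\{|\\}\\s*$', '', lhs)
lhs.items <- strsplit(lhs, split = ',', fixed = TRUE)
# Keep the rules whose lhs is exactly the one special item
# (an exact comparison, so a longer label that merely contains '231050' does not match)
special.item <- '231050'
special.indices <- which(sapply(lhs.items, function(x) length(x) == 1 && x == special.item))
selected.rules <- df[special.indices, ]
selected.rules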

               rules    support confidence     lift
1 {231050}=>{231051} 0.06063479          1 16.49218
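
The string-parsing steps above can be checked without arules on a hand-built data frame. The sketch below assumes rule strings in the "{a,b} => {c}" form shown in the question. The helper name single_item_lhs and the fourth row with item 1231050 are made up for illustration. Items are compared exactly rather than by substring, so an lhs of {1231050} is not mistaken for 231050:

```r
# Hypothetical helper: TRUE for each rule whose lhs is exactly {item}
single_item_lhs <- function(rule_strings, item) {
  # The left-hand side is everything before "=>"
  lhs <- sapply(strsplit(rule_strings, "=>", fixed = TRUE), `[`, 1)
  # Drop the surrounding braces and any padding
  lhs <- gsub("^\\s*\\{|\\}\\s*$", "", lhs)
  items <- strsplit(lhs, ",", fixed = TRUE)
  sapply(items, function(x) length(x) == 1 && x == item)
}

# The first three rules come from the question; the last one is invented to
# show that a longer label containing "231050" is not picked up
df <- data.frame(
  rules = c("{231050} => {231051}",
            "{231050,231051} => {275001}",
            "{231050,275001} => {231051}",
            "{1231050} => {231051}"),
  stringsAsFactors = FALSE
)
keep <- single_item_lhs(df$rules, "231050")
df[keep, , drop = FALSE]
```

This approach depends on item labels not containing commas themselves. If they might, filtering the rules object directly is likely the safer route.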

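arules has a subset function (see ?arules::subset) that you can use to select the subset of rules meeting your criteria, for example a specific item on the lhs, a minimum support, and so on: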

library(arules)
data("Adult")
rules <- apriori(Adult, parameter = list(supp = 0.5, conf = 0.9, minlen = 2))
item <- "race=White"
rules.sub <- subset(rules, lhs %in% item & size(lhs)==1)
inspect(rules.sub)
#   lhs             rhs                            support   confidence lift     
# 7 {race=White} => {native-country=United-States} 0.7881127 0.9217231  1.0270761
# 8 {race=White} => {capital-gain=None}            0.7817862 0.9143240  0.9966616
# 9 {race=White} => {capital-loss=None}            0.8136849 0.9516307  0.9982720
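
Applied to the question's case, the same subset call might look as follows. This is only a sketch on a made-up toy basket list: the transactions, the thresholds and the variable names are assumptions, and only the item labels come from the question. It also shows how a quality-measure condition such as confidence can be added to the filter:

```r
library(arules)

# Toy baskets, invented for illustration; item labels borrowed from the question
baskets <- list(
  c("231050", "231051", "275001"),
  c("231050", "231051", "275001"),
  c("231050", "231051"),
  c("275001", "300000"),
  c("300000")
)
trans <- as(baskets, "transactions")

# Low thresholds so that this tiny data set yields some rules
rules <- apriori(trans,
                 parameter = list(supp = 0.2, conf = 0.5, minlen = 2),
                 control = list(verbose = FALSE))

# Keep rules whose lhs is the single special item and whose confidence is high
special <- subset(rules, lhs %in% "231050" & size(lhs) == 1 & confidence > 0.9)
inspect(special)
```

Because the filter works on the rules object itself, the result stays a rules object and can be passed on to inspect, sort or as(..., 'data.frame') without any string parsing.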