Association rule mining for multiple cuts of the same data source

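Goal: for each report within each department, generate a list of the top 5 association rules (ranked by confidence).

My current code and test data: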

library(arules)

# Create fake data; 1 = used report, 0 = didn't use report
data <- data.frame(Dept=c('A','A','A','B','B','B'), 
              Rep1=c(1,1,1,1,1,1), 
              Rep2=c(0,0,0,1,1,1), 
              Rep3=c(1,1,1,0,0,0),
              Rep4=c(0,1,0,1,1,0),
              Rep5=c(0,0,0,0,0,0),
              Rep6=c(1,1,0,0,1,0),
              Rep7=c(1,1,1,1,1,0),
              Rep8=c(0,0,0,1,1,0),
              Rep9=c(1,0,0,1,1,0),
              Rep10=c(1,1,0,0,1,1)
              )

# Turn all variables to factors
data<-data.frame(lapply(data, factor))

# Changes 0s to NAs, only interested in rules where the report was used
data[data==0]<-NA

# Run apriori separately on each department's rows
rules <- lapply(split(data, data$Dept), function(x) {
  # Turn the split data (minus the Dept column) into transactions
  trans <- as(x[, -1], "transactions")
  # Create rules; artificially low parameters for testing
  r <- apriori(trans, parameter = list(support = 0.01, confidence = 0.1, minlen = 2, maxlen = 2))
  # Order rules by confidence (eventually I will select the top 5, which I'm able to do)
  # and convert to a data frame for later use
  as(sort(r, by = "confidence"), "data.frame")
})

# Breaks out the results into separate data.frames
list2env(rules,.GlobalEnv)

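This produces about 50 rules per department. However, the rules are for the department as a whole, not per report. For example, the Dept A data.frame has...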

rules                support   confidence  lift
{Rep9=1}=>{Rep6=1}  .3333333   1.00000000  1.5
{Rep4=1}=>{Rep6=1}  .3333333   1.00000000  1.5
...    
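For reference, the figures in the first row can be checked by hand. This is only a small base-R sketch using the Dept A rows of Rep9 and Rep6 from the test data above; the variable names are illustrative.

```r
# Dept A rows only, copied from the test data (1 = used report)
rep9 <- c(1, 0, 0)
rep6 <- c(1, 1, 0)

# support: share of transactions containing both LHS and RHS
support <- mean(rep9 == 1 & rep6 == 1)
# confidence: support divided by the share of transactions containing the LHS
confidence <- support / mean(rep9 == 1)
# lift: confidence divided by the share of transactions containing the RHS
lift <- confidence / mean(rep6 == 1)

print(c(support = support, confidence = confidence, lift = lift))
```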

Ideally, my data.frames would look like...

Dept A, Report 9 only data.frame

rules                support   confidence  lift
{Rep9=1}=>{Rep6=1}  .3333333   1.00000000  1.5
{Rep9=1}=>{Rep10=1}  .3333333   1.00000000  1.5    
...

Dept A, Report 4 only data.frame

rules                support   confidence  lift
{Rep4=1}=>{Rep6=1}  .3333333   1.00000000  1.5
{Rep4=1}=>{Rep10=1}  .3333333   1.00000000  1.5    
...

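You need to look at rule templates to restrict the left-hand side (LHS) of the rules to a specific report within each department. See the examples under ?APappearance in arules.

The code to restrict the LHS to one report would look a little like this: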

rules_Rep9 <- apriori(temp, parameter = list(support=0.01, confidence = 0.1), appearance = list(lhs = c("Rep9=1"), default="rhs"))
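Putting the two pieces together, a minimal sketch of the whole loop might look like the following. It assumes the test data from the question; `top_rules_for_report` and `rules_by_dept` are hypothetical names introduced here, and reports are recoded so that only "used" (`=1`) becomes an item. `minlen = 2` is kept from the question because, with the default `minlen = 1`, apriori can also return rules with an empty LHS.

```r
library(arules)

# Test data from the question (1 = used report, 0 = did not)
data <- data.frame(
  Dept  = c("A", "A", "A", "B", "B", "B"),
  Rep1  = c(1, 1, 1, 1, 1, 1), Rep2  = c(0, 0, 0, 1, 1, 1),
  Rep3  = c(1, 1, 1, 0, 0, 0), Rep4  = c(0, 1, 0, 1, 1, 0),
  Rep5  = c(0, 0, 0, 0, 0, 0), Rep6  = c(1, 1, 0, 0, 1, 0),
  Rep7  = c(1, 1, 1, 1, 1, 0), Rep8  = c(0, 0, 0, 1, 1, 0),
  Rep9  = c(1, 0, 0, 1, 1, 0), Rep10 = c(1, 1, 0, 0, 1, 1)
)

# Hypothetical helper: top-n rules by confidence whose LHS is the given item
top_rules_for_report <- function(trans, item, n = 5) {
  r <- apriori(
    trans,
    parameter  = list(support = 0.01, confidence = 0.1, minlen = 2, maxlen = 2),
    appearance = list(lhs = item, default = "rhs"),
    control    = list(verbose = FALSE)
  )
  as(head(sort(r, by = "confidence"), n), "data.frame")
}

rep_cols <- setdiff(names(data), "Dept")

# One list per department, each holding one data.frame per used report
rules_by_dept <- lapply(split(data[rep_cols], data$Dept), function(x) {
  # Recode so that 1 becomes the single factor level and 0 becomes NA
  x[] <- lapply(x, function(col) factor(ifelse(col == 1, 1, NA), levels = 1))
  trans <- as(x, "transactions")
  # Only mine reports that were actually used in this department
  items <- itemLabels(trans)[itemFrequency(trans) > 0]
  out <- lapply(items, function(item) top_rules_for_report(trans, item))
  names(out) <- items
  out
})
```

With this layout, `rules_by_dept$A[["Rep9=1"]]` would be the "Dept A, Report 9 only" data.frame. Sorting by confidence alone leaves ties in arbitrary order, so a secondary key such as lift may be worth adding for real data.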