Apriori Error in R

Question

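I'm looking into using the apriori algorithm to create groupings/market baskets of my items. Below is the summary of the dataset after converting it to the transactions class.

I believe my error is related to the parameters chosen in the apriori function. Any insight would be great.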

summary(groceries)

transactions as itemMatrix in sparse format with
 57 rows (elements/itemsets/transactions) and
 817 columns (items) and a density of 0.03135133 

most frequent items:
                  A                   B                   C                   D             (Other) 
                 13                  13                  13                  12                  12                1397 

element (itemset/transaction) length distribution:
sizes
  3   4   5   6   7   8   9  10  13  14  16  17  18  22  29  30  32  33  34  40  43  45  55  77  86 111 118 353 
  7   4   4   4   3   4   4   3   1   3   2   1   1   1   1   2   1   1   1   1   1   1   1   1   1   1   1   1 

   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
   3.00    5.00    9.00   25.61   29.00  353.00 

includes extended item information - examples:
  labels
1      E
2      F
3      G


groceryrules <- apriori(groceries,
                        parameter = list(support = 0.15,
                                         confidence = 0.05,
                                         minlen = 2))

When run like this it works fine, but it returns very few rules, and when I try to lower the support to get more, it does not work.

I tried:

groceryrules <- apriori(groceries,
                        parameter = list(support = 0.14,
                                         confidence = 0.05,
                                         minlen = 2))

Apriori

Parameter specification:
 confidence minval smax arem  aval originalSupport maxtime support minlen maxlen target   ext
       0.05    0.1    1 none FALSE            TRUE       5    0.14      2     10  rules FALSE

Algorithmic control:
 filter tree heap memopt load sort verbose
    0.1 TRUE TRUE  FALSE TRUE    2    TRUE

Absolute minimum support count: 7 

set item appearances ...[0 item(s)] done [0.00s].
set transactions ...[817 item(s), 57 transaction(s)] done [0.00s].
sorting and recoding items ... [14 item(s)] done [0.00s].
creating transaction tree ... done [0.00s].
checking subsets of size 1 2 3 4 5 done [0.00s].
writing ... [90 rule(s)] done [0.00s].
creating S4 object  ... done [0.00s].
Warning message:
In apriori(groceries, parameter = list(support = 0.14, confidence = 0.05,  :
  Mining stopped (maxlen reached). Only patterns up to a length of 5 returned!

Why would changing the support by such a small amount result in an error?

Answer 1

  Mining stopped (maxlen reached). Only patterns up to a length of 5 returned!

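minlen and maxlen are the culprits. You declared minlen = 2 in your parameter list. You did not specify maxlen, so the algorithm used the default value of 10 (see the "Parameter specification" block in the output). However, maxtime (which you also did not specify, so its default of 5 seconds was used) means that if checking rules of length n would take more than 5 seconds, the algorithm stops and issues a warning like the one you got. In effect it is saying: I only got as far as length 5 before the maxtime limit was violated.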

checking subsets of size 1 2 3 4 5 done [0.00s].

Checking subsets of size 6 would take too long, so it is skipped.
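To see the defaults mentioned above without running the miner, one option is to build the parameter object directly. This is a minimal sketch, assuming the installed arules version uses the same defaults as the one that produced the log in the question; `p` is just an illustrative variable name.

```r
library(arules)

# Parameter object with the same three values as the question's call;
# maxlen and maxtime are deliberately left unspecified.
p <- new("APparameter", support = 0.14, confidence = 0.05, minlen = 2)

p@maxlen   # largest itemset size that will be considered
p@maxtime  # time limit, in seconds, for checking subsets
p          # prints the full parameter table, as in the apriori() log
```

Any value that appears here but was not passed in the call is a default, which is where the maxlen and maxtime values in the "Parameter specification" block come from.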

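So either change maxtime (add it to the same parameter list as minlen: maxtime = 10, maxtime = 20, etc.) or, in most cases, ignore the warning. This is not an error. Is it really important for you to find rules with more than 5 items? I guess not. You did not specify that limit; it was just a default.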
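As a rough sketch of that suggestion, the call below sets both limits explicitly. It uses the `Groceries` dataset that ships with arules as a stand-in for the asker's `groceries` object, and the threshold values are illustrative assumptions rather than recommendations for the original data.

```r
library(arules)

# Built-in example data from arules, standing in for the asker's `groceries`.
data("Groceries")

rules <- apriori(
  Groceries,
  parameter = list(
    support    = 0.01,  # illustrative value for this stand-in dataset
    confidence = 0.05,
    minlen     = 2,
    maxlen     = 5,     # stated explicitly instead of relying on the default
    maxtime    = 20     # seconds allowed for checking subsets; 0 disables the limit
  ),
  control = list(verbose = FALSE)  # suppress the progress log
)

# How many rules were found at each length (LHS plus RHS items).
table(size(rules))
```

Stating maxlen and maxtime in the call makes it clear which limits are intentional, so a later "Mining stopped" warning can be read against values you chose rather than against defaults.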

r

apriori

arules