How can we find support and confidence in apriori for rules?

I am doing item association on transaction data, using the arules package in R to build the rules. My sample data is shared at this link: https://1drv.ms/u/s!Ak1rt2E1f2gFgV9t7hMVAn0P4gd0

library(arules)
library(arulesViz)
df = read.csv("trans.csv")
trans = as(split(df[,"Item"], df[,"Billno"]), "transactions")
inspect(trans[1:20])
summary(trans)
rules1 = apriori(trans,
                 parameter = list(support = 0.6, confidence = 0.6, target = "rules"))
summary(rules1)  ## Output is "set of 0 rules"

The output I get is:

summary(rules1)

set of 0 rules

Before posting I consulted this link: https://stats.stackexchange.com/questions/56034/association-analysis-returns-0-useful-rules. I also tried arbitrary values for support and confidence, but nothing worked.

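Struggling to find the right minimum support and minimum confidence values and ending up with 0 frequent itemsets or 0 association rules is a very common problem. If you need a refresher on what exactly support and confidence mean, please read this.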

Let's first look at your transaction data:

summary(trans)
transactions as itemMatrix in sparse format with
 2531 rows (elements/itemsets/transactions) and
 6632 columns (items) and a density of 0.0005951533 

most frequent items:
AR845311 AR800369 AR828249 AR839869 AR831167  (Other) 
      84       35       31       29       24     9787 

element (itemset/transaction) length distribution:
sizes
   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17  18  19  20  21 
 767 509 306 238 160 112 100  52  69  50  31  27  18  12  13  15   9  10   7   5   4 
 23  24  25  27  28  32  34  36  48 
  3   4   2   3   1   1   1   1   1 

   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
  1.000   1.000   2.000   3.947   5.000  48.000 

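The first issue to address is the minimum support. The summary shows that your most frequent item (AR845311) occurs only 84 times in the data set. Overall, your items have very low support: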

summary(itemFrequency(trans))

     Min.   1st Qu.    Median      Mean   3rd Qu.      Max. 
0.0003951 0.0003951 0.0003951 0.0005952 0.0003951 0.0331900 

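You used a minimum support of 0.6, but the most frequent single item has a support of only 0.033! You need to lower your minimum support. If you want to find itemsets/rules that occur at least 10 times in your data, you can set the minimum support to: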

 10/length(trans)

 [1] 0.003951008
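
The same conversion from an absolute count to a relative minimum support can be sketched in base R without loading the data. The helper name `min_support_for_count` is hypothetical (it is not part of arules), and the numbers 2531 and 84 are taken from the summary above.

```r
# Hypothetical helper (not an arules function): convert an absolute
# occurrence count into the relative minimum support that apriori() expects.
min_support_for_count <- function(min_count, n_transactions) {
  stopifnot(min_count >= 1, n_transactions >= min_count)
  min_count / n_transactions
}

n_trans <- 2531  # number of transactions reported by summary(trans)

# Relative support corresponding to "at least 10 occurrences"
supp <- min_support_for_count(10, n_trans)
print(supp)

# Support of the most frequent item (84 occurrences). No itemset can have
# a higher support than this, so a threshold of 0.6 can never be met.
max_item_support <- 84 / n_trans
print(max_item_support)
```

Note that rounding this value up to 0.004 raises the effective threshold slightly: 0.004 × 2531 ≈ 10.1, so an itemset then needs at least 11 occurrences to qualify.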

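The second issue is that your data is very sparse (the summary shows a density of about 0.0006). This means that your transactions are short, i.e., they contain only a few items each.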

table(size(trans))

  1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17  18  19  20  21 
767 509 306 238 160 112 100  52  69  50  31  27  18  12  13  15   9  10   7   5   4 
 23  24  25  27  28  32  34  36  48 
  3   4   2   3   1   1   1   1   1 
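
As a rough check, the density and the mean transaction length reported by `summary(trans)` can be reproduced from this size table alone. This is only a sketch in base R: the vectors `sizes` and `counts` are typed in by hand from the output above, and 6632 is the number of item columns from the summary.

```r
# Transaction sizes and how many transactions have each size,
# copied by hand from table(size(trans)) above.
sizes  <- c(1:21, 23, 24, 25, 27, 28, 32, 34, 36, 48)
counts <- c(767, 509, 306, 238, 160, 112, 100, 52, 69, 50, 31, 27, 18, 12,
            13, 15, 9, 10, 7, 5, 4, 3, 4, 2, 3, 1, 1, 1, 1, 1)

n_trans <- sum(counts)          # total number of transactions
n_items <- 6632                 # item columns reported by summary(trans)
n_cells <- sum(sizes * counts)  # total number of item occurrences

mean_size <- n_cells / n_trans              # average transaction length
density   <- n_cells / (n_trans * n_items)  # share of non-empty matrix cells

# Share of transactions that contain at most two items
share_short <- sum(counts[sizes <= 2]) / n_trans

print(c(mean_size = mean_size, density = density, share_short = share_short))
```

About half of the transactions contain only one or two items, which leaves very little room for rules with more than one item on the left-hand side.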

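Short transactions mean that the confidence of the rules will probably be low. For your data I expect it to be rather low, so I start with a minimum confidence of 0.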

rules <- apriori(trans, 
+   parameter = list(support = 0.004, confidence = 0, target = "rules"))
Apriori

Parameter specification:
 confidence minval smax arem  aval originalSupport maxtime support minlen maxlen
          0    0.1    1 none FALSE            TRUE       5   0.004      1     10
 target   ext
  rules FALSE

Algorithmic control:
 filter tree heap memopt load sort verbose
    0.1 TRUE TRUE  FALSE TRUE    2    TRUE

Absolute minimum support count: 10 

set item appearances ...[0 item(s)] done [0.00s].
set transactions ...[6632 item(s), 2531 transaction(s)] done [0.00s].
sorting and recoding items ... [40 item(s)] done [0.00s].
creating transaction tree ... done [0.00s].
checking subsets of size 1 2 done [0.00s].
writing ... [46 rule(s)] done [0.00s].
creating S4 object  ... done [0.00s].
> summary(rules)
set of 46 rules

rule length distribution (lhs + rhs):sizes
 1  2 
40  6 

   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
   1.00    1.00    1.00    1.13    1.00    2.00 

summary of quality measures:
    support           confidence            lift            count      
 Min.   :0.004346   Min.   :0.004346   Min.   : 1.000   Min.   :11.00  
 1st Qu.:0.004741   1st Qu.:0.004840   1st Qu.: 1.000   1st Qu.:12.00  
 Median :0.005531   Median :0.005729   Median : 1.000   Median :14.00  
 Mean   :0.006803   Mean   :0.057301   Mean   : 3.316   Mean   :17.22  
 3rd Qu.:0.007112   3rd Qu.:0.008890   3rd Qu.: 1.000   3rd Qu.:18.00  
 Max.   :0.033188   Max.   :0.705882   Max.   :21.269   Max.   :84.00  

mining info:
  data ntransactions support confidence
 trans          2531   0.004          0

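The result shows that there is at least one rule with a confidence of about 0.7, so you can run APRIORI again with a higher minimum confidence. Here are the rules with the highest confidence: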

inspect(head(rules, by = "confidence"))
    lhs           rhs        support     confidence lift     count
[1] {AR835501} => {AR845311} 0.004741209 0.7058824  21.26891 12   
[2] {AR743988} => {AR845311} 0.004346108 0.6470588  19.49650 11   
[3] {AR800369} => {AR845311} 0.007111814 0.5142857  15.49592 18   
[4] {AR845311} => {AR800369} 0.007111814 0.2142857  15.49592 18   
[5] {AR845311} => {AR835501} 0.004741209 0.1428571  21.26891 12   
[6] {AR845311} => {AR743988} 0.004346108 0.1309524  19.49650 11 
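
To see how these quality measures fit together, the table can be recomputed by hand in base R. This is only a sketch: the data frame `rules_tbl` is typed in from the output above (it is not an arules object), and the item counts for AR835501 and AR743988 are not printed anywhere, so they are inferred here as `count / confidence`, which is an assumption.

```r
n_trans <- 2531  # number of transactions from summary(trans)

# The six rules shown above, typed in by hand.
rules_tbl <- data.frame(
  lhs        = c("AR835501", "AR743988", "AR800369",
                 "AR845311", "AR845311", "AR845311"),
  rhs        = c("AR845311", "AR845311", "AR845311",
                 "AR800369", "AR835501", "AR743988"),
  count      = c(12, 11, 18, 18, 12, 11),
  confidence = c(0.7058824, 0.6470588, 0.5142857,
                 0.2142857, 0.1428571, 0.1309524),
  stringsAsFactors = FALSE
)

# support(rule) = count(lhs and rhs together) / number of transactions
rules_tbl$support <- rules_tbl$count / n_trans

# confidence = count(rule) / count(lhs), so count(lhs) can be inferred.
# Assumption: rounding recovers the true item counts (84 and 35 agree with
# the "most frequent items" shown in the summary).
item_count <- setNames(round(rules_tbl$count / rules_tbl$confidence),
                       rules_tbl$lhs)

# lift = confidence / support(rhs)
rhs_support <- unname(item_count[rules_tbl$rhs]) / n_trans
rules_tbl$lift <- rules_tbl$confidence / rhs_support

print(rules_tbl)

# Rules that would survive a minimum confidence of 0.5
strong <- rules_tbl[rules_tbl$confidence >= 0.5, ]
print(strong)
```

With a minimum confidence of 0.5, only the first three rules remain, all of them predicting AR845311.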

A complete example of how to use association rule mining can be found here.

Hope this helps!