Making sure two transaction matrices have the same structure

I want to compute the lift of (previously defined / old) itemsets against a new list of transactions. This can be done with the interestMeasure function.

quality(old_itemsets)$lift_ref <- interestMeasure(old_itemsets,"lift",transactions = TransMat_ref, reuse = FALSE)

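The problem: this does not work properly. I know this because some of my itemsets contain only a single item. When lift is computed on the new transactions, it should equal 1 for these single-item itemsets, but it does not!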
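To spell out why 1 is the expected value: as far as I understand it, arules defines the lift of an itemset as its support divided by the product of the supports of its individual items. Under that assumption, a single-item itemset with nonzero support always has lift 1:

```latex
\mathrm{lift}(X) = \frac{\mathrm{supp}(X)}{\prod_{x \in X} \mathrm{supp}(\{x\})},
\qquad
\mathrm{lift}(\{i\}) = \frac{\mathrm{supp}(\{i\})}{\mathrm{supp}(\{i\})} = 1
\quad \text{whenever } \mathrm{supp}(\{i\}) > 0 .
```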

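I think the problem may lie in my preprocessing. The transactions I used to generate the itemsets and the new transactions do not contain exactly the same items. So I added the items missing from one set to the other, and vice versa. Here is an example of how this is done in one direction.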

OldNames <- colnames(TransMat_old)
ReferenceNames <- colnames(TransMat_ref)

SetDiffNames <- setdiff(ReferenceNames, OldNames)

ItemsToAdd <- matrix(data = FALSE, nrow = length(TransMat_old), ncol = length(SetDiffNames))
colnames(ItemsToAdd) <- SetDiffNames

TransMat_old <- merge(TransMat_old, ItemsToAdd)

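As I wrote above, I did this twice, so that both transaction matrices contain all items. The catch: the missing items are simply appended as additional columns, which means the column order differs between the two matrices!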

Could this be the reason why my interestMeasure call at the top does not work?

Thanks in advance!

Major edit: here is my reproducible example

library(arules)

#create transactions
data <- paste(
"item1, item2, item3",
"item1, item3",
"item1, item2",
sep="\n")
cat(data)
write(data, file = "TransMat_Old")

data <- paste(
"item2, item3, item4",
"item3, item4",
"item2, item4",
"item2",
sep="\n")
cat(data)
write(data, file = "TransMat_New")

# load transactions
TransMat_Old <- read.transactions("TransMat_Old", format = "basket", sep=",") 
TransMat_New <- read.transactions("TransMat_New", format = "basket", sep=",") 

# Here's my function for adding
SameItems <- function(TransMat_Old, TransMat_New){

    OldNames <- colnames(TransMat_Old)
    NewNames <- colnames(TransMat_New)

    SetDiffNames <- setdiff(NewNames, OldNames)

    ItemsToAdd <- matrix(data = FALSE, nrow = length(TransMat_Old), ncol = length(SetDiffNames))
    colnames(ItemsToAdd) <- SetDiffNames

    TransMat_Data_allItems <- merge(TransMat_Old, ItemsToAdd)

    return(TransMat_Data_allItems)
}

# Add items from one matrix to the other and vice versa
Combined1 <- SameItems(TransMat_Old, TransMat_New)
Combined2 <- SameItems(TransMat_New, TransMat_Old)

# Find itemsets in the old matrix
itemsets <- apriori(data=Combined1, parameter=list(supp=0.1, maxlen=2, target="frequent itemsets"))
inspect(itemsets)

#Calculate Lift for the itemsets
quality(itemsets)$lift_oldSet <- interestMeasure(itemsets,"lift", transactions = Combined1, reuse = FALSE)

#Calculate lift for old itemsets based on the new transaction matrix
quality(itemsets)$lift_newSet <- interestMeasure(itemsets,"lift", transactions = Combined2, reuse = FALSE)

# Single-item itemsets should have a lift of 1, but they do not.
inspect(itemsets)

As stated above: single-item itemsets should have a lift of 1 in the new dataset, but they do not.

Just get all item labels and recode both transaction sets.

all_item_labels <- union(itemLabels(TransMat_New),itemLabels(TransMat_Old)) 

TransMat_Old <- recode(TransMat_Old, itemLabels = all_item_labels)
TransMat_New <- recode(TransMat_New, itemLabels = all_item_labels)

Now both transaction sets have the same items in the same order and are compatible with each other.
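A minimal sketch for checking the result, assuming the arules package is installed. To stay self-contained it rebuilds the question's data from in-memory lists instead of files (my shortcut, not part of the original example) and then compares the item labels of the two recoded sets before recomputing lift.

```r
library(arules)

# Same baskets as in the question, built from lists rather than files
old_list <- list(c("item1", "item2", "item3"),
                 c("item1", "item3"),
                 c("item1", "item2"))
new_list <- list(c("item2", "item3", "item4"),
                 c("item3", "item4"),
                 c("item2", "item4"),
                 "item2")

TransMat_Old <- as(old_list, "transactions")
TransMat_New <- as(new_list, "transactions")

# Recode both sets to the union of all item labels
all_item_labels <- union(itemLabels(TransMat_New), itemLabels(TransMat_Old))
TransMat_Old <- recode(TransMat_Old, itemLabels = all_item_labels)
TransMat_New <- recode(TransMat_New, itemLabels = all_item_labels)

# Both sets should now report the same labels in the same order
print(identical(itemLabels(TransMat_Old), itemLabels(TransMat_New)))

# Mine itemsets on the old data, then evaluate lift on the new data
itemsets <- apriori(TransMat_Old,
                    parameter = list(supp = 0.1, maxlen = 2,
                                     target = "frequent itemsets"),
                    control = list(verbose = FALSE))
quality(itemsets)$lift_newSet <- interestMeasure(itemsets, "lift",
                                                 transactions = TransMat_New,
                                                 reuse = FALSE)

# Look at the single-item itemsets only
inspect(itemsets[size(itemsets) == 1])
```

Note that item1 never occurs in the new transactions, so its support there is zero and its lift may come out undefined rather than 1.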