Error deploying azure ML experiment with R script for mining association rules
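I created a new experiment in Azure Machine Learning Studio that uses the Execute R Script module to mine association rules from a starting dataset. For this experiment I used the R version Microsoft R Open 3.2.2.
I first wrote and tested the functions used in the Azure ML experiment in RStudio, and they worked without any problems.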
This is the structure of my experiment:
This is part of the code inserted into the Azure ML module, which works correctly in RStudio:
# Map 1-based optional input ports to variables
dataset1 <- maml.mapInputPort(1) # class: data.frame
library("arules")
library("sqldf")
x <- sqldf('select ID_Ordine, AnnoOrdine, ZonaCommerciale, Modello, SUM(Qta) as Qta
from dataset1 group by ID_Ordine, Modello order by ID_Ordine')
a_list1 <- transform(x, Modello = as.factor(Modello),
ID_Ordine = as.factor(ID_Ordine))
transactions <- as(split(x[,"Modello"], x[,"ID_Ordine"]), "transactions")
rules <- sort(apriori(transactions,
parameter = list(supp = 0.1, conf = 0.1, target = "rules",
maxlen = 5)), by="lift")
gi <- generatingItemsets(rules)   # itemsets that generate each rule
# remove inverse duplicated rules (A => B and B => A share the same itemset);
# a logical index also works when there are no duplicates, unlike rules[-which(...)]
rules <- rules[!duplicated(gi)]
#create a dataframe to be used as output
result <- data.frame(label_lhs = labels(lhs(rules)),
label_rhs = labels(rhs(rules)),
count = quality(rules)["count"])
# Select data.frame to be sent to the output Dataset port
maml.mapOutputPort("result");
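One detail worth checking in the duplicate-removal step, independent of the `count` error: the form `d <- which(duplicated(gi)); rules <- rules[-d]` silently drops every rule when there are no duplicates, because `which()` then returns `integer(0)` and indexing with `-integer(0)` selects nothing. The base-R sketch below shows this on plain character vectors used as hypothetical stand-ins for the itemsets and the rules object; the assumption is that the arules `[` method follows the same indexing semantics, which is why a logical index such as `!duplicated(gi)` is the safer choice.

```r
# Hypothetical stand-ins: generating itemsets with no duplicates, and the rules
gi <- c("{A,B}", "{A,C}", "{B,C}")
rules <- c("A => B", "A => C", "B => C")

d <- which(duplicated(gi))   # integer(0): there is nothing to drop
dropped_all <- rules[-d]     # -integer(0) is integer(0), so this selects nothing
print(length(dropped_all))   # 0

kept <- rules[!duplicated(gi)]  # logical index keeps every first occurrence
print(length(kept))             # 3

# With a real duplicate (an inverse rule sharing the itemset {A,B})
gi2 <- c("{A,B}", "{A,B}", "{B,C}")
rules2 <- c("A => B", "B => A", "B => C")
kept2 <- rules2[!duplicated(gi2)]
print(kept2)                    # "A => B" "B => C"
```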
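If I remove the line count = quality(rules)["count"]
from the code (the statement that adds the count column to the output data frame), the experiment runs correctly, but when I also include the count column, running the experiment gives me the following error: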
Does anyone know how to fix this error, or an alternative way to select the count column of an arules object that Azure ML recognizes?
Thanks for any suggestions.
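The error message itself is not reproduced here, but one plausible cause, consistent with the answer that the installed arules version does not compute `count`, is that `quality(rules)` has no column with that name. Selecting a missing column from a data frame with `[` raises an error, as the sketch below shows on a hypothetical quality table. It also shows a defensive alternative that uses `count` when present and otherwise derives it from `support`; the table values and `n_transactions` are illustrative assumptions.

```r
# Hypothetical quality table without a "count" column, i.e. the shape
# quality(rules) would have if apriori() did not compute counts
q <- data.frame(support = c(0.2, 0.1),
                confidence = c(0.5, 0.4),
                lift = c(1.2, 1.1))

# Selecting a missing column with [ ] raises an error
msg <- tryCatch({ q["count"]; "no error" },
                error = function(e) conditionMessage(e))
print(msg)  # "undefined columns selected"

# Defensive alternative: use count if present, otherwise derive it from support
n_transactions <- 10  # hypothetical total number of transactions
count <- if ("count" %in% names(q)) q$count else round(q$support * n_transactions)
print(count)  # 2 1
```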
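The count column is not computed by the apriori() function in this version of the arules package, so I computed it as follows, by inverting the support formula: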
#create a dataframe to be used as output
result <- data.frame(label_lhs = labels(lhs(rules)),
label_rhs = labels(rhs(rules)),
count = round(quality(rules)$support * length(transactions)))
This works because support is computed with the following formula:
support = (number of transactions with A&B)/(number of total transactions)
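A small worked example of that inversion, written in base R so it runs without arules: the baskets below are hypothetical orders (one character vector of items per order), support of the itemset {A, B} is the fraction of baskets containing both items, and multiplying back by the number of transactions recovers the absolute count. The `round()` call is there because the product is floating point (for example, `0.1 * 3` is not exactly `0.3`), so the recovered count may otherwise be a near-integer.

```r
# Hypothetical baskets, one character vector of items per order
baskets <- list(
  o1 = c("A", "B"),
  o2 = c("A", "B", "C"),
  o3 = c("B", "C"),
  o4 = c("A"),
  o5 = c("A", "B")
)
n <- length(baskets)  # total number of transactions

# Support of {A, B}: fraction of baskets that contain both items
has_ab <- vapply(baskets, function(b) all(c("A", "B") %in% b), logical(1))
support_ab <- sum(has_ab) / n
print(support_ab)  # 0.6

# Inverting the formula: count = support * total transactions
count_ab <- round(support_ab * n)
print(count_ab)    # 3
```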