Convert R data.frame column to arules transactions
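Overview:
I need to convert the following data.frame column (t$Tags) to arules transactions:
- scala
- ios, button, swift3, compiler-errors, null
- c#, pass-by-reference, unsafe-pointers
- spring, maven, spring-mvc, spring-security, spring-java-config
- android, android-fragments, android-fragmentmanager
- scala, scala-collections
- python-2.7, python-3.x, matplotlib, plot
Since this data is already in basket format, and following example 3 in the arules documentation (https://cran.r-project.org/web/packages/arules/arules.pdf, page 90), I converted the column as follows: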
######################################################################################################
# Option 1 - converting data.frame as described in the documentation (page 90)
######################################################################################################
library(arules)
## example 3: creating transactions from data.frame
a_df <- data.frame(
  Tags = as.factor(c("scala",
                     "ios, button, swift3, compiler-errors, null",
                     "c#, pass-by-reference, unsafe-pointers",
                     "spring, maven, spring-mvc, spring-security, spring-java-config",
                     "android, android-fragments, android-fragmentmanager",
                     "scala, scala-collections",
                     "python-2.7, python-3.x, matplotlib, plot"))
)
## coerce
trans3 <- as(a_df, "transactions")
rules <- apriori(trans3, parameter = list(sup = 0.1, conf = 0.5, target = "rules", minlen = 1))
rules_output <- as(rules, "data.frame")
## Result: 0 rules
######################################################################################################
# Option 2 - reading from a CSV file, which contains exactly the same data
# above without the header and the quotes
######################################################################################################
file = "Test.csv"
trans3 = read.transactions(file = file, sep = ",", format = c("basket"))
rules <- apriori(trans3, parameter = list(sup = 0.1, conf = 0.5, target="rules",minlen=1))
rules_output <- as(rules,"data.frame")
## Result: 198 rules
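Option 1 - result = 0 rules
Option 2 - result = 198 rules
Question:
In my current task and environment, I cannot save the data.frame column to a formatted flat file (CSV or any other) and then read it back with read.transactions (that is, turn Option 1 into Option 2).
How can I convert the data.frame column into the correct format so that the arules apriori algorithm works properly?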
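Have a look at the examples in ?transactions. You need a list of item vectors (item labels), not a data.frame. Coercing the data.frame treats each whole Tags string as a single item, so no tag ever co-occurs with another; splitting each string into its tags first gives one item per tag.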
items <- strsplit(as.character(a_df$Tags), ", ")
trans3 <- as(items, "transactions")
rules <- apriori(trans3, parameter = list(sup = 0.1, conf = 0.5, target="rules",minlen=1))
Apriori
Parameter specification:
 confidence minval smax arem  aval originalSupport maxtime support minlen maxlen
        0.5    0.1    1 none FALSE            TRUE       5     0.1      1     10
 target   ext
  rules FALSE
Algorithmic control:
 filter tree heap memopt load sort verbose
    0.1 TRUE TRUE  FALSE TRUE    2    TRUE
Absolute minimum support count: 0
set item appearances ...[0 item(s)] done [0.00s].
set transactions ...[22 item(s), 7 transaction(s)] done [0.00s].
sorting and recoding items ... [22 item(s)] done [0.00s].
creating transaction tree ... done [0.00s].
checking subsets of size 1 2 3 4 5 done [0.00s].
writing ... [198 rule(s)] done [0.00s].
creating S4 object ... done [0.00s].
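If the real column is less tidy than the example (no space after some commas, stray whitespace, or a tag repeated within a row), the fixed ", " separator above may leave padded or duplicated items. The following is only a sketch under those assumptions, not the package's prescribed method: it splits on the comma alone, trims each tag, and removes blanks and within-row duplicates before coercion. The names n_levels, n_distinct_items and trans are illustrative, and the arules step is guarded so the base R part runs even without the package.

```r
# Same column as in the question: one comma-separated string per row.
a_df <- data.frame(
  Tags = as.factor(c("scala",
                     "ios, button, swift3, compiler-errors, null",
                     "c#, pass-by-reference, unsafe-pointers",
                     "spring, maven, spring-mvc, spring-security, spring-java-config",
                     "android, android-fragments, android-fragmentmanager",
                     "scala, scala-collections",
                     "python-2.7, python-3.x, matplotlib, plot"))
)

# Each whole string is one factor level, which is why coercing the
# data.frame directly gives one item per row rather than one per tag.
n_levels <- nlevels(a_df$Tags)

# Split on the comma only, then trim whitespace so "a,b" and "a, b"
# produce the same item labels.
items <- strsplit(as.character(a_df$Tags), ",", fixed = TRUE)
items <- lapply(items, trimws)

# Assumption: empty strings and repeated tags within a row are unwanted,
# so drop them before building transactions.
items <- lapply(items, function(x) unique(x[nzchar(x)]))

# Number of distinct tags across all rows.
n_distinct_items <- length(unique(unlist(items)))

# Coerce the list of item vectors only when arules is installed.
if (requireNamespace("arules", quietly = TRUE)) {
  suppressPackageStartupMessages(library(arules))
  trans <- as(items, "transactions")
  print(summary(trans))
}
```

For this data n_distinct_items comes out at 22, which agrees with the "22 item(s), 7 transaction(s)" line in the apriori log above, so the cleaned list should feed apriori in the same way as the original strsplit result.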