How to convert a data frame into a usable format for sequence mining in R?
I want to do sequence analysis in R, and I am trying to convert my data into a form that the arulesSequences package can use.
library(tidyverse)
library(arules)
library(arulesSequences)
df <- data_frame(personID = c(1, 1, 2, 2, 2),
                 eventID = c(100, 101, 102, 103, 104),
                 site = c("google", "facebook", "facebook", "askjeeves", "stackoverflow"),
                 sequence = c(1, 2, 1, 2, 3))
df.trans <- as(df, "transactions")
transactionInfo(df.trans)$sequenceID <- df$sequence
transactionInfo(df.trans)$eventID <- df$eventID
seq <- cspade(df.trans, parameter = list(support = 0.4), control = list(verbose = TRUE))
If I leave my columns in their original classes, as above, I get an error:
Error in asMethod(object) :
column(s) 1, 2, 3, 4 not logical or a factor. Discretize the columns first.
However, if I convert the columns to factors, I get a different error:
df <- data_frame(personID = c(1, 1, 2, 2, 2),
                 eventID = c(100, 101, 102, 103, 104),
                 site = c("google", "facebook", "facebook", "askjeeves", "stackoverflow"),
                 sequence = c(1, 2, 1, 2, 3))
df <- as.data.frame(lapply(df, as.factor))
df.trans <- as(df, "transactions")
transactionInfo(df.trans)$sequenceID <- df$sequence
transactionInfo(df.trans)$eventID <- df$eventID
seq <- cspade(df.trans, parameter = list(support = 0.4), control = list(verbose = TRUE))
Error in asMethod(object) :
In makebin(data, file) : 'eventID' is a factor
Any advice on solving this problem, or on sequence mining in R in general, would be much appreciated. Thanks!
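Only the actual items (in your case, "site") go into the transactions. Always inspect your intermediate results to make sure they look right. The type of transactions needed for sequence mining is described in ?cspade.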
library("arulesSequences")
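# site must be a factor for as(..., "transactions"); since R 4.0,
# data.frame() keeps strings as character unless stringsAsFactors = TRUE
df <- data.frame(personID = c(1, 1, 2, 2, 2),
                 eventID = c(100, 101, 102, 103, 104),
                 site = c("google", "facebook", "facebook", "askjeeves", "stackoverflow"),
                 sequence = c(1, 2, 1, 2, 3),
                 stringsAsFactors = TRUE)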
# convert site into itemsets and add sequence and event ids
df.trans <- as(df[,"site", drop = FALSE], "transactions")
transactionInfo(df.trans)$sequenceID <- df$sequence
transactionInfo(df.trans)$eventID <- df$eventID
inspect(df.trans)
# sort by sequenceID
df.trans <- df.trans[order(transactionInfo(df.trans)$sequenceID),]
inspect(df.trans)
# mine sequences
seq <- cspade(df.trans, parameter = list(support = 0.2),
              control = list(verbose = TRUE))
inspect(seq)
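As a quick sanity check before building the transactions, the following base R sketch reports each column's class, whether the ID columns are still numeric (the second error above complains that 'eventID' is a factor), and whether the rows are already ordered by sequence ID and then event ID. The helper name check_sequence_input and its arguments are hypothetical and not part of arules or arulesSequences; it assumes the same column names as the example data.

```r
# Hypothetical helper (not part of arules or arulesSequences).
# Reports column classes, whether the ID columns are numeric, and
# whether rows are ordered by sequence ID first and event ID second.
check_sequence_input <- function(df, seq_col = "sequence", event_col = "eventID") {
  classes <- vapply(df, function(col) class(col)[1], character(1))
  ord <- order(df[[seq_col]], df[[event_col]])
  list(
    classes = classes,
    ids_numeric = is.numeric(df[[seq_col]]) && is.numeric(df[[event_col]]),
    is_sorted = identical(ord, seq_len(nrow(df))),
    sorted = df[ord, , drop = FALSE]  # copy of df in the required order
  )
}

# Same example data as above, with site stored as a factor
df <- data.frame(personID = c(1, 1, 2, 2, 2),
                 eventID = c(100, 101, 102, 103, 104),
                 site = c("google", "facebook", "facebook", "askjeeves", "stackoverflow"),
                 sequence = c(1, 2, 1, 2, 3),
                 stringsAsFactors = TRUE)

res <- check_sequence_input(df)
res$classes      # class of each column
res$ids_numeric  # are both ID columns numeric?
res$is_sorted    # are the rows already in sequence/event order?
res$sorted       # the rows reordered by sequence, then eventID
```

If is_sorted comes back FALSE, the rows need reordering before mining, which is what the order() step in the code above does on the transactions object.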
Hope this helps!