rpart in R: variable in formula
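I am fitting a decision tree to text data. I stored the n most frequent terms in a variable and tried to use that variable in the formula passed to the rpart function. However, I get the following error: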
Error in model.frame.default(formula = class ~ x, data = dtm.df, na.action = function (x): variable lengths differ (found for 'x')
x = findFreqTerms(dtm, 0.5)
fit = rpart(class ~ x, data = dtm.train)
Is it possible to fill in the formula automatically, without typing every feature by hand?
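Technical basis of the solution
What you want to do is build the formula from the most frequent terms, which are character strings.
The general idea is to create the formula from a variable instead of typing it out as literal text, as in my comment (i.e. writing class ~ Variable1 + Variable2 + Variable3 directly).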
This link provides an example of how to use a string to create a formula. You will also need to look at the collapse argument of the paste() function in the R documentation.
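To make the role of collapse concrete, here is a minimal sketch in base R. The vector terms is hypothetical and only stands in for the output of findFreqTerms(); the term names are made up for illustration.

```r
# Hypothetical term vector standing in for the output of findFreqTerms()
terms <- c("data", "model", "tree")

# Without collapse, paste() is vectorised and returns one string per element
separate <- paste("class ~", terms)

# With collapse, the elements are joined into a single string
rhs <- paste(terms, collapse = " + ")

# as.formula() converts the assembled string into a formula object
f <- as.formula(paste("class", rhs, sep = " ~ "))
print(f)
```

In the original call, class ~ x treats x as a single predictor, so R compares the length of the character vector x with the number of rows in the data, which is likely where the "variable lengths differ" message comes from.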
Code
library(tm)     # provides findFreqTerms()
library(rpart)  # provides rpart()
# First find your most frequent terms
x <- findFreqTerms(dtm, 0.5)
# Then prepare the variables for the formula
sMeasureVar <- "class"
sGroupVars <- paste(x, collapse = " + ")
# Create the formula from the variables
fRpart <- as.formula(paste(sMeasureVar, sGroupVars, sep = " ~ "))
# Fit the corresponding tree
fit <- rpart(formula = fRpart, data = dtm.train)
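Terms extracted from text are not always valid R names (for example, they may contain a hyphen), in which case the pasted string may not parse as intended. One possible variant, sketched here under that assumption, wraps each term in backticks and uses reformulate() from the stats package. The terms vector and the toy data frame are hypothetical stand-ins for x and dtm.train.

```r
# Hypothetical term names; "e-mail" is not a syntactically valid R name
terms <- c("data", "e-mail", "tree")

# Backtick-quote each term so non-syntactic names parse as single variables
quoted <- sprintf("`%s`", terms)

# reformulate() assembles response ~ term1 + term2 + ...
f <- reformulate(quoted, response = "class")

# Toy data frame standing in for dtm.train;
# check.names = FALSE keeps the column name "e-mail" unchanged
toy <- data.frame(class = c("a", "b", "a"),
                  data = c(1, 0, 2),
                  `e-mail` = c(0, 1, 1),
                  tree = c(3, 0, 1),
                  check.names = FALSE)

# model.frame() resolves every variable in the formula against the data,
# which is the same step that failed in the original error message
mf <- model.frame(f, data = toy)
```

With the real data, the resulting formula would be passed on in the same way as before, e.g. rpart(f, data = dtm.train), provided the columns of dtm.train carry the same names as the terms.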