rpart in R: variable in formula

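I am using a decision tree on text data. I store the n most frequent terms in a variable and try to use that variable in the formula of the rpart function. However, I get the following error: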

Error in model.frame.default(formula = class ~ x, data = dtm.df, na.action = function (x): variable lengths differ (found for 'x')

x = findFreqTerms(dtm, 0.5)
fit = rpart(class ~ x, data = dtm.train)

Is it possible to fill in the formula automatically, without typing every feature by hand?

Technical basis of the solution

What you want to do is build the formula from the most frequent terms, which are character strings.

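The general idea is to create the formula from a variable instead of typing it out as literal text, as I did in my comment (writing class ~ Variable1 + Variable2 + Variable3 directly).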

This link provides an example of how to use a string to create a formula. You will also need to look at the collapse argument of the paste() function in the R documentation.
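To make the difference concrete, here is a minimal base-R sketch comparing the literal formula from the question with one built from strings. The term vector is made up for illustration and stands in for the output of findFreqTerms(). The likely cause of the error is that class ~ x contains a single predictor named x. Since the data has no column called x, R falls back to the character vector x in the calling environment, whose length does not match the number of rows.

```r
# Hypothetical term vector standing in for the output of findFreqTerms()
x <- c("data", "model", "tree")

# Written literally, the formula refers to one predictor called "x"
f_literal <- class ~ x
all.vars(f_literal)
# "class" "x"

# collapse = " + " joins the terms into a single right-hand-side string
rhs <- paste(x, collapse = " + ")
rhs
# "data + model + tree"

# Converting the full string yields one predictor per term
f_built <- as.formula(paste("class", rhs, sep = " ~ "))
all.vars(f_built)
# "class" "data" "model" "tree"
```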

Code

library(tm)     # provides findFreqTerms()
library(rpart)  # provides rpart()

# First find your most frequent terms
x <- findFreqTerms(dtm, 0.5)

# Then prepare the variables for the formula
sMeasureVar <- "class"
sGroupVars  <- paste(x, collapse = " + ")

# Create the formula from the variables
fRpart <- as.formula(paste(sMeasureVar, sGroupVars, sep = " ~ "))

# Fit the tree with the generated formula
fit <- rpart(formula = fRpart, data = dtm.train)
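One possible refinement, offered as a sketch rather than as part of the original answer: terms from a text corpus are not always valid R names (they may contain hyphens, for example), and some may be missing from the training data. The example below uses base R's reformulate() as an alternative to paste() plus as.formula(), backtick-quotes each term, and drops terms that are not columns. The toy data frame and term names are assumptions made up for illustration. They stand in for dtm.train and the findFreqTerms() output.

```r
# Toy data frame standing in for dtm.train (hypothetical columns)
dtm.train <- data.frame(
  class = factor(c("a", "b", "a", "b")),
  data = c(1, 0, 2, 0),
  `machine-learning` = c(0, 1, 0, 3),
  check.names = FALSE  # keep the hyphenated column name unchanged
)

# Hypothetical frequent terms; "absent" is not a column in the data
x <- c("data", "machine-learning", "absent")

# Keep only the terms that exist as columns in the training data
terms_ok <- intersect(x, names(dtm.train))

# Backtick-quote each term so non-syntactic names parse correctly
fRpart <- reformulate(paste0("`", terms_ok, "`"), response = "class")

all.vars(fRpart)
# "class" "data" "machine-learning"

# Base-R check that the formula resolves against the data
mf <- model.frame(fRpart, data = dtm.train)
dim(mf)
# 4 3
```

The resulting fRpart can be passed to rpart() in the same way as above. If every remaining column of the data frame should be a predictor, subsetting the data to the wanted columns and writing class ~ . is another option.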