试图访问参数的名称而不是它在 R 中的值

Trying to Access the Name of the Parameter & Not its Value in R

我正在尝试创建一个函数来创建一个 Wordcloud,其中包含参数、数据框及其列之一。但是,第一个语句中存在错误。我想让“DataFrame$Column”作为 VectorSource 的参数传递。我怎样才能最好地实现这一目标?

  createsWordcloud <- function(df, col) {
      # An Object of Class VectorSource which extends the Class Source representing a vector where each entry is interpreted as a document.
      # Every Element of the Corpus is stored as a Document...
                                   # The Bug is right here!..
      corpus <- Corpus(VectorSource(paste(df, "$", col, sep="")))

      # Convert the Corpus to Plain Text Document
      corpus  <- tm_map(corpus, PlainTextDocument)

      # Remove Punctuation & STOPWORDS...
      # STOPWORDs are commonly used words in the English Language... i.e. I, me, my
      # To view the full list of STOPWORDS, type stopwords('english') in the Console...
      corpus <- tm_map(corpus, removePunctuation)
      corpus <- tm_map(corpus, removeWords, stopwords('english'))

      # Next we perform STEMMING... All the words will be converted to their stem
      # i.e. learning -> learn, walked -> walk
      # These Words will be Plotted Only Once!
      corpus <- tm_map(corpus, stemDocument)

      wordcloud(corpus, max.words=100, random.order=FALSE)
      # These parameters are used to limit the number of words plotted. 
      # max.words will plot the specified number of words and discard least frequent terms, 
      # whereas, min.freq will discard all terms whose frequency is below the specified value.
    }

有两种方法可以做到这一点;一是采用非标准评价。此外,对于您的特定任务,简单地传入 df$col 并让函数采用向量而不是数据框可能是可行的,因为您只在给定的代码中使用该列。

如果确实需要传入列名,标准方式是将列名作为字符串传入,并用子集([.data.frame)操作符引用:

readcol <- function(df, col) {
  df[, col]
}

然后

> readcol(data.frame(x=1:10), "x")
 [1]  1  2  3  4  5  6  7  8  9 10

评价不规范

如果你真的不想引用列名,你需要对 col 参数进行惰性评估以从数据框中提取它:

readcol.nse <- function(df, col) {
  eval(substitute(col), df, parent.frame())
}

然后

> readcol.nse(data.frame(x=1:10), x)
 [1]  1  2  3  4  5  6  7  8  9 10

标准警告适用于此——要非常小心非标准评估。它很难以编程方式使用(因为将列名传递给另一个函数很棘手),并且可能会对更复杂的表达式产生不直观的副作用。字符串形式有点笨拙,但更易于组合。