R function doesn't loop through column but repeats first row result

Question

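I am trying to use the stemming function suggested in the corpus package's stemmer vignette: https://cran.r-project.org/web/packages/corpus/vignettes/stemmer.html

But when I run the function on an entire column, it seems to just repeat the first row's result in all the remaining rows. I'm guessing this has to do with the [[1]] in the function below. I suspect the solution is something along the lines of "for i in x", but I'm not familiar enough with writing functions to know how to fix it.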

df <- data.frame(x = 1:7, y= c("love", "lover", "lovely", "base", "snoop", "dawg", "pound"), stringsAsFactors=FALSE)

stem_hunspell <- function(term) {
    # look up the term in the dictionary
    stems <- hunspell::hunspell_stem(term)[[1]]

    if (length(stems) == 0) { # if there are no stems, use the original term
        stem <- term
    } else { # if there are multiple stems, use the last one
        stem <- stems[[length(stems)]]
    }

    stem
}

df[3] <- stem_hunspell(df$y)

Answer 1

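Your intuition is right.

hunspell_stem(term) returns a list of length length(term), and each element of that list is a character vector.

Each vector appears to hold the word itself as its first element, but only if the word is found in the dictionary, followed by the stem as a second element if the word is not already a stem.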

> hunspell::hunspell_stem(df$y)
[[1]]
[1] "love"

[[2]]
[1] "lover" "love" 

[[3]]
[1] "lovely" "love"  

[[4]]
[1] "base"

[[5]]
[1] "snoop"

[[6]]
character(0)

[[7]]
[1] "pound"
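
To see why the original function repeats the first row, here is a minimal sketch that needs no hunspell call. The list `stems` is copied by hand from the output above, and the names `first_only` and `demo` are only illustrative. Taking `[[1]]` keeps just the stems of the first word, so the function ends up with a single string, and assigning a single string to a data frame column recycles it into every row.

```r
# Stems copied by hand from the hunspell_stem() output above (not computed here).
stems <- list("love", c("lover", "love"), c("lovely", "love"),
              "base", "snoop", character(0), "pound")

# What the original function does: keep only the first list element.
first_only <- stems[[1]]

# The "last stem" of that element is a single string ...
stem <- first_only[[length(first_only)]]

# ... and a length-1 value is recycled across all rows on assignment.
demo <- data.frame(x = 1:7)
demo$stem <- stem
print(demo)
```

Every row of `demo$stem` holds the same value, which matches the behaviour described in the question.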

The function below returns the stem for each term, or the original term if no stem is found:

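stem_hunspell <- function(term) {
  # one list element (a character vector of stems) per input term
  stems <- hunspell::hunspell_stem(term)
  output <- character(length(term))

  for (i in seq_along(term)) {
    stem <- stems[[i]]
    if (length(stem) == 0) {
      # no stems found: keep the original term
      output[i] <- term[i]
    } else {
      # one or more stems: use the last one
      output[i] <- stem[length(stem)]
    }
  }
  return(output)
}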
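
As an alternative to the explicit loop, the same per-element logic could be written with `vapply`. This is only a sketch: `pick_last_stem` and `stem_hunspell_v` are hypothetical names, not part of hunspell or corpus. The selection step is split out so it can be tried on a hand-written list, and the hunspell call is guarded in case the package is not installed.

```r
# Hypothetical helper: for each term, return its last stem,
# or the term itself when the list of stems is empty.
pick_last_stem <- function(stems, term) {
  vapply(seq_along(term), function(i) {
    s <- stems[[i]]
    if (length(s) == 0) term[i] else s[length(s)]
  }, character(1))
}

# Hypothetical wrapper around hunspell::hunspell_stem().
stem_hunspell_v <- function(term) {
  pick_last_stem(hunspell::hunspell_stem(term), term)
}

# Try the helper on a small hand-written list (no dictionary lookup).
toy <- pick_last_stem(list("love", c("lover", "love"), character(0)),
                      c("love", "lover", "dawg"))
print(toy)

# Usage on the data frame from the question, only if hunspell is available.
if (requireNamespace("hunspell", quietly = TRUE)) {
  df <- data.frame(x = 1:7,
                   y = c("love", "lover", "lovely", "base", "snoop", "dawg", "pound"),
                   stringsAsFactors = FALSE)
  df$stem <- stem_hunspell_v(df$y)
}
```

Because the function now returns one value per input term, assigning its result to a column no longer recycles a single string.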

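If you don't want dawg returned (it has no stems in the dictionary), the function is even simpler, because output already starts out as empty strings: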

stem_hunspell <- function(term) {
  stems <- hunspell::hunspell_stem(term)
  output <- character(length(term))

  for (i in seq_along(term)) {
    stem <- stems[[i]]
    if (length(stem) > 0) {
      output[i] <- stem[length(stem)]
    }
  }
  return(output)
}
