Use apply family between each element in a list and another set in R

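I am splitting a text document into n chunks and storing each chunk in a list. Each chunk is converted into a set of words, and a cosine similarity function is then applied between one of the chunks and another, shorter text, which is also converted into a set before being sent to the function. I need to pass each chunk to the function so it can be compared against the second set, and I was wondering whether one of the apply family of functions could do the job instead of a loop. Storing each result in a vector would also save some time.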

This is what I am using (part of the code comes from this):

# library() attaches one package per call, so each package is loaded separately
library(data.table)
library(qdap)
library(sets)
library(lsa)

s <- c("employees businesses san gwann admitted sales taken hit after traffic diversions implemented without notice vjal ir - rihan over weekend.", 
"also complained werent consulted diversion blocked vehicles driving centre      san gwann via roundabout forks san gwann industrial estate, church forced   motorists take detour around block instead.", 
"barriers erected roundabout exit, after youtube video cars disregarding signage passing through roundabout regardless went viral.", 
"planned temporary diversion, brace san gwann influx cars set pass through during works kappara junction project.", 
"usually really busy weekend, our sales lower round, corner store worker maria abela admitted maltatoday.")

c <- "tm dont break whats broken. only queues developing, pass here every morning never experienced such mess notwithstanding tm officials directing traffic. hope report congestion happening area. lc tm tried pro - active hope admit recent traffic changes working."


calculateCosine <- function(setX, setY){
  require(qdap)
  # td is not defined in the original snippet; assumed here to be a fresh temporary directory
  td <- tempfile()
  dir.create(td)
  y <- c(unlist(as.character(tolower(setY))))
  x <- c(unlist(strsplit(as.character(tolower(setX)), split = ", ")))
  diffLength <- length(y) - length(x)
  x <- bag_o_words(x)
  # pad x with empty strings so both sets have the same length
  for(pad in 1 : diffLength){
    x <- c(x, "")
  }
  # write both sets to temp files and calculate cosine similarity
  write(y, file=paste(td, "Dy", sep="/"))
  write(x, file=paste(td, "Dx", sep="/"))
  myMatrix = textmatrix(td, stopwords=stopwords_en, minWordLength = 3)
  similCosine <- as.numeric(round(cosine(myMatrix[,1], myMatrix[,2]), 3))
  return(similCosine)
}

n <- 3
max <- length(s)%/%n
x <- seq_along(s)
d1 <- split(s, ceiling(x/max))
res <- c()
for(i in 1 : length(d1)){
  val <- calculateCosine(as.set(paste(d1[i], sep = " ", collapse = " ")), as.set(c))
  res <- c(res, val)
}

For the sake of brevity, can the loop be changed to one of the apply functions? Any ideas or comments would be much appreciated. Thanks.

Consider adjusting the two for loops with rep and sapply:

Inside calculateCosine

# ORIGINAL CODE
x <- bag_o_words(x)
for(pad in 1 : diffLength){
  x <- c(x, "")
}

# ADJUSTED CODE
x <- bag_o_words(x)
x <- c(x, rep("", diffLength))     

# OR ONE LINE
x <- c(bag_o_words(x), rep("", diffLength))
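
One caveat applies to both versions: if the second set is not longer than the first, diffLength is zero or negative. In that case 1 : diffLength counts downward (1, 0, ...) so the loop still pads, while rep() rejects a negative count. Below is a minimal sketch of a guarded padding step using base R only. pad_words is a hypothetical helper name, not part of the original code, and the inputs are made-up word vectors.

```r
# pad_words is a hypothetical helper, not part of the original code.
# It pads x with empty strings until it is as long as y, and leaves x
# unchanged when x is already at least as long as y.
pad_words <- function(x, y) {
  diffLength <- length(y) - length(x)
  # max(..., 0) keeps the count non-negative so rep() never errors
  c(x, rep("", max(diffLength, 0)))
}

padded <- pad_words(c("a", "b"), c("u", "v", "w", "x"))  # x shorter than y
same   <- pad_words(c("a", "b", "c"), c("u"))            # x longer than y
```

Inside calculateCosine the same guard would amount to replacing diffLength with max(diffLength, 0) in the rep() call.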

Outside calculateCosine (change to lapply if you need a list returned instead of a vector/matrix)

# ORIGINAL CODE
res <- c()
for(i in 1 : length(d1)){
  val <- calculateCosine(as.set(paste(d1[i], sep = " ", collapse = " ")), as.set(c))
  res <- c(res, val)
}

# ADJUSTED CODE
res <- sapply(d1, function(i) {
  calculateCosine(as.set(paste(i, sep = " ", collapse = " ")), as.set(c))
})
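
To illustrate how the return shapes differ, here is a small sketch that runs with base R only. toy_similarity is a hypothetical stand-in for calculateCosine (it is a simple word-overlap share, not a cosine similarity), and chunks and other are made-up inputs shaped like d1 and c above.

```r
# toy_similarity is a hypothetical stand-in for calculateCosine so the
# example needs no extra packages: it returns the share of distinct words
# in `chunk` that also occur in `other`.
toy_similarity <- function(chunk, other) {
  words_chunk <- unique(strsplit(tolower(chunk), " +")[[1]])
  words_other <- unique(strsplit(tolower(other), " +")[[1]])
  round(mean(words_chunk %in% words_other), 3)
}

# split() names its chunks "1", "2", ...; the same names are mimicked here
chunks <- list(`1` = "traffic sales hit",
               `2` = "cars pass through",
               `3` = "traffic changes")
other  <- "recent traffic changes working"

# sapply simplifies to a named numeric vector
res_vec  <- sapply(chunks, toy_similarity, other = other)
# lapply always returns a list, one element per chunk
res_list <- lapply(chunks, toy_similarity, other = other)
# vapply is like sapply but errors unless every result is a single number
res_safe <- vapply(chunks, toy_similarity, numeric(1), other = other)
```

Because the list carries names, the sapply result is a named vector; unname(res_vec) drops the names if a plain vector like the one built by the original loop is wanted.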