Use apply instead of nested for loop with if statement

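I want to compare the text in one row with the text in all of the following rows to find entries that deviate only slightly. How can the code below be rewritten without for loops?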

library(stringdist)

dat <- data.frame(n = seq(1, 19, by = 1),
des = c("Some very long text", "Some very lang test", "Some vary long text", "Some veri long text", "Another very long text", "Anather very long text", "Another very long text", "Different text", "Diferent text", "More text", "More test", "Much more text", "Muh more text", "Some other long text", "Some otoher long text", "Some more text", "Same more text", "New text", "New texd"))

# drop exact duplicates before looking for near-duplicates
dat <- dat[!duplicated(dat[,c('des')]),]

column <- which(names(dat) == "des")
dupli <- rep(FALSE, nrow(dat))
# compare each row with every row below it
for (lin in 1:(nrow(dat)-1)){
  for (other in (lin+1):nrow(dat))
  {
    # flag the row when a later row is within string distance < 2
    if (stringdist(dat[lin, column], dat[other, column]) < 2)
       dupli[lin] <- TRUE
  }
}

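I want to speed this up because I have a lot of text in about 5000 rows. Row 1 should be compared with rows 2 to 19, row 2 with rows 3 to 19, and so on, so the for loop is very slow for 5000 rows. Is it possible to use one of the apply functions instead?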

Maybe something like this:

library(stringdist)
dat <- data.frame(n = 1:19, des = c("Some very long text", "Some very lang test", "Some vary long text", "Some veri long text", "Another very long text", "Anather very long text", "Another very long text", "Different text", "Diferent text", "More text", "More test", "Much more text", "Muh more text", "Some other long text", "Some otoher long text", "Some more text", "Same more text", "New text", "New texd"))

column <- which(names(dat) == "des")
N <- nrow(dat)

#change outer loop to sapply
dupli <- c(sapply(1:(N-1), function(row){
    #change inner loop to arraywise processing and aggregate with any
    any(stringdist(dat[row, column], dat[(row+1):N, column]) < 2)
}), FALSE)

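It is not dramatically fast, but it is faster than the plain for loop. Note that this version skips the duplicated() step from the question, so all 19 rows are kept. cbind(dat, dupli) then gives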

    n                    des dupli
1   1    Some very long text  TRUE
2   2    Some very lang test FALSE
3   3    Some vary long text FALSE
4   4    Some veri long text FALSE
5   5 Another very long text  TRUE
6   6 Anather very long text  TRUE
7   7 Another very long text FALSE
8   8         Different text  TRUE
9   9          Diferent text FALSE
10 10              More text  TRUE
11 11              More test FALSE
12 12         Much more text  TRUE
13 13          Muh more text FALSE
14 14   Some other long text  TRUE
15 15  Some otoher long text FALSE
16 16         Some more text  TRUE
17 17         Same more text FALSE
18 18               New text  TRUE
19 19               New texd FALSE
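
If the per-row sapply call is still too slow, one further option (not part of the answer above) is to compute all pairwise distances in one call and then look only at the upper triangle. This is a minimal sketch, assuming the stringdist package is installed and that a full N x N distance matrix fits in memory; for 5000 rows that is 25 million entries. The names d and later are just illustrative.

```r
library(stringdist)

dat <- data.frame(n = 1:19, des = c("Some very long text", "Some very lang test", "Some vary long text", "Some veri long text", "Another very long text", "Anather very long text", "Another very long text", "Different text", "Diferent text", "More text", "More test", "Much more text", "Muh more text", "Some other long text", "Some otoher long text", "Some more text", "Same more text", "New text", "New texd"))

# With a single argument, stringdistmatrix() returns a 'dist' object holding
# all pairwise distances; as.matrix() expands it to a full N x N matrix.
d <- as.matrix(stringdistmatrix(as.character(dat$des)))

# TRUE where the column index is greater than the row index,
# i.e. only comparisons of a row with the rows that follow it.
later <- upper.tri(d)

# Flag a row when at least one later row is within distance < 2.
dupli <- unname(rowSums(d < 2 & later) > 0)

cbind(dat, dupli)
```

On the sample data this should flag the same rows as the sapply version. Whether it is actually faster on the real 5000 rows is worth checking with system.time(), since the gain comes at the cost of holding the whole matrix in memory.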