Use apply instead of a nested for loop with an if statement
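I want to compare the text in each row with the text in all following rows to find small deviations (near-duplicates).
How can the code below be rewritten without for loops?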
library(stringdist)

dat <- data.frame(n = seq(1, 19, by = 1),
                  des = c("Some very long text", "Some very lang test", "Some vary long text", "Some veri long text", "Another very long text", "Anather very long text", "Another very long text", "Different text", "Diferent text", "More text", "More test", "Much more text", "Muh more text", "Some other long text", "Some otoher long text", "Some more text", "Same more text", "New text", "New texd"))
# drop exact duplicate texts before comparing
dat <- dat[!duplicated(dat[, c('des')]), ]
column <- which(names(dat) == "des")
dupli <- rep(FALSE, nrow(dat))
# compare each row with every row after it
for (lin in 1:(nrow(dat) - 1)) {
  for (other in (lin + 1):nrow(dat)) {
    if (stringdist(dat[lin, column], dat[other, column]) < 2) {
      dupli[lin] <- TRUE
      break  # one close match is enough; skip the remaining rows
    }
  }
}
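I want to speed this up because I have a lot of text in about 5000 rows. I want to compare row 1 with rows 2 to 19, row 2 with rows 3 to 19, and so on, but with 5000 rows the nested for loop is very slow. Is it possible to use one of the apply functions instead?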
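To see why the loop gets slow at this size: each row is compared with every row after it, so the number of stringdist() calls grows quadratically with the number of rows. A quick check of that count in R:

```r
# The nested loop compares row i with rows i+1..N, so the number of
# stringdist() calls is (N-1) + (N-2) + ... + 1 = N(N-1)/2
N <- 5000
n_pairs <- choose(N, 2)
stopifnot(n_pairs == sum(seq_len(N - 1)))  # same count, summed explicitly
print(n_pairs)  # [1] 12497500
```

That is roughly 12.5 million separate calls for 5000 rows, so the per-call overhead of the loop is likely to dominate; approaches that pass whole vectors to a single call reduce that overhead.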
Maybe something like this:
library(stringdist)
dat <- data.frame(n = 1:19, des = c("Some very long text", "Some very lang test", "Some vary long text", "Some veri long text", "Another very long text", "Anather very long text", "Another very long text", "Different text", "Diferent text", "More text", "More test", "Much more text", "Muh more text", "Some other long text", "Some otoher long text", "Some more text", "Same more text", "New text", "New texd"))
column <- which(names(dat) == "des")
N <- nrow(dat)
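# replace the outer loop with sapply;
# the last row has no rows after it, so FALSE is appended for it
dupli <- c(sapply(1:(N-1), function(row){
  # replace the inner loop with one vectorised stringdist() call
  # against all later rows, and collapse the result with any()
  any(stringdist(dat[row, column], dat[(row+1):N, column]) < 2)
}), FALSE)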
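This is not dramatically fast, but it is faster than the plain nested for loop, mainly because the inner loop becomes a single vectorised stringdist() call per row. cbind(dat, dupli)
then gives: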
n des dupli
1 1 Some very long text TRUE
2 2 Some very lang test FALSE
3 3 Some vary long text FALSE
4 4 Some veri long text FALSE
5 5 Another very long text TRUE
6 6 Anather very long text TRUE
7 7 Another very long text FALSE
8 8 Different text TRUE
9 9 Diferent text FALSE
10 10 More text TRUE
11 11 More test FALSE
12 12 Much more text TRUE
13 13 Muh more text FALSE
14 14 Some other long text TRUE
15 15 Some otoher long text FALSE
16 16 Some more text TRUE
17 17 Same more text FALSE
18 18 New text TRUE
19 19 New texd FALSE
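A further step, sketched here under some assumptions, is to drop the R-level loop over rows and let one call compute a whole block of distances at once. The sketch uses base R's utils::adist() instead of stringdist() so it needs no extra package; note that adist() computes the plain Levenshtein distance, whereas stringdist()'s default method ("osa") also counts a swap of two adjacent characters as a single edit, so the two can disagree on such pairs. The helper name flag_near_dupes and its threshold and block_size arguments are hypothetical. Rows are processed in blocks so that only a block_size x N slice of the distance matrix is held in memory; a full 5000 x 5000 matrix of doubles would take about 200 MB.

```r
# Hypothetical helper (not from the original answer): flag each element of x
# that has at least one *later* element with edit distance < threshold.
# Uses base R's utils::adist() (Levenshtein distance), processed in row blocks.
flag_near_dupes <- function(x, threshold = 2, block_size = 500) {
  x <- as.character(x)
  N <- length(x)
  if (N < 2) return(logical(N))
  dupli <- logical(N)
  for (start in seq(1, N, by = block_size)) {
    rows <- start:min(start + block_size - 1, N)
    # distances between the rows in this block and every element of x
    d <- utils::adist(x[rows], x)
    # TRUE only where the column index is after the row index,
    # so each row is compared with the rows that follow it
    later <- outer(rows, seq_len(N), `<`)
    dupli[rows] <- rowSums(d < threshold & later) > 0
  }
  dupli
}

des <- c("Some very long text", "Some very lang test", "Some vary long text",
         "Some veri long text", "Another very long text", "Anather very long text",
         "Another very long text", "Different text", "Diferent text", "More text",
         "More test", "Much more text", "Muh more text", "Some other long text",
         "Some otoher long text", "Some more text", "Same more text", "New text",
         "New texd")
dat <- data.frame(n = seq_along(des), des = des, stringsAsFactors = FALSE)
dupli <- flag_near_dupes(dat$des)
cbind(dat, dupli)
```

For this data there are no adjacent-character swaps, so the result should match the dupli column printed above. If you prefer to keep stringdist's metric, stringdist::stringdistmatrix(a, b) returns the corresponding length(a) x length(b) matrix and could be substituted for utils::adist() inside the loop.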