Avoiding duplicates when using a for loop in R

Question

I have a basic data frame with a single column containing 4 letters:

a
b
c
d

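I would like to use a nested for loop to bind each letter to every other letter in a new data frame, while avoiding binding a letter to itself and avoiding duplicates. So far I can avoid the former, but I am having trouble with the latter. My code looks like this: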

d <- data.frame(c("a", "b", "c", "d"))
e <- data.frame()

for (j in d[,1]) {
  
  for (i in d[,1]) {
    
    if (j != i) {
      e <- rbind(e, c(j, i))
    }
  }
}

This produces the following:

a b #this row
a c
a d
b a #and this row are duplicates
b c
b d
c a
c b
c d
d a
d b
d c

I would like to use a nested for loop to produce:

a b
a c
a d
b c
b d
c d

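I know that running the for loop while moving down one row each time (in data frame d) might work, but I am not sure how to code that. I would appreciate any suggestions!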

Answer 1

This is a case for combn, which does it easily without a loop

t(combn(d[[1]], 2))

-output

#     [,1] [,2]
#[1,] "a"  "b" 
#[2,] "a"  "c" 
#[3,] "a"  "d" 
#[4,] "b"  "c" 
#[5,] "b"  "d" 
#[6,] "c"  "d"
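Since the question asks for a data frame rather than a matrix, the combn result can be converted in one more step. This is a minimal sketch; the column names col1 and col2 are assumptions chosen for illustration.

```r
d <- data.frame(c("a", "b", "c", "d"))

# combn() returns one pair per column, so transpose to get one pair per row
m <- t(combn(d[[1]], 2))

# Convert the matrix to a data frame; col1/col2 are illustrative names
e <- setNames(as.data.frame(m), c("col1", "col2"))
print(e)
```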

If the OP wants to use a loop, add a condition that skips a pair whose reverse is already stored

e <- data.frame(col1 = "", col2 = "")  # placeholder first row keeps the column names

for (j in d[,1]) {
  for (i in d[,1]) {
    if (j != i) {
      # TRUE when the reversed pair (i, j) is not yet a row of e
      i1 <- !any(i == e[[1]] & j == e[[2]])
      if (i1) {
        e <- rbind(e, c(j, i))
      }
    }
  }
}

-output

e[-1,]
  col1 col2
2    a    b
3    a    c
4    a    d
5    b    c
6    b    d
7    c    d
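The idea mentioned in the question (moving down one row each time) could be written with index-based loops, where the inner loop starts one position past the outer index. This is only a sketch under that assumption; the names x, n, k, and pairs are introduced here for illustration.

```r
d <- data.frame(c("a", "b", "c", "d"))
x <- d[[1]]
n <- length(x)

# Preallocate a list with one slot per unordered pair
pairs <- vector("list", n * (n - 1) / 2)
k <- 0

# seq_len(max(n - 1, 0)) is empty for n < 2, so the loop body is skipped
for (j in seq_len(max(n - 1, 0))) {
  # The inner index starts below j, so no self-pairs and no reversed pairs
  for (i in (j + 1):n) {
    k <- k + 1
    pairs[[k]] <- data.frame(col1 = x[j], col2 = x[i],
                             stringsAsFactors = FALSE)
  }
}

# Bind all one-row data frames in a single call
e <- do.call(rbind, pairs)
print(e)
```

Because the inner index is always greater than the outer one, no membership check against e is needed, and the rows are bound once at the end instead of on every iteration.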

Answer 2

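I agree with @akrun's suggestion. As a rule of thumb, you almost never need a loop for string manipulation (or data manipulation in general) in R.

Take a look at this speed comparison, which times the question's original loop against combn: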

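d <- data.frame(letters)

# Nested loop from the question: grows e with rbind on every iteration
solutionCustom <- function() {
  e <- data.frame()
  for (j in d[,1]) {
    for (i in d[,1]) {
      if (j != i) {
        e <- rbind(e, c(j, i))
      }
    }
  }
  e
}

# Vectorized alternative: all 2-element combinations, one pair per row
solutionCombn <- function() t(combn(d[,1], 2))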

library(microbenchmark)

microbenchmark(solutionCustom=solutionCustom(),
               solutionCombn=solutionCombn())

Unit: microseconds
           expr       min        lq       mean     median         uq       max neval
 solutionCustom 44769.620 48898.410 54423.5789 54018.3875 57949.8755 76922.178   100
  solutionCombn   238.311   267.486   294.4763   286.2005   305.8805   605.728   100

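The combn solution is roughly 188 times faster (comparing medians) and takes far less code. Whenever you find yourself having to use a loop in R, there is a good chance you are missing a more efficient solution.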
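Note that the timed loop keeps both orderings of every pair, so for 26 letters it returns 650 rows while combn returns 325. A like-for-like check could look like the sketch below; pairs_loop is a hypothetical helper, not part of the code above, that starts the inner loop one element past the outer index and fills a preallocated matrix.

```r
# Hypothetical helper: collect unordered pairs with an index-based nested loop
pairs_loop <- function(x) {
  n <- length(x)
  # One row per unordered pair, allocated up front instead of grown by rbind
  out <- matrix(NA_character_, nrow = n * (n - 1) / 2, ncol = 2)
  k <- 0
  for (j in seq_len(max(n - 1, 0))) {
    for (i in (j + 1):n) {
      k <- k + 1
      out[k, ] <- c(x[j], x[i])
    }
  }
  out
}

x <- letters

# Compare the loop result with combn's result, one pair per row
same <- identical(pairs_loop(x), t(combn(x, 2)))
print(same)
```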
