How can I use vectorisation in R to change a DF value based on a condition?

Question

Suppose I have the following DF:

C1	C2
0	0
1	1
1	1
0	0
.	.
.	.

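I now want to apply the following conditions to the data frame:

The value of C1 should be 1
A random integer between 0 and 5 should be less than 2

If both conditions hold, I change the C1 and C2 values of that row to 2.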

I know this can be done with the apply function, and I used the following approach:

C1 <- c(0, 1,1,0,1,0,1,0,1,0,1)
C2 <- c(0, 1,1,0,1,0,1,0,1,0,1)

df <- data.frame(C1, C2)

fun <- function(x){
  if (sample(0:5, 1) < 2){
    x[1:2] <- 2
  }
  return (x)
}

index <- df$C1 == 1  # First condition
processed_Df <- t(apply(df[index, ], 1, fun))  # Applies second condition
df[index, ] <- processed_Df

Output:

C1	C2
0	0
2	2
1	1
0	0
.	.
.	.

Some rows meet both conditions and some don't (this is the main behaviour I would like to achieve).

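Now I want to achieve the same thing using vectorisation, without loops or the apply function. My only confusion is this: if I don't use apply, won't every row get the same result, depending on the outcome of the condition? For example, with the following: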

df$C1 <- ifelse(df$C1==1 & sample(0:5, 1) < 5, 2, df$C1)

This changes all the rows in my DF with C1 == 1 to 2, when there should possibly still be many 1's.

Is there a way to get a different result for the second condition on each row without using the apply function? I hope my question makes sense.

Thanks

Answer 1

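Here is a fully vectorised way. Create the logical index index as in the question. Then sample all the random integers r in a single call to sample. Finally, replace in place based on the conjunction of the index and the condition r < 2.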

x <- 'C1    C2
0   0
1   1
1   1
0   0'
df1 <- read.table(textConnection(x), header = TRUE)

set.seed(1)
index <- df1$C1 == 1
r <- sample(0:5, length(index), TRUE)
df1[index & r < 2, c("C1", "C2")] <- 2
df1
#>   C1 C2
#> 1  0  0
#> 2  1  1
#> 3  2  2
#> 4  0  0

Created on 2022-05-11 by the reprex package (v2.0.1)
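
The same steps can be wrapped in a small function for reuse. This is only a sketch: the name flip_rows and its value argument are hypothetical and not part of the answer, and it assumes the data frame has columns named C1 and C2.

```r
# Hypothetical helper (not from the answer above): set C1 and C2 to `value`
# on rows where C1 == 1 and that row's own random draw from 0:5 is below 2.
flip_rows <- function(df, value = 2) {
  index <- df$C1 == 1                          # first condition, one logical per row
  r <- sample(0:5, nrow(df), replace = TRUE)   # one random integer per row
  # `<` binds tighter than `&`, so this reads as index & (r < 2)
  df[index & r < 2, c("C1", "C2")] <- value
  df
}

set.seed(1)
df <- data.frame(C1 = c(0, 1, 1, 0, 1, 0, 1),
                 C2 = c(0, 1, 1, 0, 1, 0, 1))
out <- flip_rows(df)
out
```

Rows where C1 is 0 are never modified, and if no row satisfies both conditions the assignment is simply a no-op.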

Answer 2

You need to sample the values nrow times. Try this approach:

set.seed(167814)
df[df$C1 == 1 & sample(0:5, nrow(df), replace = TRUE) < 2, ] <- 2
df

#   C1 C2
#1   0  0
#2   2  2
#3   2  2
#4   0  0
#5   1  1
#6   0  0
#7   2  2
#8   0  0
#9   1  1
#10  0  0
#11  1  1
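
To see why the number of draws matters, the following sketch compares a single draw with one draw per row. The variable names are illustrative only; the C1 vector is the one from the question.

```r
set.seed(42)
C1 <- c(0, 1, 1, 0, 1, 0, 1, 0, 1, 0, 1)

# A single draw gives one TRUE/FALSE, which R recycles across every row,
# so all rows with C1 == 1 share the same outcome.
one_draw <- sample(0:5, 1) < 2
changed_once <- C1 == 1 & one_draw

# One draw per row gives each row its own TRUE/FALSE.
per_row <- sample(0:5, length(C1), replace = TRUE) < 2
changed_per_row <- C1 == 1 & per_row

length(one_draw)  # 1
length(per_row)   # 11
```

With the single draw, changed_once is either TRUE for every C1 == 1 row or FALSE for all of them, which is the behaviour described in the question.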
