R: randomly assigning "1" or "2" in a vector based on double occurrences in another vector

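I put together the code below. If an element occurs twice in vector v1, it should assign the values "1" and "2" to vector v2. For example, "A" occurs twice in v1, so in the corresponding rows v2 should read "1" in one case and "2" in the other.

The code works fine, except that in some cases the same number is assigned to v2 for both occurrences of an element in v1, which is obviously not what I want.

Can someone help me fix this? Thanks!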

v1 <- c(rep(c("A","B","C","D","E","F","G"),rep(2,7)),c("H","I","J","K"))
v2 <- rep(3,length(v1))
df1 <- data.frame(v1,v2)

for (i in 1:length(df1$v1)) {

  if (sum(df1$v1[i]==df1$v1)==2 & df1$v2[i]==3) {

    df1$v2[i] <- sample(c(1,2),1,replace=TRUE)

  } else if (sum(df1$v1[i]==df1$v1)==2 & df1$v2[i]==1) {

    df1$v2[i] <- 2

  } else if (sum(df1$v1[i]==df1$v1)==2 & df1$v2[i]==2) {

    df1$v2[i] <- 1 

  } else { 

    df1$v2[i] <- 2
  }
}

Using base R, I think you can easily get the result you want by combining table and sequence and manipulating the output.
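To make the table/sequence idea concrete, here is a minimal sketch on a toy vector (x is a made-up input, not the asker's data). It assumes the input is already grouped in the same order that table() reports its counts, which is also true of v1 in the question.

```r
x <- c("A", "A", "B", "C", "C")   # hypothetical toy input, already grouped

counts <- table(x)                # occurrences per value: A = 2, B = 1, C = 2

# sequence() concatenates 1:n for every count n
v2 <- unname(sequence(counts))
v2
# [1] 1 2 1 1 2
```

On its own this numbers the occurrences in order of appearance; the randomness has to come from shuffling the rows, which is what the code further down does.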

Edit: after reading your comment, I think I now understand what you are after.

res <- data.frame(v1, v2 = sequence(table(v1)), row.names = NULL)
res <- res[sample(1:nrow(res)), ] # Scramble data order
res <- res[order(res$v1), ] # Reorder by v1 column 
#     v1 v2
#1    A  1
#2    A  2
#3    B  1
#4    B  2
#5    C  1
#6    C  2
#7    D  2  # note 2 comes first here
#8    D  1
#9    E  1
#10   E  2
#11   F  1
#12   F  2
#13   G  1
#14   G  2
#15   H  1
#16   I  1
#17   J  1
#18   K  1

Edit 2: "randomly" order the rows before assigning:

df1 <- data.frame(v1)
df1[order(rank(v1, ties.method = "random")), "v2"] <- sequence(table(v1))
df1

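For anyone puzzled by the one-liner in Edit 2, this is a rough step-by-step sketch of what I understand it to do, on a shortened version of v1. The seed, the intermediate names r, idx and ok, and the final check are my additions for illustration.

```r
set.seed(1)  # arbitrary seed, only so the sketch is reproducible
v1 <- c(rep(c("A", "B", "C"), each = 2), "H", "I")

# Step 1: ranks 1..n; equal values get adjacent ranks in random order
r <- rank(v1, ties.method = "random")

# Step 2: row positions sorted by v1, ties within a pair in random order
idx <- order(r)

# Step 3: write 1, 2 (or just 1 for singletons) into those positions
v2 <- integer(length(v1))
v2[idx] <- sequence(table(v1))

# Check: every group of v1 received exactly the values 1..(group size)
ok <- tapply(v2, v1, function(g) identical(sort(g), seq_along(g)))
all(ok)
```

The check at the end is the property the question asks for: no value of v1 ever gets the same number twice.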
v1 <- c(rep(c("A","B","C","D","E","F","G"),rep(2,7)),c("H","I","J","K"))
value <- seq_along(v1)
v2 <- rep(3,length(v1))
df1 <- data.frame(v1,value,v2)

library(dplyr)

set.seed(9)

df1 %>%
  sample_frac(1) %>%             # shuffle rows
  group_by(v1) %>%               # for each v1 value
  mutate(v2 = row_number()) %>%  # number the occurrences within each group
  ungroup() %>%                  # forget the grouping
  arrange(v1)                    # order by v1 (only for visualisation purposes)

# # A tibble: 18 x 3
#   v1    value    v2
#   <fct> <int> <int>
# 1 A         1     1
# 2 A         2     2
# 3 B         4     1
# 4 B         3     2
# 5 C         5     1
# 6 C         6     2
# 7 D         7     1
# 8 D         8     2
# 9 E         9     1
#10 E        10     2
#11 F        12     1
#12 F        11     2
#13 G        14     1
#14 G        13     2
#15 H        15     1
#16 I        16     1
#17 J        17     1
#18 K        18     1

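I think I understand what you need, and I hope the following does what you want, using dplyr. It randomly assigns integer values from 1 to n, where n is the number of occurrences of a given letter (note that this generalizes beyond your requirement of 2 occurrences).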

library(dplyr)
df1 <- data.frame(v1 = c(rep(c("A","B","C","D","E","F","G"),rep(2,7)),c("H","I","J","K")))

df1 <- df1 %>%
  group_by(v1) %>%
  mutate(v2 = case_when(n() > 1 ~ sample(1:n(), n(), replace = FALSE),
                        TRUE ~ 1L))
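If you would rather avoid the dependency, the same per-group shuffle can be sketched in base R with ave(). This is a swapped-in alternative, not the dplyr code above, and the seed is arbitrary.

```r
set.seed(42)  # arbitrary seed, only for reproducibility of the sketch
v1 <- c(rep(c("A", "B", "C", "D", "E", "F", "G"), each = 2),
        "H", "I", "J", "K")

# For each group of equal v1 values, draw a random permutation of
# 1..(group size). sample(k) with a single positive integer k permutes
# 1:k, so values that occur only once simply get 1.
v2 <- ave(seq_along(v1), v1, FUN = function(i) sample(length(i)))

df1 <- data.frame(v1, v2)
```

Since sample(1) returns 1, no special case is needed for letters that occur once. As far as I can tell, the same reasoning would let the dplyr version be shortened to mutate(v2 = sample(n())).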