r: randomly assigning "1" or "2" in a vector based on double-occurrences in another vector
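I built the code below. If an element occurs twice in vector v1, the code should assign the values "1" and "2" to vector v2. For example, "A" occurs twice in v1, so in the corresponding rows v2 should read "1" once and "2" the other time.
The code works, except that in some cases the same number is assigned to v2 for both occurrences of an element in v1, which is obviously not what I want.
Can someone help me fix this? Thanks!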
v1 <- c(rep(c("A","B","C","D","E","F","G"),rep(2,7)),c("H","I","J","K"))
v2 <- rep(3,length(v1))
df1 <- data.frame(v1,v2)
for (i in 1:length(df1$v1)) {
  if (sum(df1$v1[i]==df1$v1)==2 & df1$v2[i]==3) {
    df1$v2[i] <- sample(c(1,2),1,replace=TRUE)
  } else if (sum(df1$v1[i]==df1$v1)==2 & df1$v2[i]==1) {
    df1$v2[i] <- 2
  } else if (sum(df1$v1[i]==df1$v1)==2 & df1$v2[i]==2) {
    df1$v2[i] <- 1
  } else {
    df1$v2[i] <- 2
  }
}
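A note on why the loop above can give both rows of a pair the same value: every row starts with v2 == 3, so each row of a pair reaches the first branch and draws its own independent sample(); the branches for v2 == 1 and v2 == 2 never fire. The sketch below reproduces that behaviour on plain vectors and then shows one possible repair that looks at the partner row before drawing. The names n_distinct, fixed and partner are illustrative and not part of the original post.

```r
v1 <- c(rep(c("A","B","C","D","E","F","G"), rep(2, 7)), c("H","I","J","K"))

# Behaviour of the original loop: each row is handled in isolation
v2 <- rep(3, length(v1))
for (i in seq_along(v1)) {
  if (sum(v1[i] == v1) == 2) {
    # v2[i] is still 3 here, so this draw ignores the partner row's value
    v2[i] <- sample(c(1, 2), 1)
  } else {
    v2[i] <- 2
  }
}
# Distinct v2 values per letter; a pair showing 1 here received the same value twice
n_distinct <- tapply(v2, v1, function(x) length(unique(x)))

# One possible repair: give the second row of a pair the value its partner did not get
fixed <- rep(NA_real_, length(v1))
for (i in seq_along(v1)) {
  partner <- setdiff(which(v1 == v1[i]), i)
  if (length(partner) == 1 && !is.na(fixed[partner])) {
    fixed[i] <- 3 - fixed[partner]    # partner has 1 -> 2, partner has 2 -> 1
  } else if (length(partner) == 1) {
    fixed[i] <- sample(c(1, 2), 1)    # first row of the pair: random draw
  } else {
    fixed[i] <- 2                     # single occurrence: same as the original else branch
  }
}
```

This keeps the loop structure of the question; the vectorised answers that follow are shorter and avoid the row-by-row bookkeeping.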
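Using base R, I think you can easily get the result you want by combining table() and sequence() and manipulating the output.
Edit: after your comment, I think I now understand what you are after.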
res <- data.frame(v1, v2 = sequence(table(v1)), row.names = NULL)
res <- res[sample(1:nrow(res)), ] # Scramble data order
res <- res[order(res$v1), ] # Reorder by v1 column
# v1 v2
#1 A 1
#2 A 2
#3 B 1
#4 B 2
#5 C 1
#6 C 2
#7 D 2 # note 2 comes first here
#8 D 1
#9 E 1
#10 E 2
#11 F 1
#12 F 2
#13 G 1
#14 G 2
#15 H 1
#16 I 1
#17 J 1
#18 K 1
Edit 2: ordering "randomly" before assigning:
df1 <- data.frame(v1)
df1[order(rank(v1, ties.method = "random")), "v2"] <- sequence(table(v1))
df1
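A related base R sketch, not taken from the answers here: ave() applies a function within each group of v1 and returns the results in the original row order, so drawing sample(length(i)) per group should give each letter a random permutation of 1..n without reordering the data. The name v2_random is only illustrative.

```r
v1 <- c(rep(c("A","B","C","D","E","F","G"), rep(2, 7)), c("H","I","J","K"))

# For each group of v1, draw a random permutation of 1..n (n = group size).
# sample(1) returns 1, so letters that occur once always get 1.
v2_random <- ave(seq_along(v1), v1, FUN = function(i) sample(length(i)))

df1 <- data.frame(v1, v2 = v2_random)
df1
```

Because the permutation is drawn per group, this also covers letters that occur three or more times.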
v1 <- c(rep(c("A","B","C","D","E","F","G"),rep(2,7)),c("H","I","J","K"))
value <- seq_along(v1)
v2 <- rep(3,length(v1))
df1 <- data.frame(v1,value,v2)
library(dplyr)
set.seed(9)
df1 %>%
  sample_frac(1) %>%              # shuffle rows
  group_by(v1) %>%                # for each v1 value
  mutate(v2 = row_number()) %>%   # count and flag occurrences
  ungroup() %>%                   # forget the grouping
  arrange(v1)                     # order by v1 (only for visualisation purposes)
# # A tibble: 18 x 3
# v1 value v2
# <fct> <int> <int>
# 1 A 1 1
# 2 A 2 2
# 3 B 4 1
# 4 B 3 2
# 5 C 5 1
# 6 C 6 2
# 7 D 7 1
# 8 D 8 2
# 9 E 9 1
#10 E 10 2
#11 F 12 1
#12 F 11 2
#13 G 14 1
#14 G 13 2
#15 H 15 1
#16 I 16 1
#17 J 17 1
#18 K 18 1
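I think I have understood what you need, and I hope the following does it, using dplyr. It randomly assigns integer values from 1 to n, where n is the number of occurrences of the given letter (note that this generalizes beyond your requirement of 2 occurrences).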
library(dplyr)
df1 <- data.frame(v1 = c(rep(c("A","B","C","D","E","F","G"),rep(2,7)),c("H","I","J","K")))
df1 <- df1 %>%
  group_by(v1) %>%
  mutate(v2 = case_when(n() > 1 ~ sample(c(1:n()), n(), replace = FALSE),
                        TRUE ~ 1L))
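To confirm that a result satisfies the requirement (each letter's rows carry 1..n exactly once), a small base R check along these lines may help. The helper name is_valid_assignment is hypothetical and does not come from any of the answers.

```r
# Hypothetical helper: TRUE if, within every key, the values are exactly 1..n in some order
is_valid_assignment <- function(keys, values) {
  ok <- tapply(values, keys, function(x) {
    setequal(x, seq_along(x)) && !anyDuplicated(x)
  })
  all(ok)
}

good <- is_valid_assignment(c("A", "A", "H"), c(2, 1, 1))  # pair gets 1 and 2
bad  <- is_valid_assignment(c("A", "A", "H"), c(1, 1, 1))  # pair gets 1 twice
```

It can be called as is_valid_assignment(df1$v1, df1$v2) on any of the data frames built above.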