Loop over left joins
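I have been trying to loop over left joins (using R). I need to create a table whose columns represent samples drawn from a larger table. Each column of the new table should correspond to one of these samples.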
library(dplyr)  # %>%, group_by(), sample_n(), count(), left_join()
library(tidyr)
largetable <- data.frame(PlotCode = c(rep("Plot1", 20), rep("Plot2", 20)),
                         Category = c(rep("A", 8), rep("B", 8), rep("C", 4),
                                      rep("A", 12), rep("B", 4), rep("C", 4)))
a <- data.frame(PlotCode = c("Plot1", "Plot1", "Plot2", "Plot2"),
                Category = c("A", "B", "A", "B"))
## Example of code to loop over 100 left joins derived from samples of two elements from a large table. It fails to create the columns.
for (i in 1:100) {
  count <- largetable %>%
    group_by(PlotCode) %>%
    sample_n(2, replace = TRUE) %>%
    count(PlotCode, Category)
  colnames(count)[3] <- paste0("n", i)
  b <- left_join(a, count, by = c("PlotCode", "Category"))
}
## Example of the desired output table. Columns n1 to n100 should change depending on the samples.
b <- data.frame(PlotCode = c("Plot1", "Plot1", "Plot2", "Plot2"),
                Category = c("A", "B", "A", "B"),
                n1 = c(2, 1, 0, 1),
                n2 = c(1, 1, 1, 1),
                n3 = c(2, 0, 1, 2))
How can I loop over the left joins so that each column corresponds to a different sample?
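We can use rerun/replicate instead of a for loop to repeat a process n times.
In each iteration we randomly select 2 rows from each PlotCode and count their Category, so you end up with a list of n tables. These can be joined together using reduce; then rename the columns as you like and replace NA with 0.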
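library(dplyr)
library(purrr)

n <- 10

# Repeat the sampling n times, giving a list of n count tables.
# Note: rerun() is deprecated as of purrr 1.0.0; map(seq_len(n), ~ ...) does the same job.
rerun(n, largetable %>%
        group_by(PlotCode) %>%
        slice_sample(n = 2, replace = TRUE) %>%
        count(PlotCode, Category)) %>%
  # Join all count tables on the key columns
  reduce(full_join, by = c('PlotCode', 'Category')) %>%
  # Rename the count columns to n1, n2, ..., n10
  rename_with(~paste0('n', seq_along(.)), starts_with('n')) %>%
  # A sample that missed a PlotCode/Category pair yields NA; treat it as 0
  mutate(across(starts_with('n'), ~ tidyr::replace_na(.x, 0)))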
#  PlotCode Category    n1    n2    n3    n4    n5    n6    n7    n8    n9   n10
#  <chr>    <chr>    <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#1 Plot1    A            1     0     2     2     0     1     0     1     2     2
#2 Plot1    B            1     0     0     0     1     1     2     1     0     0
#3 Plot2    B            1     0     0     0     1     0     0     0     0     0
#4 Plot2    C            1     2     0     0     0     0     1     1     0     0
#5 Plot1    C            0     2     0     0     1     0     0     0     0     0
#6 Plot2    A            0     0     2     2     1     2     1     1     2     2
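As a variation, and only as a sketch: the loop in the question rebuilds b from a on every pass, so only the last sample's column survives. If the result should keep exactly the rows of a (as in the desired output), one option is to left-join each sample onto a growing table instead. The helper name sample_counts and the fixed seed below are assumptions made for illustration; they are not part of the question or of the answer above.

```r
library(dplyr)
library(purrr)
library(tidyr)

set.seed(42)  # assumption: fixed seed, only so the draws are reproducible

largetable <- data.frame(
  PlotCode = c(rep("Plot1", 20), rep("Plot2", 20)),
  Category = c(rep("A", 8), rep("B", 8), rep("C", 4),
               rep("A", 12), rep("B", 4), rep("C", 4))
)
a <- data.frame(PlotCode = c("Plot1", "Plot1", "Plot2", "Plot2"),
                Category = c("A", "B", "A", "B"))

# Hypothetical helper: draw 2 rows per plot (with replacement) and count
# them per PlotCode/Category, naming the count column n<i>.
sample_counts <- function(i) {
  largetable %>%
    group_by(PlotCode) %>%
    slice_sample(n = 2, replace = TRUE) %>%
    ungroup() %>%
    count(PlotCode, Category, name = paste0("n", i))
}

n <- 10

# Functional version: start from `a` and left-join each sample in turn,
# so only the PlotCode/Category pairs listed in `a` are kept.
samples <- map(seq_len(n), sample_counts)
b <- reduce(
  samples,
  function(acc, s) left_join(acc, s, by = c("PlotCode", "Category")),
  .init = a
) %>%
  mutate(across(starts_with("n"), ~ replace_na(.x, 0L)))

# Loop version: the key change from the question is joining onto the
# growing `b_loop` rather than onto `a` each time.
b_loop <- a
for (i in seq_len(n)) {
  b_loop <- left_join(b_loop, sample_counts(i), by = c("PlotCode", "Category"))
}
b_loop <- b_loop %>%
  mutate(across(starts_with("n"), ~ replace_na(.x, 0L)))
```

Both b and b_loop have the four rows of a plus columns n1 to n10. Their counts differ because each version draws its own samples. Unlike the full_join in the answer above, the left join drops category C, since C does not appear in a.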