In a for loop, how do I insert the variable i inside the "starts_with" quotation?

Question

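I have a big data frame with species in the rows and samples in the columns. There are 30 samples with 12 replicates each. The column names are written like this: sample.S1.01; sample.S1.02 ... sample.S30.11; sample.S30.12.

I would like to create 30 new tables, each containing the 12 replicates of one sample.

I have this command, which works perfectly for one sample at a time: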

dt<- tab_sp_sum %>%
    select(starts_with("sample.S1."))
assign(paste("tab_sp_1"), dt)

But when I put it in a for loop, it no longer works. I think it is because the starts_with quotation includes the variable i, and I don't know how to write it.

for (i in 1:30){
  dt<- tab_sp_sum %>%
    select(starts_with("sample.S",i,".", sep=""))
  assign(paste("tab_sp",i,sep="_"), dt)
}

The last line works in that 30 tables with the correct names are created, but they are empty.

Any suggestions?

Thanks

Answer 1

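Instead of using assign and storing the results in separate objects, try using a list. Create the prefixes you want to select with paste0, then use map to build a list of data frames.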

library(dplyr)
library(purrr)

df_names <- paste0("sample.S", 1:30, ".")

df1 <- map(df_names, ~tab_sp_sum %>% select(starts_with(.x)))

You can then access the individual data frames with df1[[1]], df1[[2]], and so on.
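If you would rather keep the for loop, the point to note is that starts_with() does not paste its arguments together, so the prefix has to be built with paste0() first and passed as one string. A minimal sketch, assuming a small made-up tab_sp_sum (the toy columns and the tabs list are illustrative and not from the question):

```r
library(dplyr)

# Hypothetical toy data standing in for the real tab_sp_sum
tab_sp_sum <- data.frame(
  sample.S1.01 = 1:3, sample.S1.02 = 4:6,
  sample.S2.01 = 7:9, sample.S2.02 = 10:12,
  sample.S10.01 = 13:15
)

tabs <- list()
for (i in c(1, 2, 10)) {
  # Build the full prefix as a single string, e.g. "sample.S1."
  prefix <- paste0("sample.S", i, ".")
  # Store each table in a named list element instead of using assign()
  tabs[[paste("tab_sp", i, sep = "_")]] <- tab_sp_sum %>%
    select(starts_with(prefix))
}

names(tabs)
```

Because starts_with() matches the prefix literally, the trailing dot keeps sample.S1. from also selecting the sample.S10. columns.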

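In base R, we can use lapply to select the columns whose names start with each element of df_names. startsWith() matches the prefix literally; in a regular expression such as "^sample.S1." the unescaped dots match any character, so sample S1 would also pick up the columns of S10 to S19.

df1 <- lapply(df_names, function(x)
             tab_sp_sum[startsWith(names(tab_sp_sum), x)])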
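To see how a literal prefix match differs from a regular expression with unescaped dots, here is a small sketch. The column names are hypothetical and only mimic the naming scheme from the question:

```r
# Hypothetical column names following the sample.S<n>.<replicate> scheme
cols <- c("sample.S1.01", "sample.S1.02", "sample.S10.01", "sample.S2.01")
prefix <- "sample.S1."

# Regex match: each "." matches any character, so S10 slips through
regex_hits <- grep(paste0("^", prefix), cols, value = TRUE)

# Literal prefix match with base R's startsWith()
literal_hits <- cols[startsWith(cols, prefix)]

regex_hits
literal_hits
```

With these names, regex_hits contains the sample.S10.01 column as well, while literal_hits contains only the two S1 replicates.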

Using it with the built-in iris dataset:

df_names <- c("Sepal", "Petal")
df1 <- map(df_names, ~iris %>% select(starts_with(.x)))

head(df1[[1]])
#  Sepal.Length Sepal.Width
#1          5.1         3.5
#2          4.9         3.0
#3          4.7         3.2
#4          4.6         3.1
#5          5.0         3.6
#6          5.4         3.9

head(df1[[2]])
#  Petal.Length Petal.Width
#1          1.4         0.2
#2          1.4         0.2
#3          1.3         0.2
#4          1.5         0.2
#5          1.4         0.2
#6          1.7         0.4
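Accessing list elements by position can get hard to follow with 30 samples. One option, sketched here on iris as an assumption about what might be convenient rather than part of the original answer, is to name the list with purrr's set_names() so each data frame can be retrieved by its prefix:

```r
library(dplyr)
library(purrr)

df_names <- c("Sepal", "Petal")

# set_names(df_names) uses each prefix as its own name,
# and map() carries those names over to the result
df1 <- map(set_names(df_names), ~ iris %>% select(starts_with(.x)))

names(df1)
head(df1$Petal)
```

For the data in the question, the same idea would give names such as "sample.S1." for each element.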

Answer 2

We can use split in base R:

nm1 <- paste(c("Sepal", "Petal"), collapse="|")
nm2 <- grep(nm1, names(iris), value = TRUE)
out <- split.default(iris[nm2], sub("\\..*", "", nm2))
head(out[[1]])
#  Petal.Length Petal.Width
#1          1.4         0.2
#2          1.4         0.2
#3          1.3         0.2
#4          1.5         0.2
#5          1.4         0.2
#6          1.7         0.4

head(out[[2]])
#  Sepal.Length Sepal.Width
#1          5.1         3.5
#2          4.9         3.0
#3          4.7         3.2
#4          4.6         3.1
#5          5.0         3.6
#6          5.4         3.9

Or in the tidyverse:

library(dplyr)
library(stringr)

iris %>%
  select(all_of(nm2)) %>%
  split.default(str_remove(nm2, "\\..*"))
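The same split idea could be applied to the column naming from the question by stripping the replicate suffix to obtain the grouping key. A rough sketch, assuming a made-up tab_sp_sum whose names follow the sample.S<n>.<replicate> pattern:

```r
# Hypothetical toy data standing in for the real tab_sp_sum
tab_sp_sum <- data.frame(
  sample.S1.01 = 1:3, sample.S1.02 = 4:6,
  sample.S2.01 = 7:9, sample.S10.01 = 10:12
)

# Drop the trailing ".<digits>" so replicates of a sample share one key
grp <- sub("\\.[0-9]+$", "", names(tab_sp_sum))

# split.default() treats the data frame as a list of columns
out <- split.default(tab_sp_sum, grp)

names(out)
names(out[["sample.S1"]])
```

This avoids building the 30 prefixes by hand, at the cost of relying on every column name ending in a numeric replicate suffix.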

r

dplyr

startswith