Create objects with a loop that subsets a list in R

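I have a list of 90 names that I want to split up and store in separate objects using a loop. I have already selected the names from the list by pattern, but I am not sure how to create the object names in a loop. I tried the assign() function earlier, but it created values (inside backticks) rather than objects. Thanks!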

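The list has 90 names, and each sample name is repeated 5 times, so in total I have 18 samples with 5 files each. For each sample I want to create an object that holds the names belonging to that sample, so each object contains 5 items. I would like to do this with a loop instead of copy-pasting the command (sample.1 = sample.names.dilutions[grep("Sample 1_", sample.names.dilutions)]) 18 times. I hope that makes sense?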

#list
>sample.names.dilutions
> length(sample.names.dilutions)
[1] 90

#names in list
> sample.names.dilutions[1:20]
 [1] "New AS Plate 21_AS Plate_Sample 1_100.fcs"  "New AS Plate 21_AS Plate_Sample 1_25.fcs"  
 [3] "New AS Plate 21_AS Plate_Sample 1_250.fcs"  "New AS Plate 21_AS Plate_Sample 1_50.fcs"  
 [5] "New AS Plate 21_AS Plate_Sample 1_500.fcs"  "New AS Plate 21_AS Plate_Sample 10_100.fcs"
 [7] "New AS Plate 21_AS Plate_Sample 10_25.fcs"  "New AS Plate 21_AS Plate_Sample 10_250.fcs"
 [9] "New AS Plate 21_AS Plate_Sample 10_50.fcs"  "New AS Plate 21_AS Plate_Sample 10_500.fcs"
[11] "New AS Plate 21_AS Plate_Sample 11_100.fcs" "New AS Plate 21_AS Plate_Sample 11_25.fcs" 
[13] "New AS Plate 21_AS Plate_Sample 11_250.fcs" "New AS Plate 21_AS Plate_Sample 11_50.fcs" 
[15] "New AS Plate 21_AS Plate_Sample 11_500.fcs" "New AS Plate 21_AS Plate_Sample 12_100.fcs"
[17] "New AS Plate 21_AS Plate_Sample 12_25.fcs"  "New AS Plate 21_AS Plate_Sample 12_250.fcs"
[19] "New AS Plate 21_AS Plate_Sample 12_50.fcs"  "New AS Plate 21_AS Plate_Sample 12_500.fcs"

#function i want to create with loop
> sample.1 = sample.names.dilutions[grep("Sample 1_", sample.names.dilutions)]
> length(sample.1)
[1] 5
> sample.1
[1] "New AS Plate 21_AS Plate_Sample 1_100.fcs" "New AS Plate 21_AS Plate_Sample 1_25.fcs" 
[3] "New AS Plate 21_AS Plate_Sample 1_250.fcs" "New AS Plate 21_AS Plate_Sample 1_50.fcs" 
[5] "New AS Plate 21_AS Plate_Sample 1_500.fcs"

> #i have 18 different samples and want to assign value and subset according to sample name
> for(i in 1:18) {
+   print(sample.names[i], quote=FALSE) = sample.names.dilutions[grep(paste0("Sample ",i,"_"), sample.names.dilutions)]}

Error in print(sample.names[i], FALSE) <- sample.names.dilutions[grep(paste0("Sample ",  : 
  could not find function "print<-"

I think I understand now; thanks for clarifying your question in the comments. Let me know if I missed something or if you have any questions.

Terminology, real quick

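I believe you are interested in splitting one vector of strings into several shorter vectors of strings based on a pattern in each element. A list is, loosely speaking, just a vector of vectors.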

g is a vector of 20 string elements (see the data code block below).

is.vector(g)
#> [1] TRUE

And this is a list that contains just one vector.

str(list(g))
#> List of 1
#>  $ : chr [1:20] "New AS Plate 21_AS Plate_Sample 12_50.fcs" "New AS Plate 21_AS Plate_Sample 1_100.fcs" "New AS Plate 21_AS Plate_Sample 1_25.fcs" "New AS Plate 21_AS Plate_Sample 1_250.fcs" ...
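
To make the distinction concrete, here is a minimal sketch with made-up data (the names `v` and `l` and the file names are illustrative only, not part of your data). It shows that a character vector is not a list, while a list can hold several vectors of different lengths.

```r
# An atomic character vector (illustrative file names)
v <- c("a.fcs", "b.fcs")

# A list holding two character vectors of different lengths
l <- list(first = v, second = "c.fcs")

is.list(v)    # FALSE: a plain character vector is not a list
is.list(l)    # TRUE
is.vector(l)  # TRUE: in R, a list is also a (generic) vector

# Number of elements in each vector stored in the list
lengths(l)
```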

Now on to the problem...

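In your question you asked specifically about using assign(). Convenient as assign() can be, it is [generally not recommended][1]. But sometimes you have to do what you have to do, and there is no shame in that. Here is how you could use it manually, one group at a time (as you showed in your question).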

# Using assign() one group at a time
h <- g[grep("Sample 1_", g)]
assign(x = "sample_1_group", value = h)
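
If you want to confirm that assign() really created an object, and fetch it back by its name as a string, exists() and get() can help. This is only a sketch on a shortened stand-in vector (`g_demo` and `h_demo` are illustrative names, not objects from your session).

```r
# Shortened stand-in for g (illustrative data, not the full vector)
g_demo <- c("New AS Plate 21_AS Plate_Sample 1_100.fcs",
            "New AS Plate 21_AS Plate_Sample 1_25.fcs",
            "New AS Plate 21_AS Plate_Sample 10_100.fcs")

# fixed = TRUE treats the pattern as a literal string rather than a regex;
# "Sample 1_" does not match "Sample 10_" because of the trailing underscore
h_demo <- g_demo[grep("Sample 1_", g_demo, fixed = TRUE)]
assign(x = "sample_1_group", value = h_demo)

# The object now exists under that name and can be looked up by string
exists("sample_1_group")
identical(get("sample_1_group"), h_demo)
```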

Using assign() in a for-loop is pretty simple (and seemingly logical).

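The first step in defining a for-loop is defining what the loop will "loop" over. In other words, what changes on each iteration of the loop. In your case, that is the numbers that define your groups. We can define a vector of those numbers manually or programmatically.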

# Define groups manually
ids <- c(12,1,10,11)
ids
#> [1] 12  1 10 11

# Pattern match groups
all_ids <- gsub(pattern = ".*Sample (\\d+).*", replacement = "\\1", x = g)
all_ids
#>  [1] "12" "1"  "1"  "1"  "1"  "1"  "10" "10" "10" "10" "10" "11" "11" "11" "11"
#> [16] "11" "12" "12" "12" "12"
ids <- unique(all_ids)
ids
#> [1] "12" "1"  "10" "11"
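
Note that the extracted ids are character strings, in order of first appearance. If numeric order matters to you, one option is to convert them to integers and sort them before looping. A sketch on illustrative stand-in data (`g_demo`, `all_ids_demo` and `ids_demo` are hypothetical names):

```r
# Shortened stand-in for g (illustrative data)
g_demo <- c("New AS Plate 21_AS Plate_Sample 12_50.fcs",
            "New AS Plate 21_AS Plate_Sample 1_100.fcs",
            "New AS Plate 21_AS Plate_Sample 10_25.fcs",
            "New AS Plate 21_AS Plate_Sample 1_25.fcs")

# Backslashes must be doubled inside an R string literal
all_ids_demo <- sub(pattern = ".*Sample (\\d+)_.*", replacement = "\\1", x = g_demo)

# Convert to integer, drop duplicates, and sort numerically
ids_demo <- sort(unique(as.integer(all_ids_demo)))
ids_demo
```

paste0() converts numbers to text on its own, so integer ids work in the loops in this answer just as well as character ids.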

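Once we know what to loop over, we can define the structure of the loop and the functions inside it. paste0() can be the workhorse here. The loop below goes through the ids (one id at a time), finds the matching strings in g, and writes them to your environment as a vector. Because we are using assign(), we expect a new vector to appear in our environment after each iteration of the loop.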

# For-loop with assign
for(i in ids){
  a <- paste0("Sample ", i, "_")
  h <- g[grep(a, g)]
  h_name <- paste0("sample_", i, "_group")
  assign(x = h_name, value = h)
}
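
To see what such a loop actually created, mget() can collect several objects by name into one named list. The sketch below repeats the loop on a small stand-in vector (`g_demo`, `ids_demo` and `created` are illustrative names) and then inspects the result.

```r
# Shortened stand-in for g and ids (illustrative data)
g_demo <- c("New AS Plate 21_AS Plate_Sample 1_100.fcs",
            "New AS Plate 21_AS Plate_Sample 1_25.fcs",
            "New AS Plate 21_AS Plate_Sample 10_100.fcs")
ids_demo <- c("1", "10")

# Same idea as the loop above: one new object per id
for (i in ids_demo) {
  assign(x = paste0("sample_", i, "_group"),
         value = g_demo[grep(paste0("Sample ", i, "_"), g_demo)])
}

# Collect the newly created objects by name into a named list
created <- mget(paste0("sample_", ids_demo, "_group"))
names(created)
lengths(created)
```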

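This technically works, but it isn't the best. You may find it more convenient to store the output of a for-loop in a list (a vector of vectors). It is quick to program, your workspace does not get cluttered with a pile of new objects, and all the scary things from the link above (not really) stop being a concern. Here is how you could do that: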

# Save the results of a for-loop in a list!
# First, make a blank list to hold the results
results <- list()
for(i in ids){
  a <- paste0("Sample ", i, "_")
  h <- g[grep(a, g)]
  h_name <- paste0("sample_", i, "_group")
  results[[h_name]] <- h
}
results
#> $sample_12_group
#> [1] "New AS Plate 21_AS Plate_Sample 12_50.fcs" 
#> [2] "New AS Plate 21_AS Plate_Sample 12_100.fcs"
#> [3] "New AS Plate 21_AS Plate_Sample 12_25.fcs" 
#> [4] "New AS Plate 21_AS Plate_Sample 12_250.fcs"
#> [5] "New AS Plate 21_AS Plate_Sample 12_500.fcs"
#> 
#> $sample_1_group
#> [1] "New AS Plate 21_AS Plate_Sample 1_100.fcs"
#> [2] "New AS Plate 21_AS Plate_Sample 1_25.fcs" 
#> [3] "New AS Plate 21_AS Plate_Sample 1_250.fcs"
#> [4] "New AS Plate 21_AS Plate_Sample 1_50.fcs" 
#> [5] "New AS Plate 21_AS Plate_Sample 1_500.fcs"
#> 
#> $sample_10_group
#> [1] "New AS Plate 21_AS Plate_Sample 10_100.fcs"
#> [2] "New AS Plate 21_AS Plate_Sample 10_25.fcs" 
#> [3] "New AS Plate 21_AS Plate_Sample 10_250.fcs"
#> [4] "New AS Plate 21_AS Plate_Sample 10_50.fcs" 
#> [5] "New AS Plate 21_AS Plate_Sample 10_500.fcs"
#> 
#> $sample_11_group
#> [1] "New AS Plate 21_AS Plate_Sample 11_100.fcs"
#> [2] "New AS Plate 21_AS Plate_Sample 11_25.fcs" 
#> [3] "New AS Plate 21_AS Plate_Sample 11_250.fcs"
#> [4] "New AS Plate 21_AS Plate_Sample 11_50.fcs" 
#> [5] "New AS Plate 21_AS Plate_Sample 11_500.fcs"
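
Once the groups live in a list, you can pull out any one of them by name or loop over all of them. A small sketch, using a hand-built stand-in for the list (`results_demo` is illustrative and holds only a few of the file names):

```r
# Hand-built stand-in for the results list (illustrative, shortened)
results_demo <- list(
  sample_1_group  = c("New AS Plate 21_AS Plate_Sample 1_100.fcs",
                      "New AS Plate 21_AS Plate_Sample 1_25.fcs"),
  sample_10_group = c("New AS Plate 21_AS Plate_Sample 10_100.fcs")
)

# Access one group by name, with [[ ]] or $
results_demo[["sample_1_group"]]
results_demo$sample_10_group

# How many files are in each group?
lengths(results_demo)

# Loop over all groups by name
for (nm in names(results_demo)) {
  cat(nm, "has", length(results_demo[[nm]]), "file(s)\n")
}
```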

Bonus points

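For-loops are great: it is easy to see what is happening inside them, it is easy to do a lot of data processing in them, and they usually run reasonably fast. But sometimes it is all about speed. R is vectorized ([honestly, I'm not quite sure what that means][2] beyond "it can do several calculations at once"), but for-loops do not take much advantage of that. The apply() family of functions is the usual alternative, and it is often easy to use wherever you might otherwise write a for-loop. Here is how you could handle your data: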

# Vectorized
lapply(ids, function(i) g[grep(paste0("Sample ", i, "_"), g)])
#> [[1]]
#> [1] "New AS Plate 21_AS Plate_Sample 12_50.fcs" 
#> [2] "New AS Plate 21_AS Plate_Sample 12_100.fcs"
#> [3] "New AS Plate 21_AS Plate_Sample 12_25.fcs" 
#> [4] "New AS Plate 21_AS Plate_Sample 12_250.fcs"
#> [5] "New AS Plate 21_AS Plate_Sample 12_500.fcs"
#> 
#> [[2]]
#> [1] "New AS Plate 21_AS Plate_Sample 1_100.fcs"
#> [2] "New AS Plate 21_AS Plate_Sample 1_25.fcs" 
#> [3] "New AS Plate 21_AS Plate_Sample 1_250.fcs"
#> [4] "New AS Plate 21_AS Plate_Sample 1_50.fcs" 
#> [5] "New AS Plate 21_AS Plate_Sample 1_500.fcs"
#> 
#> [[3]]
#> [1] "New AS Plate 21_AS Plate_Sample 10_100.fcs"
#> [2] "New AS Plate 21_AS Plate_Sample 10_25.fcs" 
#> [3] "New AS Plate 21_AS Plate_Sample 10_250.fcs"
#> [4] "New AS Plate 21_AS Plate_Sample 10_50.fcs" 
#> [5] "New AS Plate 21_AS Plate_Sample 10_500.fcs"
#> 
#> [[4]]
#> [1] "New AS Plate 21_AS Plate_Sample 11_100.fcs"
#> [2] "New AS Plate 21_AS Plate_Sample 11_25.fcs" 
#> [3] "New AS Plate 21_AS Plate_Sample 11_250.fcs"
#> [4] "New AS Plate 21_AS Plate_Sample 11_50.fcs" 
#> [5] "New AS Plate 21_AS Plate_Sample 11_500.fcs"
Created on 2021-10-14 by the reprex package (v2.0.1)
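
As a further alternative that is not part of the reprex above, base R's split() can do the grouping in one step, and setNames() can attach names to the lapply() output. This is a sketch on a shortened stand-in vector; `g_demo`, `sample_id`, `by_sample` and `named` are illustrative names.

```r
# Shortened stand-in for g (illustrative data)
g_demo <- c("New AS Plate 21_AS Plate_Sample 12_50.fcs",
            "New AS Plate 21_AS Plate_Sample 1_100.fcs",
            "New AS Plate 21_AS Plate_Sample 1_25.fcs",
            "New AS Plate 21_AS Plate_Sample 10_100.fcs")

# Extract the sample number of every element (doubled backslashes in R strings)
sample_id <- sub(".*Sample (\\d+)_.*", "\\1", g_demo)

# split() groups the elements of g_demo by sample_id and returns a named list
by_sample <- split(g_demo, sample_id)
by_sample[["1"]]

# lapply() output has no names by default; setNames() adds them
ids_demo <- unique(sample_id)
named <- setNames(
  lapply(ids_demo, function(i) g_demo[grep(paste0("Sample ", i, "_"), g_demo)]),
  paste0("sample_", ids_demo, "_group")
)
names(named)
```

With split() there is no need to build the patterns by hand, although the list names are then the bare ids rather than "sample_<id>_group".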

Data:

g <- c("New AS Plate 21_AS Plate_Sample 12_50.fcs", 
       "New AS Plate 21_AS Plate_Sample 1_100.fcs",
       "New AS Plate 21_AS Plate_Sample 1_25.fcs", 
       "New AS Plate 21_AS Plate_Sample 1_250.fcs",
       "New AS Plate 21_AS Plate_Sample 1_50.fcs",
       "New AS Plate 21_AS Plate_Sample 1_500.fcs",
       "New AS Plate 21_AS Plate_Sample 10_100.fcs",
       "New AS Plate 21_AS Plate_Sample 10_25.fcs",
       "New AS Plate 21_AS Plate_Sample 10_250.fcs",
       "New AS Plate 21_AS Plate_Sample 10_50.fcs",
       "New AS Plate 21_AS Plate_Sample 10_500.fcs",
       "New AS Plate 21_AS Plate_Sample 11_100.fcs",
       "New AS Plate 21_AS Plate_Sample 11_25.fcs",
       "New AS Plate 21_AS Plate_Sample 11_250.fcs",
       "New AS Plate 21_AS Plate_Sample 11_50.fcs",
       "New AS Plate 21_AS Plate_Sample 11_500.fcs",
       "New AS Plate 21_AS Plate_Sample 12_100.fcs",
       "New AS Plate 21_AS Plate_Sample 12_25.fcs",
       "New AS Plate 21_AS Plate_Sample 12_250.fcs",
       "New AS Plate 21_AS Plate_Sample 12_500.fcs")

[1]: Why is using assign bad?
[2]: