Why does assign() behave oddly in a for() loop with dplyr pipes in R?

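I need to loop different functions over data frames assigned in my global environment and save the output of each "run" of the loop in a new data frame whose name contains the original name. To do this, I am using assign() together with a for() loop. It works well, except when I use the dplyr pipe %>%. The function itself works, but the name assigned to the output data frame is wrong. How can I fix this with %>%? If it cannot be fixed, can I replace assign() with another function?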

This works well:

code1:
for(i in unique(table$V1)){
  assign(paste0(i, "_target"), table[grepl(i, table$V1), ])
}

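Explanation: it selects the unique entries in column 1 (V1) of "table" and, for each entry, subsets the rows containing that entry into a new data frame. Output: the new data frame names are "entry name" + "_target".

This does not work well (and I would like to know why):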

code2:
for(i in mget(ls(pattern = "_target"))){
    assign(paste0(i, "_slim"),data.frame(i %>% group_by(Sample.Name) %>% summarise(Mean_dC=mean(C__))))
  }

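Explanation: it selects all data frames in the global environment whose names contain "_target". For each data frame, it computes the mean of the values (C__) associated with entries that share the same string in Sample.Name. Expected output: new data frames named "entry name_target" + "_slim". Actual output: the new data frame does contain the per-group means, but it is named "c(seemingly random numbers)_slim".

Input for code2: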

STA_target <- structure(list(Well = structure(c(8L, 9L, 10L, 21L, 22L, 23L, 
33L, 34L, 35L, 46L, 47L, 48L, 58L, 59L, 60L, 73L, 74L, 75L, 85L, 
86L, 87L, 97L, 98L, 99L), .Label = c("", "A1", "A10", "A11", 
"A12", "A2", "A3", "A4", "A5", "A6", "A7", "A8", "A9", "Analysis Type", 
"B1", "B10", "B11", "B12", "B2", "B3", "B4", "B5", "B6", "B7", 
"B8", "B9", "C1", "C10", "C11", "C12", "C2", "C3", "C4", "C5", 
"C6", "C7", "C8", "C9", "Chemistry", "D1", "D10", "D11", "D12", 
"D2", "D3", "D4", "D5", "D6", "D7", "D8", "D9", "E1", "E10", 
"E11", "E12", "E2", "E3", "E4", "E5", "E6", "E7", "E8", "E9", 
"Endogenous Control", "Experiment File Name", "Experiment Run End Time", 
"F1", "F10", "F11", "F12", "F2", "F3", "F4", "F5", "F6", "F7", 
"F8", "F9", "G1", "G10", "G11", "G12", "G2", "G3", "G4", "G5", 
"G6", "G7", "G8", "G9", "H1", "H10", "H11", "H12", "H2", "H3", 
"H4", "H5", "H6", "H7", "H8", "H9", "Instrument Type", "Passive Reference", 
"Reference Sample", "RQ Min/Max Confidence Level", "Well"), class = "factor"), 
    Sample.Name = c("Control_in", "Control_in", "Control_in", 
    "Sample2_in", "Sample2_in", "Sample2_in", "Sample5_in", "Sample5_in", 
    "Sample5_in", "Sample3_in", "Sample3_in", "Sample3_in", "Control_c", 
    "Control_c", "Control_c", "Sample2_c", "Sample2_c", "Sample2_c", 
    "Sample3_c", "Sample3_c", "Sample3_c", "Sample5_c", "Sample5_c", 
    "Sample5_c"), Target.Name = c("STA", "STA", "STA", "STA", 
    "STA", "STA", "STA", "STA", "STA", "STA", "STA", "STA", "STA", 
    "STA", "STA", "STA", "STA", "STA", "STA", "STA", "STA", "STA", 
    "STA", "STA"), Task = structure(c(3L, 3L, 3L, 3L, 3L, 3L, 
    3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 
    3L, 3L, 3L), .Label = c("", "Task", "UNKNOWN"), class = "factor"), 
    Reporter = structure(c(3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 
    3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L
    ), .Label = c("", "Reporter", "SYBR"), class = "factor"), 
    Quencher = structure(c(2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
    2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L
    ), .Label = c("", "None", "Quencher"), class = "factor"), 
    RQ = structure(c(12L, 12L, 12L, 8L, 8L, 8L, 6L, 6L, 6L, 11L, 
    11L, 11L, 1L, 1L, 1L, 5L, 5L, 5L, 14L, 14L, 14L, 18L, 18L, 
    18L), .Label = c("", "0.706286132", "0.714652956", "0.724364996", 
    "0.7665869", "0.828774512", "0.838611245", "0.846661508", 
    "0.863589227", "0.896049678", "0.929288268", "1", "1.829339266", 
    "15.57538891", "17.64183807", "27.67574501", "3.064466953", 
    "34.78881073", "41.82569504", "8.117406845", "8.884188652", 
    "RQ"), class = "factor"), RQ.Min = structure(c(9L, 9L, 9L, 
    7L, 7L, 7L, 8L, 8L, 8L, 10L, 10L, 10L, 1L, 1L, 1L, 2L, 2L, 
    2L, 21L, 21L, 21L, 17L, 17L, 17L), .Label = c("", "0.032458056", 
    "0.429091513", "0.460811675", "0.541289926", "0.611138761", 
    "0.674698055", "0.71383971", "0.742018044", "0.753834546", 
    "0.772591949", "0.7868222", "0.803419232", "0.820919514", 
    "0.826185584", "0.989573121", "22.58564949", "27.2142868", 
    "4.501103401", "4.745172024", "4.843928814", "4.979007244", 
    "9.076541901", "RQ Min"), class = "factor"), RQ.Max = structure(c(13L, 
    13L, 13L, 8L, 8L, 8L, 6L, 6L, 6L, 9L, 9L, 9L, 1L, 1L, 1L, 
    16L, 16L, 16L, 19L, 19L, 19L, 20L, 20L, 20L), .Label = c("", 
    "0.858568788", "0.910271943", "0.943540215", "0.947846115", 
    "0.962214947", "0.971821666", "1.062453985", "1.145578504", 
    "1.162549496", "1.218146205", "1.244680166", "1.347676158", 
    "14.63914394", "15.85231876", "18.10507202", "20.37916756", 
    "3.381742954", "50.08181381", "53.58541107", "64.28199768", 
    "65.58969879", "84.38751984", "RQ Max"), class = "factor"), 
    C_ = c(25.48042297, 25.4738903, 25.83390617, 25.7304306, 
    25.78297043, 25.41260529, 25.49670792, 25.52298164, 25.6956234, 
    25.34812355, 25.51462555, 25.15455437, 0, 0, 0, 32.29237366, 
    37.10370636, 32.22016525, 29.50172043, 30.18544579, 29.91492081, 
    25.14842796, 24.89806747, 24.99397278), C_.Mean = c(25.59607506, 
    25.59607506, 25.59607506, 25.64200401, 25.64200401, 25.64200401, 
    25.57177162, 25.57177162, 25.57177162, 25.33910179, 25.33910179, 
    25.33910179, NA, NA, NA, 33.87208176, 33.87208176, 33.87208176, 
    29.86736107, 29.86736107, 29.86736107, 25.01348877, 25.01348877, 
    25.01348877), C_.SD = structure(c(21L, 21L, 21L, 20L, 20L, 
    20L, 12L, 12L, 12L, 19L, 19L, 19L, 1L, 1L, 1L, 31L, 31L, 
    31L, 23L, 23L, 23L, 14L, 14L, 14L), .Label = c("", "0.039937571", 
    "0.043110434", "0.049541138", "0.05469643", "0.061177365", 
    "0.066671595", "0.07365533", "0.079849631", "0.082057081", 
    "0.095515646", "0.108060829", "0.120047837", "0.126316145", 
    "0.129658803", "0.130481929", "0.142733917", "0.172286868", 
    "0.180205062", "0.200392827", "0.205995336", "0.236968249", 
    "0.344334781", "0.36769405", "0.413046211", "0.445171326", 
    "0.514641941", "0.640576839", "0.895943522", "0.993181109", 
    "2.798901796", "C_ SD"), class = "factor"), `_C_` = structure(c(1L, 
    1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
    1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = c("", "_C_"), class = "factor"), 
    `_C_.Mean` = structure(c(8L, 8L, 8L, 5L, 5L, 5L, 4L, 4L, 
    4L, 7L, 7L, 7L, 1L, 1L, 1L, 3L, 3L, 3L, 13L, 13L, 13L, 14L, 
    14L, 14L), .Label = c("", "_C_ Mean", "-0.577166259", "-0.68969661", 
    "-0.720502198", "-0.776381195", "-0.85484314", "-0.96064502", 
    "-1.058534026", "-2.04822278", "-2.545912504", "-3.293611526", 
    "-4.921841145", "-6.081196308", "0.477069855", "1.373315215", 
    "2.092705965", "2.244637728", "2.251055479", "2.346632004", 
    "2.456220627", "2.557917356", "2.729323149", "2.746313095"
    ), class = "factor"), `_C_.SE` = structure(c(13L, 13L, 13L, 
    11L, 11L, 11L, 6L, 6L, 6L, 9L, 9L, 9L, 1L, 1L, 1L, 24L, 24L, 
    24L, 21L, 21L, 21L, 15L, 15L, 15L), .Label = c("", "_C_ SE", 
    "0.042180877", "0.042606823", "0.048373949", "0.077573851", 
    "0.088320434", "0.102536619", "0.108728357", "0.113733612", 
    "0.117972165", "0.144372106", "0.155044988", "0.223316222", 
    "0.224465802", "0.258952528", "0.300881863", "0.306413502", 
    "0.319273174", "0.579304695", "0.606897891", "0.635279417", 
    "0.682336032", "1.643036604"), class = "factor"), HK.Control._C_.Mean = structure(c(1L, 
    1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
    1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = c("", "HK Control _C_ Mean"
    ), class = "factor"), HK.Control._C_.SE = structure(c(1L, 
    1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
    1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = c("", "HK Control _C_ SE"
    ), class = "factor"), `__C_` = structure(c(12L, 12L, 12L, 
    16L, 16L, 16L, 18L, 18L, 18L, 13L, 13L, 13L, 1L, 1L, 1L, 
    19L, 19L, 19L, 7L, 7L, 7L, 10L, 10L, 10L), .Label = c("", 
    "__C_", "-0.871322632", "-1.61563623", "-3.021018982", "-3.15124011", 
    "-3.961196184", "-4.140928745", "-4.790550232", "-5.120551586", 
    "-5.38631773", "0", "0.105801903", "0.15834935", "0.211582825", 
    "0.240142822", "0.253925949", "0.27094841", "0.383478791", 
    "0.465211242", "0.484685272", "0.501675308"), class = "factor"), 
    Automatic.Ct.Threshold = structure(c(3L, 3L, 3L, 3L, 3L, 
    3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 
    3L, 3L, 3L, 3L), .Label = c("", "Automatic Ct Threshold", 
    "TRUE"), class = "factor"), Ct.Threshold = structure(c(2L, 
    2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
    2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("", "0.056211855", 
    "0.208910329", "0.693888608", "0.704941193", "Ct Threshold"
    ), class = "factor"), Automatic.Baseline = structure(c(3L, 
    3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 
    3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L), .Label = c("", "Automatic Baseline", 
    "TRUE"), class = "factor"), Baseline.Start = structure(c(2L, 
    2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
    2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("", "3", "Baseline Start"
    ), class = "factor"), Baseline.End = structure(c(3L, 3L, 
    4L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 13L, 14L, 14L, 8L, 
    12L, 8L, 6L, 7L, 7L, 3L, 3L, 3L), .Label = c("", "21", "22", 
    "23", "25", "26", "27", "29", "30", "31", "32", "34", "35", 
    "39", "Baseline End"), class = "factor"), Efficiency = structure(c(2L, 
    2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
    2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("", "1", "Efficiency"
    ), class = "factor"), Comments = structure(c(1L, 1L, 1L, 
    1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
    1L, 1L, 1L, 1L, 1L, 1L), .Label = c("", "Comments"), class = "factor"), 
    HIGHSD = structure(c(3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 
    3L, 3L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 3L, 3L, 3L, 3L, 3L, 3L
    ), .Label = c("", "HIGHSD", "N", "Y"), class = "factor"), 
    NOAMP = structure(c(2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
    2L, 2L, 2L, 4L, 4L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("", 
    "N", "NOAMP", "Y"), class = "factor"), OUTLIERRG = structure(c(2L, 
    2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
    4L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("", "N", "OUTLIERRG", 
    "Y"), class = "factor"), EXPFAIL = structure(c(3L, 3L, 3L, 
    3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 4L, 4L, 3L, 3L, 3L, 
    3L, 3L, 3L, 3L, 3L, 3L), .Label = c("", "EXPFAIL", "N", "Y"
    ), class = "factor")), .Names = c("Well", "Sample.Name", 
"Target.Name", "Task", "Reporter", "Quencher", "RQ", "RQ.Min", 
"RQ.Max", "C_", "C_.Mean", "C_.SD", "_C_", "_C_.Mean", "_C_.SE", 
"HK.Control._C_.Mean", "HK.Control._C_.SE", "__C_", "Automatic.Ct.Threshold", 
"Ct.Threshold", "Automatic.Baseline", "Baseline.Start", "Baseline.End", 
"Efficiency", "Comments", "HIGHSD", "NOAMP", "OUTLIERRG", "EXPFAIL"
), row.names = c(12L, 13L, 14L, 24L, 25L, 26L, 36L, 37L, 38L, 
48L, 49L, 50L, 60L, 61L, 62L, 72L, 73L, 74L, 84L, 85L, 86L, 96L, 
97L, 98L), class = "data.frame")

code2 "output":

> dput(`c(8, 9, 10, 21, 22, 23, 33, 34, 35, 46, 47, 48, 58, 59, 60, 73, 74, 75, 85, 86, 87, 97, 98, 99)_slim`)
structure(list(Group.1 = c("Sample2_c", "Sample2_in", "Sample3_c", 
"Sample5_in", "Control_c", "Control_in", "Sample5_c", "Sample3_in"
), x = c(33.8720817566667, 25.6420021066667, 29.8673623433333, 
25.5717709866667, 0, 25.5960731466667, 25.0134894033333, 25.3391011566667
)), .Names = c("Group.1", "x"), row.names = c(NA, -8L), class = "data.frame")

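Because of the name it was given, I don't know whether this really is the output. The expected output should be the same content under the correct name: STA_slim

Thank you for your time.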

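First, I strongly recommend avoiding assign() in your R code. It is better to build related data in a list using one of R's many mapping/apply functions. Reaching for get/assign is a sign that you are not doing things in a very R-like way.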
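As a rough illustration of the list-based style, the first loop from the question could be replaced by split(), which returns one data frame per group in a named list. This is only a sketch: `tab` is a hypothetical stand-in for the question's `table`, and split() matches the values of V1 exactly, whereas the original grepl() call does pattern matching, so the two are equivalent only if no entry is a substring of another.

```r
# Hypothetical stand-in for the question's `table` (not the real data)
tab <- data.frame(V1 = c("STA", "XYZ", "STA"), value = c(1, 2, 3))

# One data frame per unique value of V1, collected in a named list
targets <- split(tab, tab$V1)

# Mimic the "<entry>_target" naming used in the question
names(targets) <- paste0(names(targets), "_target")

names(targets)
```

Individual pieces are then available as targets[["STA_target"]], with no variables created in the global environment.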

Your problem actually has nothing to do with dplyr; it is a problem with what you are looping over. When you do

  for(i in mget(ls(pattern = "_target"))){
    assign(paste0(i, "_slim"),data.frame(i %>% group_by(Sample.Name) %>% summarise(Mean_dC=mean(C__))))
  }

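i is not the name of the data.frame: because you called mget(), i is the data frame itself. Pasting it into a new name makes no sense. paste0() converts the data frame to text column by column, and assign() uses only the first resulting string, which is why the name begins with the integer codes of the first column (Well).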
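The difference between looping over names and looping over values can be checked with a small sketch. `demo_df` is a hypothetical toy data frame, not part of the question's data, and the exact text that paste0() produces from a data frame may vary between R versions, so no output is shown for it.

```r
# Hypothetical toy data frame (not from the question)
demo_df <- data.frame(Well = factor(c("A4", "A5")), value = c(1.5, 2.5))

# Looping over names: i is a character string, so pasting works as intended
for (i in "demo_df") {
  name_from_name <- paste0(i, "_slim")
}

# Looping over mget(): i is the data frame itself
for (i in mget("demo_df")) {
  i_is_df <- is.data.frame(i)
  # paste0() coerces the data frame column by column,
  # so this yields one string per column rather than a single name
  name_from_value <- paste0(i, "_slim")
}

name_from_name           # "demo_df_slim"
length(name_from_value)  # one element per column of demo_df
```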

To "fix" this, you could do

for(i in ls(pattern = "_target")){
  assign(paste0(i, "_slim"),data.frame(get(i) %>% group_by(Sample.Name) %>% summarise(Mean_dC=mean(C__))))
}

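But even then, your sample data set has no column named C__. You have C_, _C_, and __C_ (what do these names even mean?). So you will need to sort that out.

A better, list-based approach would be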

slim <- lapply(mget(ls(pattern = "_target$")) , function(x) {
  x %>% group_by(Sample.Name) %>% summarise(Mean_dC=mean(C_))
})
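The result of a call like this is a named list, which can then be renamed and indexed instead of creating one global variable per data frame. The sketch below makes several assumptions: the two small data frames are hypothetical stand-ins for the real *_target objects, and aggregate() is used in place of the group_by()/summarise() pipeline only so that the example runs without dplyr.

```r
# Hypothetical stand-ins for the *_target data frames
STA_target <- data.frame(Sample.Name = c("a", "a", "b"), C_ = c(1, 3, 5))
XYZ_target <- data.frame(Sample.Name = c("a", "b", "b"), C_ = c(2, 4, 6))

# Collect the data frames in a named list (names are given explicitly here)
targets <- mget(c("STA_target", "XYZ_target"))

# Per-group mean of C_; base R substitute for the dplyr pipeline
slim <- lapply(targets, function(x) {
  aggregate(C_ ~ Sample.Name, data = x, FUN = mean)
})

# Turn "<name>_target" into "<name>_slim"
names(slim) <- sub("_target$", "_slim", names(slim))

slim[["STA_slim"]]
```

If separate variables really are required, list2env(slim, envir = globalenv()) would create them from the list, but keeping everything in the list is usually easier to work with.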