Manipulating data within a subset is using data outside the subset (??)

Question

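I'm confused about why my code, which explicitly defines a subset of the data, uses a row outside that subset in its calculation. Here is an example. My data all comes from the same batch ("batch1"), but I want to calculate NPOC using only the first group of rows, up to and including the first "B" it encounters. NPOC is then calculated by subtracting the resultC10 value of the "B" row from each row's resultC10 value. So row F - row B, and then row X - row B. All the data is from the same batch, but since I'm defining a subset, why does it even know about the data in the rest of the batch?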

library(dplyr)

dat <- data.frame(sample_ind=c("F","X","B","F","X","B"),
                  resultC=c(7.31,3.12,.79,7.38,2.28,.59),
                  batch=c('batch1','batch1','batch1','batch1','batch1','batch1'))
dat$resultC10=dat$resultC*10
dat$NPOC <- NA
         
start_row = 1

for (i in nrow(dat)) {
  if (dat[i,1]=='B') {
    dat[start_row:i,] <- dat[start_row:i,] %>%
      group_by(batch) %>%
      mutate(NPOC = resultC10-resultC10[sample_ind=='B']) %>%
      ungroup
    start_row = i+1
  }
}

This is the result I get:

  sample_ind resultC  batch resultC10 NPOC
1          F    7.31 batch1      73.1 65.2  **NPOC OK-using row 3 (73.1-7.9)
2          X    3.12 batch1      31.2 25.3  **NPOC should be 23.3; it's using row 6 (31.2-5.9)
3          B    0.79 batch1       7.9  0.0  
4          F    7.38 batch1      73.8 67.9  **OK-using row 6
5          X    2.28 batch1      22.8 14.9  **should be 16.9; it's using row 3
6          B    0.59 batch1       5.9  0.0

Any help is greatly appreciated.

Answer 1

You can achieve this without using for-loops:

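library(dplyr)

dat <- data.frame(sample_ind=c("F","X","B","F","X","B"),
                  resultC=c(7.31,3.12,.79,7.38,2.28,.59),
                  batch=c('batch1','batch1','batch1','batch1','batch1','batch1'))
dat$resultC10=dat$resultC*10

dat %>%
  # cumsum() counts the "B" rows seen so far; lag() shifts that count down one
  # row so each "B" stays in the same group as the rows it closes
  group_by(grp = lag(cumsum(sample_ind == "B"), default = 0)) %>%
  # the last row of each group is its "B" row
  mutate(NPOC = resultC10-last(resultC10)) %>%
  ungroup() %>%
  select(-grp)  # grp is only a temporary grouping column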
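As for why the loop in the question reached outside the intended subset: judging from the posted code and output, the likely cause is that `for (i in nrow(dat))` iterates over the single value 6 rather than over 1 to 6. The body therefore runs once with `start_row:i` equal to `1:6`, and `resultC10[sample_ind=='B']` returns both blank values, which R recycles down the column. The sketch below reproduces that in base R and shows the loop with `seq_len()` instead. It replaces the dplyr pipeline inside the loop with a direct subtraction, and the names `visited`, `recycled`, and `rows` are only illustrative.

```r
dat <- data.frame(sample_ind = c("F", "X", "B", "F", "X", "B"),
                  resultC = c(7.31, 3.12, 0.79, 7.38, 2.28, 0.59),
                  batch = rep("batch1", 6))
dat$resultC10 <- dat$resultC * 10
dat$NPOC <- NA_real_

# nrow(dat) is the single number 6, so this loop body runs exactly once (i = 6)
visited <- c()
for (i in nrow(dat)) visited <- c(visited, i)

# With rows 1:6 selected, the "B" filter returns two values (7.9 and 5.9).
# R recycles them as 7.9, 5.9, 7.9, 5.9, ... which gives the NPOC column
# shown in the question: 65.2, 25.3, 0, 67.9, 14.9, 0
recycled <- dat$resultC10 - dat$resultC10[dat$sample_ind == "B"]

# seq_len(nrow(dat)) is 1, 2, ..., 6, so every row is visited
start_row <- 1
for (i in seq_len(nrow(dat))) {
  if (dat$sample_ind[i] == "B") {
    rows <- start_row:i
    # subtract this block's blank (row i) from every row in the block
    dat$NPOC[rows] <- dat$resultC10[rows] - dat$resultC10[i]
    start_row <- i + 1
  }
}

print(dat)
```

With `seq_len()`, the first block (rows 1 to 3) is processed when `i` is 3 and the second (rows 4 to 6) when `i` is 6, so each block only ever sees its own "B" row.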
