Clean a dataset: set rows to NA based on different columns and different values

My dataset looks like this:

    SO4     PO4 LabConductivity  LabPH Notes   QR
1 0.131 0.00100            3.98   5.25   dmz    B
2 0.109 0.00126            3.54   5.27    mz    B
3 0.219 -0.5656            6.28   5.23  <NA>    A
4 0.219 -0.5656            6.28  -5.66  <NA>    C
5 0.219 -0.5656            6.28   5.23  <NA>    C

I want to clean it so that every row where "QR" shows C is set entirely to NA.

I can do that like this:

mydata[mydata$QR=="C",] <- NA

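However, I want to keep doing this for other variables, for example setting the whole row to NA when LabPH is > 6 or < 0.

If I do the same thing again, I get the following error: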

Error in `[<-.data.frame`(`*tmp*`, mydata$LabPH > 5 | mydata$LabPH < 0,  : missing values are not allowed in subscripted assignments of data frames

Is there a way around this? Is there some kind of "ignore NA" function for this case, or a better approach altogether?

Thanks in advance. Cheers, Sandra

You just need to wrap your logical test in which.

For example,

mydata[which(mydata$LabPH > 5.25),] <- NA
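To make the behaviour of which concrete, here is a minimal sketch on a made-up vector. The name ph and its values are hypothetical stand-ins for a column that already contains NA; they are not taken from your data.

```r
# Hypothetical vector standing in for a column that already contains NA
ph <- c(5.25, 5.27, NA, 7.0)

# The comparison propagates NA: the third element is neither TRUE nor FALSE
test <- ph > 5.25

# which() returns only the positions that are TRUE, so both the FALSE
# and the NA positions are left out of the index
idx <- which(test)

print(test)
print(idx)
```

Because idx is a plain integer vector with no missing values, it is safe to use as a row index in an assignment.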

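If the column used in the logical test contains NA, the test itself returns NA for those rows and the data.frame is not subset cleanly. For example, you can see below that the rows with LabPH = NA are not filtered out but come back as all-NA rows.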

> mydata[mydata$LabPH > 5.25,]
       SO4     PO4 LabConductivity LabPH Notes   QR
2    0.109 0.00126            3.54  5.27    mz    B
NA      NA      NA              NA    NA  <NA> <NA>
NA.1    NA      NA              NA    NA  <NA> <NA>

which works because it drops the rows where LabPH is NA. Another option is to exclude the NAs explicitly with !is.na():

> new <- mydata[!is.na(mydata$LabPH)&mydata$LabPH > 5.25,]
> new
    SO4     PO4 LabConductivity LabPH Notes QR
2 0.109 0.00126            3.54  5.27    mz  B
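Putting this together for the question's two rules, a minimal sketch might look like the following. The first five rows are copied from the table in the question; the sixth row (LabPH = 7.0) is a hypothetical addition so that the LabPH rule has something to match, and the real NA values in Notes are an assumption about what the printed <NA> means.

```r
# Rows 1-5 come from the question; row 6 is a hypothetical extra row
mydata <- data.frame(
  SO4 = c(0.131, 0.109, 0.219, 0.219, 0.219, 0.21),
  PO4 = c(0.00100, 0.00126, -0.5656, -0.5656, -0.5656, -0.532),
  LabConductivity = c(3.98, 3.54, 6.28, 6.28, 6.28, 6.25),
  LabPH = c(5.25, 5.27, 5.23, -5.66, 5.23, 7.0),
  Notes = c("dmz", "mz", NA, NA, NA, "mz"),
  QR = c("B", "B", "A", "C", "C", "B"),
  stringsAsFactors = FALSE
)

# Rule 1: blank every row flagged "C" (rows 4 and 5)
mydata[which(mydata$QR == "C"), ] <- NA

# LabPH now contains NA, so the bare logical index is rejected
msg <- tryCatch({
  mydata[mydata$LabPH > 6 | mydata$LabPH < 0, ] <- NA
  "no error"
}, error = function(e) conditionMessage(e))
print(msg)

# Rule 2: which() keeps the NA positions out of the row index (row 6 matches)
mydata[which(mydata$LabPH > 6 | mydata$LabPH < 0), ] <- NA

print(mydata)
```

The data frame keeps all six rows, so row positions still line up with any other data you hold; only the flagged rows are blanked.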

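Isn't replacing an entire row with NA effectively the same as excluding it from the data? If so, given your conditions (exclude QR = "C", keep LabPH between 0 and 6), here is one way to do it...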

# Please note I added a random 6th row with LabPH = 7.0. 

SO4 = c(0.131,0.109,0.219,0.219,0.219,0.21)
PO4 = c(0.00100,0.00126,-0.5656,-0.5656,-0.5656,-0.532)
LabConductivity = c(3.98, 3.54, 6.28, 6.28, 6.28,6.25)
LabPH = c(5.25,5.27,5.23,-5.66,5.23,7.0)
Notes = c("dmz","mz",NA,NA,NA,"mz")  # real missing values rather than the string "<NA>"
QR = c("B","B","A","C","C","B")

# create a data frame
df = data.frame(SO4,PO4,LabConductivity,LabPH,Notes,QR)
df

    SO4      PO4 LabConductivity LabPH Notes QR
1 0.131  0.00100            3.98  5.25   dmz  B
2 0.109  0.00126            3.54  5.27    mz  B
3 0.219 -0.56560            6.28  5.23  <NA>  A
4 0.219 -0.56560            6.28 -5.66  <NA>  C
5 0.219 -0.56560            6.28  5.23  <NA>  C
6 0.210 -0.53200            6.25  7.00    mz  B

# subset according to your conditions

df[which((df$LabPH > 0 & df$LabPH < 6) & df$QR != "C"),]
# output
    SO4      PO4 LabConductivity LabPH Notes QR
1 0.131  0.00100            3.98  5.25   dmz  B
2 0.109  0.00126            3.54  5.27    mz  B
3 0.219 -0.56560            6.28  5.23  <NA>  A
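As a possible alternative to which, base R's subset() treats missing values in the condition as FALSE, so it should select the same rows here. The sketch below rebuilds the data frame above (with real NA values in Notes, which is an assumption about what <NA> stands for) and compares the two approaches; kept_which and kept_subset are names introduced only for illustration.

```r
# Same six rows as the data frame built above
df <- data.frame(
  SO4 = c(0.131, 0.109, 0.219, 0.219, 0.219, 0.21),
  PO4 = c(0.00100, 0.00126, -0.5656, -0.5656, -0.5656, -0.532),
  LabConductivity = c(3.98, 3.54, 6.28, 6.28, 6.28, 6.25),
  LabPH = c(5.25, 5.27, 5.23, -5.66, 5.23, 7.0),
  Notes = c("dmz", "mz", NA, NA, NA, "mz"),
  QR = c("B", "B", "A", "C", "C", "B"),
  stringsAsFactors = FALSE
)

# Explicit index: which() drops any NA produced by the test
kept_which <- df[which(df$LabPH > 0 & df$LabPH < 6 & df$QR != "C"), ]

# subset() evaluates the condition inside df and treats NA as FALSE
kept_subset <- subset(df, LabPH > 0 & LabPH < 6 & QR != "C")

print(kept_which)
print(kept_subset)
```

Note that both forms drop rows and so change the row count; if the row positions must be preserved, as in the question, assigning NA to the flagged rows is the approach to use instead.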