Clean a dataset: set rows to NA based on different columns and different values
My dataset looks like the one below.
I want to clean it so that every row where "QR" shows C is set to NA:
    SO4     PO4 LabConductivity LabPH Notes QR
1 0.131 0.00100            3.98  5.25   dmz  B
2 0.109 0.00126            3.54  5.27    mz  B
3 0.219 -0.5656            6.28  5.23  <NA>  A
4 0.219 -0.5656            6.28 -5.66  <NA>  C
5 0.219 -0.5656            6.28  5.23  <NA>  C
So I can do this:
mydata[mydata$QR=="C",] <- NA
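However, I want to keep doing this for other variables, for example setting the whole row to NA when LabPH is > 6 or < 0.
If I do the same thing again, I get the following error: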
Error in `[<-.data.frame`(`*tmp*`, mydata$LabPH > 5 | mydata$LabPH < 0, : missing values are not allowed in subscripted assignments of data frames
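Is there another way around this? Is there something like an "ignore NA" function for this case?
Or is there a better approach altogether?
Thanks in advance,
Cheers,
Sandra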
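You just need to wrap your logical test in which().
For example:
mydata[which(mydata$LabPH > 5.25),] <- NA
If the column used in the logical test contains NA, the comparison yields NA for those rows as well, and a data.frame cannot be subset cleanly with it. For example, you can see that the rows where LabPH is NA come back as all-NA rows instead of being left out: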
> mydata[mydata$LabPH > 5.25,]
       SO4     PO4 LabConductivity LabPH Notes   QR
2    0.109 0.00126            3.54  5.27    mz    B
NA      NA      NA              NA    NA  <NA> <NA>
NA.1    NA      NA              NA    NA  <NA> <NA>
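which() works because it returns only the positions that are TRUE, so the rows where LabPH is NA are excluded. Another option is to exclude the NA values explicitly with !is.na():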
> new <- mydata[!is.na(mydata$LabPH)&mydata$LabPH > 5.25,]
> new
    SO4     PO4 LabConductivity LabPH Notes QR
2 0.109 0.00126            3.54  5.27    mz  B
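To tie this back to the original question, here is a minimal sketch of both rules applied one after the other. The data frame is rebuilt by hand from the printed example, and I am assuming one extra sixth row with LabPH = 7.0 so that the second rule actually has something to catch; `ph_bad` is just an illustrative name.

```r
# Example data rebuilt from the question; row 6 (LabPH = 7.0) is an
# assumed extra row so that the LabPH rule matches at least one row
mydata <- data.frame(
  SO4 = c(0.131, 0.109, 0.219, 0.219, 0.219, 0.21),
  PO4 = c(0.00100, 0.00126, -0.5656, -0.5656, -0.5656, -0.532),
  LabConductivity = c(3.98, 3.54, 6.28, 6.28, 6.28, 6.25),
  LabPH = c(5.25, 5.27, 5.23, -5.66, 5.23, 7.0),
  Notes = c("dmz", "mz", NA, NA, NA, "mz"),
  QR = c("B", "B", "A", "C", "C", "B")
)

# First rule: blank out rows flagged "C" (QR has no NA yet, so this works)
mydata[mydata$QR == "C", ] <- NA

# LabPH now contains NA, so the raw comparison contains NA as well
ph_bad <- mydata$LabPH > 6 | mydata$LabPH < 0
print(ph_bad)

# which() keeps only the positions that are TRUE and silently drops the NA
mydata[which(ph_bad), ] <- NA
print(mydata)
```

Using `mydata[ph_bad, ] <- NA` directly at the last step is what triggers the "missing values are not allowed in subscripted assignments" error from the question.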
Isn't replacing whole rows with NA just as good as excluding them? If so, given your conditions (QR is not "C" and LabPH is between 0 and 6), here is one approach:
# Please note I added a random 6th row with LabPH = 7.0.
SO4 = c(0.131,0.109,0.219,0.219,0.219,0.21)
PO4 = c(0.00100,0.00126,-0.5656,-0.5656,-0.5656,-0.532)
LabConductivity = c(3.98, 3.54, 6.28, 6.28, 6.28,6.25)
LabPH = c(5.25,5.27,5.23,-5.66,5.23,7.0)
Notes = c("dmz","mz",NA,NA,NA,"mz")  # real missing values, not the string "<NA>"
QR = c("B","B","A","C","C","B")
# create a data frame
df = data.frame(SO4,PO4,LabConductivity,LabPH,Notes,QR)
df
    SO4      PO4 LabConductivity LabPH Notes QR
1 0.131  0.00100            3.98  5.25   dmz  B
2 0.109  0.00126            3.54  5.27    mz  B
3 0.219 -0.56560            6.28  5.23  <NA>  A
4 0.219 -0.56560            6.28 -5.66  <NA>  C
5 0.219 -0.56560            6.28  5.23  <NA>  C
6 0.210 -0.53200            6.25  7.00    mz  B
# subset according to your conditions
df[which((df$LabPH > 0 & df$LabPH < 6) & df$QR != "C"),]
# output
    SO4      PO4 LabConductivity LabPH Notes QR
1 0.131  0.00100            3.98  5.25   dmz  B
2 0.109  0.00126            3.54  5.27    mz  B
3 0.219 -0.56560            6.28  5.23  <NA>  A
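If you would rather keep the rows in place and blank them, as the question asked, one possible variant is to combine all rules into a single logical mask that contains no NA and assign once. This is only a sketch on the same six-row example data; `ph` and `bad` are names I made up for illustration.

```r
# Same six-row example data as above
df <- data.frame(
  SO4 = c(0.131, 0.109, 0.219, 0.219, 0.219, 0.21),
  PO4 = c(0.00100, 0.00126, -0.5656, -0.5656, -0.5656, -0.532),
  LabConductivity = c(3.98, 3.54, 6.28, 6.28, 6.28, 6.25),
  LabPH = c(5.25, 5.27, 5.23, -5.66, 5.23, 7.0),
  Notes = c("dmz", "mz", NA, NA, NA, "mz"),
  QR = c("B", "B", "A", "C", "C", "B")
)

ph <- df$LabPH

# TRUE only where a rule is definitely violated:
# %in% returns FALSE for NA, and !is.na(ph) & ... turns NA into FALSE
bad <- (df$QR %in% "C") | (!is.na(ph) & (ph > 6 | ph < 0))

# Blank the offending rows but keep them, so the row count is unchanged
df[bad, ] <- NA
print(df)
```

Because the mask never contains NA, the order in which the rules are written does not matter, and more conditions can be added with further `|` terms.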