Calculate the failure rate using a specific row limit in R
I have a data frame like this:
ID <- c("ID300","ID301","ID302","ID303","ID304","ID305","ID306","ID307","ID308","ID309")
Measurement <- c("Length","Length","Length","Length","Length","Length","Length","Length","Length","Length")
PASSFAIL <- c("FAIL","PASS","FAIL","FAIL#Pts","PASS","PASS","PASS","PASS","PASS","FAIL")
df1 <- data.frame(ID,Measurement,PASSFAIL)
Part 1
I am trying to create a fail-rate column that is calculated for each ID, using a window of 5 IDs. For example:
Fail Rate = (Number of Fails)/(Number of Fails + Number of Pass)
ID300 <- (Fails of Row1 to Row5)/(Total from Row1 to Row5) = (3/5) = 0.6
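Note: in df1, any entry that contains FAIL in the PASSFAIL column counts as a failure.
If fewer than 5 rows are left for the window, the result should be NA, so my desired output looks like this: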
      ID Measurement PASSFAIL  FR
1  ID300      Length     FAIL 0.6
2  ID301      Length     PASS 0.4
3  ID302      Length     FAIL 0.4
4  ID303      Length FAIL#Pts 0.2
5  ID304      Length     PASS 0.0
6  ID305      Length     PASS 0.2
7  ID306      Length     PASS  NA
8  ID307      Length     PASS  NA
9  ID308      Length     PASS  NA
10 ID309      Length     FAIL  NA
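Part 2
Once that is done, I need to recalculate the fail rate for every new ID that gets added, using the same window of 5. For example, my desired output is: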
      ID Measurement PASSFAIL  FR
1  ID296      Length     PASS 0.4
2  ID297      Length     FAIL 0.6
3  ID298      Length     PASS 0.6
4  ID299      Length     FAIL 0.6
5  ID300      Length     FAIL 0.8
6  ID301      Length     FAIL 0.6
7  ID302      Length     PASS  NA
8  ID303      Length     FAIL  NA
9  ID304      Length FAIL#Pts  NA
10 ID305      Length     PASS  NA
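I am currently calculating the fail rate with something like the code below, but it computes a single rate over the entire data frame. I don't know how to calculate it sequentially for each ID with a loop, given a window size of 5.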
library(data.table)
setDT(df1)
# aggregate: one fail rate over all rows, not per window
df1 <- df1[, .( FR = (sum(PASSFAIL != "PASS")/.N))]
Any suggestions would be appreciated.
You might want to try the sapply function. Also, for good order, I would declare df1 without factors.
df1 <- data.frame(ID,Measurement,PASSFAIL,stringsAsFactors = FALSE)
df1$FR <- sapply(df1$ID,FUN = function(x) {
if(which(df1$ID == x) > nrow(df1)-4){
return(NA_real_)
}else{
start_ID <- which(df1$ID == x)
end_ID <- start_ID + 4
return(sum(grepl("FAIL",df1[start_ID:end_ID,"PASSFAIL"]))/5)
}
})
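The same forward-looking window can also be computed without a per-ID lookup. What follows is only a minimal sketch, not part of the answer above: the helper name `forward_fail_rate` and its `window` argument are made up for illustration, and it assumes the rows are already in the order the window should follow. It takes differences of a running count of failures, so each window costs one subtraction.

```r
# Hypothetical helper (name and arguments are illustrative only):
# fail rate over the current row and the next `window - 1` rows,
# NA where fewer than `window` rows remain.
forward_fail_rate <- function(passfail, window = 5) {
  is_fail <- grepl("FAIL", passfail, fixed = TRUE)  # "FAIL#Pts" also counts
  n <- length(is_fail)
  out <- rep(NA_real_, n)
  if (n < window) return(out)
  cs <- c(0, cumsum(is_fail))         # cs[i + 1] = number of fails in rows 1..i
  starts <- seq_len(n - window + 1)   # rows that have a full window ahead
  out[starts] <- (cs[starts + window] - cs[starts]) / window
  out
}

PASSFAIL <- c("FAIL","PASS","FAIL","FAIL#Pts","PASS","PASS","PASS","PASS","PASS","FAIL")
fr <- forward_fail_rate(PASSFAIL)
print(fr)
```

On the sample data this should give the FR column from the desired output in Part 1, and it could be attached with `df1$FR <- forward_fail_rate(df1$PASSFAIL)`.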
I can't make sense of your Part 2, but here is Part 1 using stats::filter and a grepl call to find every value containing "FAIL":
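df1$FR <- NA_real_
# call stats::filter explicitly so a loaded dplyr cannot mask it;
# sides=1 gives a trailing 5-row moving average of the FAIL flags
vals <- na.omit(stats::filter(grepl("FAIL",df1$PASSFAIL), rep(1,5)/5, sides=1))
# the k-th complete window starts at row k, so the values fill the first rows
df1$FR[seq_along(vals)] <- vals
df1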
#      ID Measurement PASSFAIL  FR
#1  ID300      Length     FAIL 0.6
#2  ID301      Length     PASS 0.4
#3  ID302      Length     FAIL 0.4
#4  ID303      Length FAIL#Pts 0.2
#5  ID304      Length     PASS 0.0
#6  ID305      Length     PASS 0.2
#7  ID306      Length     PASS  NA
#8  ID307      Length     PASS  NA
#9  ID308      Length     PASS  NA
#10 ID309      Length     FAIL  NA
Or alternatively, if you want to get fancy:
rev(stats::filter(grepl("FAIL",rev(df1$PASSFAIL)), rep(1,5)/5, sides=1))
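For Part 2, one possible reading is that new IDs are placed on top of the existing rows and the whole FR column is then recomputed with the same window. The sketch below rests on that assumption only. The function name `recompute_fail_rate` and the two new rows (ID298, ID299) are hypothetical and are not taken from the question. It reuses the reverse, filter, reverse idea from above.

```r
# Hypothetical helper: recompute FR for every row of `df` with a
# forward-looking window of `window` rows (NA where fewer rows remain).
recompute_fail_rate <- function(df, window = 5) {
  if (nrow(df) < window) {            # stats::filter errors on short input
    df$FR <- rep(NA_real_, nrow(df))
    return(df)
  }
  is_fail <- as.numeric(grepl("FAIL", df$PASSFAIL, fixed = TRUE))
  # trailing moving average on the reversed flags = forward window
  ma <- stats::filter(rev(is_fail), rep(1 / window, window), sides = 1)
  df$FR <- rev(as.numeric(ma))
  df
}

df1 <- data.frame(
  ID = c("ID300","ID301","ID302","ID303","ID304","ID305","ID306","ID307","ID308","ID309"),
  Measurement = "Length",
  PASSFAIL = c("FAIL","PASS","FAIL","FAIL#Pts","PASS","PASS","PASS","PASS","PASS","FAIL"),
  stringsAsFactors = FALSE
)

# Made-up new rows, for illustration only
new_rows <- data.frame(
  ID = c("ID298","ID299"),
  Measurement = "Length",
  PASSFAIL = c("PASS","FAIL"),
  stringsAsFactors = FALSE
)

# Put the new IDs on top, then recompute the whole column
combined <- recompute_fail_rate(rbind(new_rows, df1))
print(combined)
```

Recomputing the whole column keeps the logic simple. With a window of 5, only the new rows and the rows whose window changes actually need updating, so a narrower update would also be possible if the data frame were large.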