R: check and count strings in a vector with group_by, considering the order of appearance of the strings
The data is in the following format, and I have to group_by using Date. For convenience, I have shown the date as a number.
Msg <- c("Errors","Errors", "Start","Stop","Start","Stop","Errors","Errors","Start","Stop",
"Stop" ,"Start","Errors","Start","Stop","Start" ,"Stop" ,
"Errors", "Start","Errors","Stop", "Start", "LostControl","LostControl", "Errors",
"Failed", "Stop", "Start","Failed","Stop","Stop","Start","Stop","Start","Error","Start",
"Failed", "Stop")
Date <- c(11,11,11,11,11,11,11,12,12,12,12,12,12,14,14,14,14, 19,19,19,19,
20,20,20,20,20,20,21,21,21,21,22,22,22,22,22,22,22)
data<- data.frame(Msg,Date)
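I need to count the number of Failed messages in each START-STOP cycle, summarised by Date.
The data has three types of messages.
Errors and Failed are two types of failure messages, while LostControl is not a failure.
The condition is that a Failed message must not be preceded by a LostControl message in that START-STOP cycle. If it is preceded only by Errors, it counts as a failure.
Also, if only Errors messages are found, that is not considered a failure either.
Edit: In the Msg vector, when two consecutive Starts or Stops are found, a START-STOP cycle runs from the outermost Start to the outermost Stop. If a START is not followed by a STOP, it is ignored.
Edit: One row was added: (Msg = Stop, Date = 20).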
We can modify the function I wrote for you yesterday.
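between_valid_anchors <- function(x, bgn = "Start", end = "Stop") {
  # Keep only the anchor messages and remember their original positions.
  are_anchors <- x %in% c(bgn, end)
  xid <- seq_along(x)
  id <- xid[are_anchors]
  x <- x[are_anchors]
  # A cycle opens at the first Start of a run and closes at the last Stop of a run.
  start_pos <- id[which(x == bgn & c("", head(x, -1L)) %in% c("", end))]
  stop_pos <- id[which(x == end & c(tail(x, -1L), "") %in% c("", bgn))]
  # Drop a Stop with no Start before it and a Start with no Stop after it,
  # so the two vectors pair up one-to-one instead of being recycled by mapply().
  stop_pos <- stop_pos[stop_pos > min(start_pos, Inf)]
  start_pos <- start_pos[start_pos < max(stop_pos, -Inf)]
  if (length(start_pos) < 1L || length(stop_pos) < 1L)
    return(logical(length(xid)))
  # TRUE for every position that falls inside a Start-Stop cycle.
  xid %in% unlist(mapply(`:`, start_pos, stop_pos))
}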
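The two which() lines carry most of the logic: once everything except the anchors is dropped, a cycle opens at a Start whose previous anchor is a Stop (or is missing) and closes at a Stop whose next anchor is a Start (or is missing). Here is a minimal sketch of just that step on a made-up anchor vector. The names prev, nxt, is_cycle_start and is_cycle_stop are mine, for illustration only.

```r
# Toy anchor sequence (made up, not taken from the question's data).
x <- c("Start", "Start", "Stop", "Stop", "Start", "Stop")

prev <- c("", head(x, -1L))  # previous anchor, "" for the first element
nxt  <- c(tail(x, -1L), "")  # next anchor, "" for the last element

# Outermost Start of each run of Starts, outermost Stop of each run of Stops.
is_cycle_start <- x == "Start" & prev %in% c("", "Stop")
is_cycle_stop  <- x == "Stop"  & nxt  %in% c("", "Start")

which(is_cycle_start)  # 1 5
which(is_cycle_stop)   # 4 6
```

So the toy vector holds two cycles, positions 1 to 4 and 5 to 6, which is the "extreme start to extreme stop" rule from the question.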
Then
library(dplyr)
data %>%
  group_by(Date) %>%
  filter(between_valid_anchors(Msg)) %>%
  # "Error" (singular) also appears in the sample data, so it is matched too.
  summarise(Msg = sum(Msg %in% c("Err", "Error", "Errors", "Failed")))
Output
`summarise()` ungrouping output (override with `.groups` argument)
# A tibble: 6 x 2
   Date   Msg
  <dbl> <int>
1    11     0
2    12     0
3    14     0
4    19     1
5    21     1
6    22     2
Update
You can add one more filter to select only the messages of interest (i.e. Start, Stop, Failed, LostControl). Then sum all rows where Msg == "Failed", excluding those where lag(Msg) == "LostControl".
library(dplyr)
data %>%
  group_by(Date) %>%
  filter(between_valid_anchors(Msg)) %>%
  filter(Msg %in% c("Start", "Stop", "Failed", "LostControl")) %>%
  summarise(Msg = sum(Msg == "Failed" & lag(Msg, default = "") != "LostControl"))
Output
`summarise()` ungrouping output (override with `.groups` argument)
# A tibble: 7 x 2
   Date   Msg
  <dbl> <int>
1    11     0
2    12     0
3    14     0
4    19     0
5    20     0
6    21     1
7    22     1
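Note that the lag() check only looks at the message immediately before each Failed. If the rule is meant to be "no LostControl anywhere earlier in the same cycle" (this is my reading of the question, so treat it as an assumption), a stricter count could look like the sketch below. count_failed_strict is a hypothetical helper, written in base R, and it assumes its input has already been filtered with between_valid_anchors().

```r
# Hypothetical helper (not part of the answer above): count Failed messages
# that have no LostControl earlier in the same Start-Stop cycle.
# Assumes every element of `msg` already lies inside some cycle.
count_failed_strict <- function(msg, bgn = "Start", end = "Stop") {
  if (length(msg) == 0L) return(0L)
  is_anchor <- msg %in% c(bgn, end)
  a <- msg[is_anchor]
  # A cycle opens at a Start whose previous anchor is a Stop (or is missing).
  opens_among_anchors <- a == bgn & c("", head(a, -1L)) %in% c("", end)
  opens <- logical(length(msg))
  opens[which(is_anchor)[opens_among_anchors]] <- TRUE
  cycle <- cumsum(opens)  # cycle id for every message
  # Running count of LostControl messages within each cycle.
  lost_so_far <- ave(as.integer(msg == "LostControl"), cycle, FUN = cumsum)
  sum(msg == "Failed" & lost_so_far == 0L)
}

# Toy input (made up for illustration): two complete cycles.
msg <- c("Start", "LostControl", "Failed", "Failed", "Stop",
         "Start", "Errors", "Failed", "Stop")
strict <- count_failed_strict(msg)

# Base R version of the lag() check from the update, for comparison.
kept <- msg[msg %in% c("Start", "Stop", "Failed", "LostControl")]
lagged <- sum(kept == "Failed" & c("", head(kept, -1L)) != "LostControl")

strict  # 1
lagged  # 2
```

The two counts differ on the second Failed of the first cycle: its immediate predecessor is another Failed, so the lag() check counts it, while the per-cycle check does not because a LostControl came earlier in that cycle. With dplyr, the helper could be called inside summarise() after the between_valid_anchors() filter.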