Find records without a prerequisite record in the dataframe
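I have a data frame with 3 columns: Timestamp, MMR_NBR, and Action. For every MMR_NBR, the DFV action must occur before SAP Load. I want to extract the SAP Load instances that were not preceded by a DFV action. I am using sqldf in R, and I understand that it runs queries through SQLite, so window-function support is limited. I managed to get the records, but I would like to know whether there is a simpler, better way to write this with a SQL query or any R package (e.g. dplyr).
Sample data: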
df5 <- structure(list(Timestamp = structure(c(7L, 8L, 9L, 10L, 11L,
1L, 2L, 3L, 4L, 5L, 6L, 12L, 13L, 16L, 17L, 18L, 14L, 15L, 19L,
20L), .Label = c("8/14/2018 11:22:18 AM", "8/14/2018 11:30:03 AM",
"8/14/2018 11:32:26 AM", "8/14/2018 4:03:27 PM", "8/14/2018 4:04:05 PM",
"8/14/2018 4:04:11 PM", "8/20/2018 4:02:00 PM", "8/20/2018 6:12:50 PM",
"8/21/2018 9:56:51 AM", "8/21/2018 9:56:59 AM", "8/22/2018 10:43:45 AM",
"8/22/2018 10:43:57 AM", "8/22/2018 4:34:53 PM", "8/23/2018 1:53:25 PM",
"8/23/2018 1:53:36 PM", "8/23/2018 11:47:15 AM", "8/23/2018 12:23:44 PM",
"8/23/2018 12:26:20 PM", "8/23/2018 2:38:59 PM", "8/23/2018 2:39:19 PM"
), class = "factor"), MMR_NBR = structure(c(12L, 10L, 2L, 2L,
8L, 11L, 5L, 5L, 7L, 7L, 7L, 8L, 9L, 3L, 4L, 4L, 1L, 1L, 6L,
6L), .Label = c("B00215", "B00216", "B00218", "B00219", "K00364",
"K00625", "K00632", "K00642", "K00646", "W00362", "W00364", "W00365"
), class = "factor"), Action = structure(c(1L, 1L, 1L, 2L, 1L,
2L, 2L, 2L, 1L, 1L, 2L, 2L, 2L, 2L, 1L, 2L, 1L, 2L, 1L, 2L), .Label = c("DFV",
"SAP Load"), class = "factor")), .Names = c("Timestamp", "MMR_NBR",
"Action"), row.names = c(NA, 20L), class = "data.frame")
In the sample data above, the record 8/14/2018 11:22:18 AM W00364 SAP Load must be returned by the query, along with any similar records.
R script:
sql="SELECT DISTINCT Timestamp, MMR_NBR, Action FROM df5 WHERE (Action='DFV' OR Action='SAP Load') AND MMR_NBR<>''"
df5 <- sqldf::sqldf(sql)
sql="SELECT MMR_NBR,Action, COUNT(*) FROM df5 GROUP BY MMR_NBR HAVING COUNT(*)=1"
df6 <- sqldf::sqldf(sql)
Using dplyr:
Step 1: Convert Timestamp from a factor to an actual date-time:
df5$Timestamp<- as.POSIXct(as.character(df5$Timestamp), format="%m/%d/%Y %I:%M:%S %p")
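Before moving on, it may be worth confirming that the 12-hour strings parse as intended, since as.POSIXct returns NA without raising an error when a string does not match the format. A minimal sketch follows. The vector `x` and the explicit `tz = "UTC"` are assumptions added for illustration and are not in the original step, and `%p` depends on the session locale recognising AM/PM.

```r
# Two timestamps copied from the sample data; `x` is a hypothetical name.
x <- c("8/14/2018 4:03:27 PM", "8/14/2018 11:22:18 AM")

# %I is the 12-hour clock and %p the AM/PM marker.
# tz = "UTC" is an assumption to keep the result independent of the local zone.
parsed <- as.POSIXct(x, format = "%m/%d/%Y %I:%M:%S %p", tz = "UTC")

# Unparseable strings become NA silently, so check for them explicitly.
stopifnot(!anyNA(parsed))

# Show the values in 24-hour form to confirm the PM time was shifted correctly.
format(parsed, "%Y-%m-%d %H:%M:%S")
```

The same check can be run on df5$Timestamp after Step 1 with anyNA(df5$Timestamp).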
Step 2:
library(dplyr)
df5 %>%
  group_by(MMR_NBR) %>%
  arrange(Timestamp) %>%  # sort by time; rows keep this order within each group
  # keep "SAP Load" rows where no "DFV" has occurred yet for that MMR_NBR
  filter(Action == "SAP Load" & cumsum(Action == "DFV") == 0)
Result:
# A tibble: 5 x 3
# Groups: MMR_NBR [4]
Timestamp MMR_NBR Action
<dttm> <fct> <fct>
1 2018-08-14 11:22:18 W00364 SAP Load
2 2018-08-14 11:30:03 K00364 SAP Load
3 2018-08-14 11:32:26 K00364 SAP Load
4 2018-08-22 16:34:53 K00646 SAP Load
5 2018-08-23 11:47:15 B00218 SAP Load
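For the SQL side of the question, the same rule can probably be written without window functions by using a correlated NOT EXISTS subquery. The following is only a sketch under stated assumptions: `toy` is a hypothetical five-row subset of the sample data, Timestamp has already been converted to POSIXct as in Step 1 (sqldf then passes it to SQLite as a number, so `<` compares times rather than strings), and the aliases `s` and `d` are illustrative.

```r
library(sqldf)

# Hypothetical subset of the sample data, kept small for illustration.
toy <- data.frame(
  Timestamp = as.POSIXct(
    c("8/21/2018 9:56:51 AM", "8/21/2018 9:56:59 AM", "8/14/2018 11:22:18 AM",
      "8/14/2018 11:30:03 AM", "8/14/2018 11:32:26 AM"),
    format = "%m/%d/%Y %I:%M:%S %p", tz = "UTC"),
  MMR_NBR = c("B00216", "B00216", "W00364", "K00364", "K00364"),
  Action  = c("DFV", "SAP Load", "SAP Load", "SAP Load", "SAP Load"),
  stringsAsFactors = FALSE
)

# Keep each SAP Load row for which no earlier DFV row exists
# with the same MMR_NBR.
res <- sqldf("
  SELECT s.Timestamp, s.MMR_NBR, s.Action
  FROM toy AS s
  WHERE s.Action = 'SAP Load'
    AND NOT EXISTS (
      SELECT 1
      FROM toy AS d
      WHERE d.MMR_NBR = s.MMR_NBR
        AND d.Action = 'DFV'
        AND d.Timestamp < s.Timestamp
    )
  ORDER BY s.Timestamp
")
res
```

On the full data the table name would be df5 instead of toy. If the Timestamp column were left as the original text, the `<` comparison would order the values alphabetically rather than chronologically, so the conversion in Step 1 matters here as well.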