Comparing a value by group based on date and creating new dataframe where value dropped
I have a df
+-----+------+--------+--------------------+------+---------+
| ID1 | ID2 | DOC_NO | DATE | COST | CLIENT |
+-----+------+--------+--------------------+------+---------+
| ABC | A123 | 1 | 2021-01-01 0:10:00 | 11 | ABC123 |
| DEF | B456 | 2 | 2021-01-01 0:10:00 | 12 | DEF256 |
| GHI | C789 | 3 | 2021-01-01 0:10:00 | 13 | GHI389 |
| JKL | D890 | 4 | 2021-01-01 0:10:00 | 14 | JKL490 |
| MNO | E012 | 5 | 2021-01-01 0:10:00 | 15 | MNO512 |
| ABC | A123 | 6 | 2021-01-01 0:15:00 | 11 | ABC623 |
| DEF | B456 | 7 | 2021-01-01 0:15:00 | 12 | DEF756 |
| GHI | C789 | 8 | 2021-01-01 0:15:00 | 13 | GHI889 |
| JKL | D890 | 9 | 2021-01-02 0:15:00 | 14 | JKL990 |
| MNO | E012 | 10 | 2021-01-03 0:15:00 | 15 | MNO1012 |
| ABC | A123 | 11 | 2021-01-03 0:20:00 | 10 | GHI890 |
| DEF | B456 | 12 | 2021-01-03 0:20:00 | 11 | JKL991 |
| GHI | C789 | 13 | 2021-01-03 0:20:00 | 12 | MNO1013 |
| JKL | D890 | 14 | 2021-01-03 0:20:00 | 13 | GHI891 |
| MNO | E012 | 15 | 2021-01-03 0:20:00 | 14 | JKL992 |
| ABC | A123 | 16 | 2021-01-03 0:20:00 | 12 | MNO1014 |
| DEF | B456 | 17 | 2021-01-03 0:20:00 | 13 | GHI892 |
| GHI | C789 | 18 | 2021-01-03 0:20:00 | 14 | JKL993 |
| JKL | D890 | 19 | 2021-01-03 0:20:00 | 15 | MNO1015 |
| MNO | E012 | 20 | 2021-01-03 0:20:00 | 16 | GHI893 |
| ABC | A123 | 21 | 2021-01-03 0:25:00 | 11 | ABC124 |
| DEF | B456 | 22 | 2021-01-03 0:25:00 | 12 | DEF257 |
| GHI | C789 | 23 | 2021-01-03 0:25:00 | 13 | GHI390 |
| JKL | D890 | 24 | 2021-01-03 0:25:00 | 14 | JKL491 |
| MNO | E012 | 25 | 2021-01-03 0:25:00 | 15 | MNO513 |
+-----+------+--------+--------------------+------+---------+
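I want to group by ID1 and ID2 and arrange the df by DOC_NO and DATE.
After that, I want to create a new column REFERENCE_COST, where REFERENCE_COST is the highest COST seen so far in DATE and DOC_NO order. In other words, if the cost increases over time and DOC_NO, the higher COST becomes the new REFERENCE_COST.
So the new df would look like this: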
+-----+------+--------+--------------------+------+---------+----------+
| ID1 | ID2 | DOC_NO | DATE | COST | CLIENT | REF_COST |
+-----+------+--------+--------------------+------+---------+----------+
| ABC | A123 | 1 | 2021-01-01 0:10:00 | 11 | ABC123 | 11 |
| DEF | B456 | 2 | 2021-01-01 0:10:00 | 12 | DEF256 | 12 |
| GHI | C789 | 3 | 2021-01-01 0:10:00 | 13 | GHI389 | 13 |
| JKL | D890 | 4 | 2021-01-01 0:10:00 | 14 | JKL490 | 14 |
| MNO | E012 | 5 | 2021-01-01 0:10:00 | 15 | MNO512 | 15 |
| ABC | A123 | 6 | 2021-01-01 0:15:00 | 11 | ABC623 | 11 |
| DEF | B456 | 7 | 2021-01-01 0:15:00 | 12 | DEF756 | 12 |
| GHI | C789 | 8 | 2021-01-01 0:15:00 | 13 | GHI889 | 13 |
| JKL | D890 | 9 | 2021-01-02 0:15:00 | 14 | JKL990 | 14 |
| MNO | E012 | 10 | 2021-01-03 0:15:00 | 15 | MNO1012 | 15 |
| ABC | A123 | 11 | 2021-01-03 0:20:00 | 10 | GHI890 | 11 |
| DEF | B456 | 12 | 2021-01-03 0:20:00 | 11 | JKL991 | 12 |
| GHI | C789 | 13 | 2021-01-03 0:20:00 | 12 | MNO1013 | 13 |
| JKL | D890 | 14 | 2021-01-03 0:20:00 | 13 | GHI891 | 14 |
| MNO | E012 | 15 | 2021-01-03 0:20:00 | 14 | JKL992 | 15 |
| ABC | A123 | 16 | 2021-01-03 0:20:00 | 12 | MNO1014 | 12 |
| DEF | B456 | 17 | 2021-01-03 0:20:00 | 13 | GHI892 | 13 |
| GHI | C789 | 18 | 2021-01-03 0:20:00 | 14 | JKL993 | 14 |
| JKL | D890 | 19 | 2021-01-03 0:20:00 | 15 | MNO1015 | 15 |
| MNO | E012 | 20 | 2021-01-03 0:20:00 | 16 | GHI893 | 16 |
| ABC | A123 | 21 | 2021-01-03 0:25:00 | 11 | ABC124 | 12 |
| DEF | B456 | 22 | 2021-01-03 0:25:00 | 12 | DEF257 | 13 |
| GHI | C789 | 23 | 2021-01-03 0:25:00 | 13 | GHI390 | 14 |
| JKL | D890 | 24 | 2021-01-03 0:25:00 | 14 | JKL491 | 15 |
| MNO | E012 | 25 | 2021-01-03 0:25:00 | 15 | MNO513 | 16 |
+-----+------+--------+--------------------+------+---------+----------+
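Next, I want to compare REFERENCE_COST with COST and keep only the rows where COST is less than REFERENCE_COST, adding two new columns, DATE_LAST_REF_COST_MET and CLIENT_LAST_REF_COST_MET, that show the date and the client number at which the REFERENCE_COST was last met.
So the resulting df would look like this: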
+-----+------+--------+--------------------+------+---------+----------+------------------------+--------------------------+
| ID1 | ID2 | DOC_NO | DATE | COST | CLIENT | REF_COST | DATE_LAST_REF_COST_MET | CLIENT_LAST_REF_COST_MET |
+-----+------+--------+--------------------+------+---------+----------+------------------------+--------------------------+
| ABC | A123 | 11 | 2021-01-03 0:20:00 | 10 | GHI890 | 11 | 2021-01-01 0:15:00 | ABC623 |
| DEF | B456 | 12 | 2021-01-03 0:20:00 | 11 | JKL991 | 12 | 2021-01-01 0:15:00 | DEF756 |
| GHI | C789 | 13 | 2021-01-03 0:20:00 | 12 | MNO1013 | 13 | 2021-01-01 0:15:00 | GHI889 |
| JKL | D890 | 14 | 2021-01-03 0:20:00 | 13 | GHI891 | 14 | 2021-01-02 0:15:00 | JKL990 |
| MNO | E012 | 15 | 2021-01-03 0:20:00 | 14 | JKL992 | 15 | 2021-01-03 0:15:00 | MNO1012 |
| ABC | A123 | 21 | 2021-01-03 0:25:00 | 11 | ABC124 | 12 | 2021-01-03 0:20:00 | MNO1014 |
| DEF | B456 | 22 | 2021-01-03 0:25:00 | 12 | DEF257 | 13 | 2021-01-03 0:20:00 | GHI892 |
| GHI | C789 | 23 | 2021-01-03 0:25:00 | 13 | GHI390 | 14 | 2021-01-03 0:20:00 | JKL993 |
| JKL | D890 | 24 | 2021-01-03 0:25:00 | 14 | JKL491 | 15 | 2021-01-03 0:20:00 | MNO1015 |
| MNO | E012 | 25 | 2021-01-03 0:25:00 | 15 | MNO513 | 16 | 2021-01-03 0:20:00 | GHI893 |
+-----+------+--------+--------------------+------+---------+----------+------------------------+--------------------------+
This is what I have been able to do so far:
df %>%
  group_by(ID1, ID2) %>%
  arrange(DATE, DOC_NO, .by_group = TRUE) %>%
  mutate(diff = COST - lag(COST, default = first(COST))) %>%
  mutate(REF_COST = case_when(diff < 0 ~ lag(COST), TRUE ~ diff)) %>%
  mutate(DATE_LAST_REF_COST_MET = case_when(diff < 0 ~ lag(DATE), TRUE ~ DATE)) %>%
  mutate(CLIENT_LAST_REF_COST_MET = case_when(diff < 0 ~ lag(CLIENT), TRUE ~ CLIENT))
The limitation of this approach is that it does not update REFERENCE_COST as the calculation moves through DATE and DOC_NO.
I am not sure how to achieve this.
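You can use cummax to set REF_COST and lag to get the previous value in each group. Use filter to keep only those rows where the reference cost is higher than the cost.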
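As a quick illustration of the two helpers on their own, here is a small sketch using the COST values of the ABC / A123 group from the question, taken in DOC_NO order. The variable names (cost, running_max, previous, dropped) are only for illustration.

```r
library(dplyr)

# COST values of the ABC / A123 group from the question, in DOC_NO order
cost <- c(11L, 11L, 10L, 12L, 11L)

# Running maximum: the highest COST seen up to and including each row
running_max <- cummax(cost)        # 11 11 11 12 12

# Previous value: each element shifted down by one, NA for the first row
previous <- dplyr::lag(cost)       # NA 11 11 10 12

# Rows where the cost has dropped below the running maximum
dropped <- cost < running_max      # FALSE FALSE TRUE FALSE TRUE
```

Inside a grouped mutate() the same calls are evaluated separately for each ID1 / ID2 pair.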
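library(dplyr)

df %>%
  group_by(ID1, ID2) %>%
  # REF_COST is the running maximum of COST within each group;
  # lag() picks up DATE and CLIENT from the previous row of the group
  mutate(REF_COST = cummax(COST),
         DATE_LAST_REF_COST_MET = lag(DATE),
         CLIENT_LAST_REF_COST_MET = lag(CLIENT)) %>%
  ungroup() %>%
  # keep only the rows where COST has dropped below the reference
  filter(REF_COST > COST)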
# ID1 ID2 DOC_NO DATE COST CLIENT REF_COST DATE_LAST_REF_COST_MET CLIENT_LAST_REF_COST_MET
# <chr> <chr> <int> <chr> <int> <chr> <int> <chr> <chr>
# 1 ABC A123 11 2021-01-03 0:20:00 10 GHI890 11 2021-01-01 0:15:00 ABC623
# 2 DEF B456 12 2021-01-03 0:20:00 11 JKL991 12 2021-01-01 0:15:00 DEF756
# 3 GHI C789 13 2021-01-03 0:20:00 12 MNO1013 13 2021-01-01 0:15:00 GHI889
# 4 JKL D890 14 2021-01-03 0:20:00 13 GHI891 14 2021-01-02 0:15:00 JKL990
# 5 MNO E012 15 2021-01-03 0:20:00 14 JKL992 15 2021-01-03 0:15:00 MNO1012
# 6 ABC A123 21 2021-01-03 0:25:00 11 ABC124 12 2021-01-03 0:20:00 MNO1014
# 7 DEF B456 22 2021-01-03 0:25:00 12 DEF257 13 2021-01-03 0:20:00 GHI892
# 8 GHI C789 23 2021-01-03 0:25:00 13 GHI390 14 2021-01-03 0:20:00 JKL993
# 9 JKL D890 24 2021-01-03 0:25:00 14 JKL491 15 2021-01-03 0:20:00 MNO1015
#10 MNO E012 25 2021-01-03 0:25:00 15 MNO513 16 2021-01-03 0:20:00 GHI893
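One possible caveat, offered as a sketch rather than a correction: lag() returns the immediately preceding row of the group. In the sample data that is always the row that set REF_COST, but if COST stays below the reference for several rows in a row, the previous row is no longer the one where the reference was met. The variant below looks up the most recent row whose COST equalled the running maximum instead. It also parses DATE and sorts explicitly, since the code above relies on the data already being in order. The toy data, the helper column ref_row, and the UTC time zone are assumptions made for illustration and are not part of the original question.

```r
library(dplyr)

# Hypothetical one-group example in which COST stays below the reference
# for three consecutive rows (values made up for illustration)
toy <- data.frame(
  ID1    = "ABC",
  ID2    = "A123",
  DOC_NO = 1:4,
  DATE   = c("2021-01-01 0:10:00", "2021-01-01 0:15:00",
             "2021-01-01 0:20:00", "2021-01-01 0:25:00"),
  COST   = c(12L, 10L, 9L, 11L),
  CLIENT = c("C1", "C2", "C3", "C4")
)

res <- toy %>%
  # Parse DATE so that sorting is chronological, not alphabetical (tz assumed)
  mutate(DATE = as.POSIXct(DATE, format = "%Y-%m-%d %H:%M:%S", tz = "UTC")) %>%
  group_by(ID1, ID2) %>%
  arrange(DATE, DOC_NO, .by_group = TRUE) %>%
  mutate(
    REF_COST = cummax(COST),
    # Position (within the group) of the latest row that met the reference
    ref_row  = cummax(if_else(COST == REF_COST, row_number(), 0L)),
    DATE_LAST_REF_COST_MET   = DATE[ref_row],
    CLIENT_LAST_REF_COST_MET = CLIENT[ref_row]
  ) %>%
  ungroup() %>%
  filter(COST < REF_COST) %>%
  select(-ref_row)
```

Here all three remaining rows (DOC_NO 2, 3 and 4) point back to client C1 at 00:10, whereas lag(CLIENT) would report C1, C2 and C3. On the data from the question both approaches should give the same rows.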
Data
It would be easier to help if you provided the data in an easily reproducible form using dput.
df <- structure(list(ID1 = c("ABC", "DEF", "GHI", "JKL", "MNO", "ABC",
"DEF", "GHI", "JKL", "MNO", "ABC", "DEF", "GHI", "JKL", "MNO",
"ABC", "DEF", "GHI", "JKL", "MNO", "ABC", "DEF", "GHI", "JKL",
"MNO"), ID2 = c("A123", "B456", "C789", "D890", "E012", "A123",
"B456", "C789", "D890", "E012", "A123", "B456", "C789", "D890",
"E012", "A123", "B456", "C789", "D890", "E012", "A123", "B456",
"C789", "D890", "E012"), DOC_NO = 1:25, DATE = c("2021-01-01 0:10:00",
"2021-01-01 0:10:00", "2021-01-01 0:10:00", "2021-01-01 0:10:00",
"2021-01-01 0:10:00", "2021-01-01 0:15:00", "2021-01-01 0:15:00",
"2021-01-01 0:15:00", "2021-01-02 0:15:00", "2021-01-03 0:15:00",
"2021-01-03 0:20:00", "2021-01-03 0:20:00", "2021-01-03 0:20:00",
"2021-01-03 0:20:00", "2021-01-03 0:20:00", "2021-01-03 0:20:00",
"2021-01-03 0:20:00", "2021-01-03 0:20:00", "2021-01-03 0:20:00",
"2021-01-03 0:20:00", "2021-01-03 0:25:00", "2021-01-03 0:25:00",
"2021-01-03 0:25:00", "2021-01-03 0:25:00", "2021-01-03 0:25:00"
), COST = c(11L, 12L, 13L, 14L, 15L, 11L, 12L, 13L, 14L, 15L,
10L, 11L, 12L, 13L, 14L, 12L, 13L, 14L, 15L, 16L, 11L, 12L, 13L,
14L, 15L), CLIENT = c("ABC123", "DEF256", "GHI389", "JKL490",
"MNO512", "ABC623", "DEF756", "GHI889", "JKL990", "MNO1012",
"GHI890", "JKL991", "MNO1013", "GHI891", "JKL992", "MNO1014",
"GHI892", "JKL993", "MNO1015", "GHI893", "ABC124", "DEF257",
"GHI390", "JKL491", "MNO513")), row.names = c(NA, -25L), class = "data.frame")