Interval numbers in matrix
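I have a data frame with 3523 observations and 92 variables.
Below is an example with 6 rows; each observation is a 24-hour record that starts at 4:00 am and ends at 4:00 am the next day.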
04:00 04:15 04:30 05:00 ... 04:35
1 - - - - ... -
2 2 2 2 - ... -
3 2 - - 2 ... -
4 - - 2 - ... -
5 - - - - ... -
6 - - - - ... 2
Each row contains the values "-" and "2".
I want to extract the start and end of each interval of consecutive "2" values.
For example, 2: 04:00-04:30;
3: 04:00; 05:00
4: 04:30
Thanks
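For a single row, the task amounts to finding runs of consecutive "2" values. A minimal base-R sketch using `rle()`; `row_intervals()` is a hypothetical helper (not from the original post), and it assumes the values and the time labels are passed in column order. Applied to the four visible columns of rows 2 to 4 above:

```r
# Hypothetical helper: collapse one row of "-"/"2" values into "start - end" intervals.
# Assumes `vals` and `times` are in column (chronological) order.
row_intervals <- function(vals, times, on = "2") {
  r <- rle(vals)                     # runs of identical consecutive values
  ends <- cumsum(r$lengths)          # index of the last slot of each run
  starts <- ends - r$lengths + 1     # index of the first slot of each run
  keep <- r$values == on             # keep only the runs of "2"
  if (!any(keep)) return(NA_character_)
  out <- ifelse(starts[keep] == ends[keep],
                times[starts[keep]],                                   # single slot
                paste(times[starts[keep]], times[ends[keep]], sep = " - "))
  paste(out, collapse = ", ")
}

times <- c("04:00", "04:15", "04:30", "05:00")
row_intervals(c("2", "2", "2", "-"), times)  # "04:00 - 04:30"
row_intervals(c("2", "-", "-", "2"), times)  # "04:00, 05:00"
row_intervals(c("-", "-", "2", "-"), times)  # "04:30"
```

Because `rle()` works on the vector in the order given, the sketch never sorts the time strings, so a day that wraps past midnight is handled as long as the columns are in order.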
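Let's extend your example. In the extended example, note that row 1 has no 2 at all, and there are some trickier cases: in row 6, for instance, there is a 2, then a break (-), then a sequence of two 2s, a -, and a 2.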
04:00 04:15 04:30 05:00 05:15 05:30
1: - - - - - -
2: 2 2 2 - 2 2
3: 2 - - 2 2 2
4: - - 2 - 2 2
5: - - - - 2 2
6: 2 - 2 2 - 2
7: - - - - 2 2
8: 2 2 - 2 2 2
9: - - - - 2 2
10: 2 2 - 2 2 2
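Row 6 is the case that needs care: the 2 at 04:00 and the 2s at 04:30 and 05:00 must end up in different intervals. The `tmp` step in the code that follows tells them apart by numbering consecutive runs. A small sketch of that step on row 6 alone (the names `runs` and `labels` are illustrative only):

```r
library(dplyr)

# Row 6 of the extended example, in column (time) order
time <- c("04:00", "04:15", "04:30", "05:00", "05:15", "05:30")
val  <- c("2", "-", "2", "2", "-", "2")

# Run id: TRUE wherever a slot differs from the previous one; the first slot has
# no predecessor, so dplyr::lag() gives NA, which coalesce() turns into FALSE
tmp <- cumsum(coalesce(val != dplyr::lag(val), FALSE))
tmp  # 0 1 2 2 3 4

# Keep only the "2" slots and collapse each run to "start - end" (or a single time)
runs <- split(time[val == "2"], tmp[val == "2"])
labels <- vapply(runs, function(t) {
  if (length(t) > 1) paste(t[1], t[length(t)], sep = " - ") else t
}, character(1))
paste(labels, collapse = ", ")  # "04:00, 04:30 - 05:00, 05:30"
```

Note that `lag()` here must be `dplyr::lag()`; `stats::lag()` shifts time-series attributes instead of the values.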
You can reproduce it by entering:
WorkSchedulesDay1 <- structure(list(`04:00` = c("-", "2", "2", "-", "-", "2", "-",
"2", "-", "2"), `04:15` = c("-", "2", "-", "-", "-", "-", "-",
"2", "-", "2"), `04:30` = c("-", "2", "-", "2", "-", "2", "-",
"-", "-", "-"), `05:00` = c("-", "-", "2", "-", "-", "2", "-",
"2", "-", "2"), `05:15` = c("-", "2", "2", "2", "2", "-", "2",
"2", "2", "2"), `05:30` = c("-", "2", "2", "2", "2", "2", "2",
"2", "2", "2")), row.names = c(NA, -10L), class = c("data.table",
"data.frame"))
Then apply the code:
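library(dplyr)
library(tidyr)
library(readr)

WorkSchedulesDay1 <- WorkSchedulesDay1 %>%
  group_by(rn = row_number()) %>%
  # long format: one row per (rn, time); with the full data, replace 1:6
  # with the positions of all time columns
  gather(time, val, 1:6) %>%
  # order by row, then by original column position; sorting the time strings
  # instead would put slots after midnight (e.g. "00:15") before "04:00"
  arrange(rn, match(time, names(WorkSchedulesDay1))) %>%
  # run id: goes up by 1 every time the value changes from the previous slot
  mutate(tmp = cumsum(coalesce(val != lag(val), FALSE))) %>%
  filter(val != "-") %>%
  group_by(rn, tmp) %>%
  # a run of several slots becomes "first - last"; a single slot stays as is
  mutate(
    time = case_when(
      n() > 1 ~ paste(first(time), last(time), sep = " - "),
      TRUE ~ time
    )
  ) %>%
  ungroup() %>%
  distinct(rn, tmp, time) %>%
  group_by(rn) %>%
  # join all intervals of a row into one string
  mutate(
    intervals = case_when(
      n() > 1 ~ paste(time, collapse = ", "),
      TRUE ~ time
    )
  ) %>%
  distinct(rn, intervals) %>%
  write_csv("WorkSchedulesDay1.csv")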
Printing the result gives:
rn intervals
<int> <chr>
2 04:00 - 04:30, 05:15 - 05:30
3 04:00, 05:00 - 05:30
4 04:30, 05:15 - 05:30
5 05:15 - 05:30
6 04:00, 04:30 - 05:00, 05:30
7 05:15 - 05:30
8 04:00 - 04:15, 05:00 - 05:30
9 05:15 - 05:30
10 04:00 - 04:15, 05:00 - 05:30
Row 1 has no entry because it contains only -.
Similarly, row 2 has no entry for 05:00 simply because that slot is -.
Likewise, row 6 gives 04:00, 04:30 - 05:00, 05:30 because 04:15 and 05:15 are -.
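`gather()` still works but is superseded in current tidyr in favour of `pivot_longer()`. A sketch of the same approach with `pivot_longer()` and `summarise()`, assuming dplyr >= 1.0 and tidyr >= 1.0; the data is rebuilt as a plain `data.frame` (`check.names = FALSE` keeps the "04:00"-style names), and the result name `intervals` is illustrative. Since `pivot_longer()` emits each row's values in column order, `first()` and `last()` give the start and end of each run without sorting the time strings:

```r
library(dplyr)
library(tidyr)

WorkSchedulesDay1 <- data.frame(
  `04:00` = c("-", "2", "2", "-", "-", "2", "-", "2", "-", "2"),
  `04:15` = c("-", "2", "-", "-", "-", "-", "-", "2", "-", "2"),
  `04:30` = c("-", "2", "-", "2", "-", "2", "-", "-", "-", "-"),
  `05:00` = c("-", "-", "2", "-", "-", "2", "-", "2", "-", "2"),
  `05:15` = c("-", "2", "2", "2", "2", "-", "2", "2", "2", "2"),
  `05:30` = c("-", "2", "2", "2", "2", "2", "2", "2", "2", "2"),
  check.names = FALSE, stringsAsFactors = FALSE
)

intervals <- WorkSchedulesDay1 %>%
  mutate(rn = row_number()) %>%
  # one row per (rn, time), in column order within each rn
  pivot_longer(-rn, names_to = "time", values_to = "val") %>%
  group_by(rn) %>%
  # run id, as in the gather() version
  mutate(tmp = cumsum(coalesce(val != lag(val), FALSE))) %>%
  filter(val == "2") %>%
  group_by(rn, tmp) %>%
  # one label per run; .groups = "drop_last" leaves the result grouped by rn
  summarise(
    time = if (n() > 1) paste(first(time), last(time), sep = " - ") else first(time),
    .groups = "drop_last"
  ) %>%
  # one string per row
  summarise(intervals = paste(time, collapse = ", "), .groups = "drop")

intervals
```

Rows without any "2" (row 1 here) simply drop out, as in the `gather()` version.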