Levels in R with conditions on time

Question

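I have a dataset with date and time variables. I created a new variable (derived from the time variable) and named it "time.of.day". I want to assign different labels (four, in fact) depending on the time period. I am trying the following: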

levels(df$time.of.day) <- list(
    label_1 = df$time.of.day[df$time >= "07:00:00" & df$time <= "10:00:00"],
    label_2 = df$time.of.day[df$time >= "10:00:00" & df$time <= "16:00:00"],
    label_3 = df$time.of.day[df$time >= "16:00:00" & df$time <= "19:00:00"],
    label_4 = df$time.of.day[df$time >= "19:00:00" & df$time <= "23:59:59"]
    )

But nothing happens, and I get no errors or warnings.

Here is a sample of the columns above:

             date     time time.of.day
1      2014-03-21 09:20:08    09:20:08
2      2014-03-21 10:05:22    10:05:22
3      2014-03-26 05:34:04    05:34:04
4      2014-03-26 09:35:05    09:35:05
5      2014-03-27 01:45:03    01:45:03
6      2014-03-27 02:45:27    02:45:27
7      2014-03-27 14:46:26    14:46:26
8      2014-03-28 04:03:30    04:03:30

For the benefit of future users, here is the code that generates the data frame above:

df <- data.frame(
    date = c("2014-03-21", "2014-03-21", "2014-03-26", "2014-03-26",
             "2014-03-27", "2014-03-27", "2014-03-27", "2014-03-28"),
    time = c("09:20:08", "10:05:22", "05:34:04", "09:35:05",
             "01:45:03", "02:45:27", "14:46:26", "04:03:30"),
    time.of.day = c("09:20:08", "10:05:22", "05:34:04", "09:35:05",
                    "01:45:03", "02:45:27", "14:46:26", "04:03:30")
)

P.S.: In earlier work I did this with unique, grep and strings, and it worked.

Can you help? Thanks.

Answer 1

OK, so I solved this with "[". But I am still curious: why does it not work with levels and a list?

df$time.of.day[df$time >= "00:00:00" & df$time <= "07:00:00"] <- "morning"
df$time.of.day[df$time >= "07:00:00" & df$time <= "10:00:00"] <- "home2work"
df$time.of.day[df$time >= "10:00:00" & df$time <= "16:00:00"] <- "mid_day"
df$time.of.day[df$time >= "16:00:00" & df$time <= "19:00:00"] <- "work2home"
df$time.of.day[df$time >= "19:00:00" & df$time <= "23:59:59"] <- "night"
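On the open question of why levels with a list did nothing: as far as I understand it, the list form of `levels<-` expects each name to be a new level and each value to be the existing level names that should be merged into it, not subsets of the data picked by a condition. It also only remaps anything when the object is a factor. The sketch below uses made-up vectors `f` and `x` (not from the question) to show both cases, assuming the column in the question was a plain character vector.

```r
# A factor with three levels
f <- factor(c("a", "b", "c", "a"))

# List form: each name is a NEW level, each value lists the OLD levels merged into it
levels(f) <- list(low = c("a", "b"), high = "c")
recoded <- as.character(f)

# On a plain character vector there are no levels to remap;
# the assignment only stores the list as a "levels" attribute.
x <- c("a", "b", "c", "a")
levels(x) <- list(low = c("a", "b"), high = "c")
values_after <- as.vector(x)  # as.vector() drops the attribute, leaving the data
```

Here `recoded` holds the merged labels, while `values_after` is the same as the input, which would match the "nothing happens, no error" behaviour described in the question.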

Answer 2

Another option is:

library(chron)
indx <- c('00:00:00', '07:00:00', '10:00:00', '16:00:00',
                  '19:00:00', '23:59:59')
indx2 <- c('morning', 'home2work', 'mid_day', 'work2home', 'night')
h1 <- chron(times=df$time)
br <- chron(times=indx)
df$time.of.day <-  cut(h1, br, labels=indx2)
df$time.of.day
#[1] home2work mid_day   morning   home2work morning   morning   mid_day  
#[8] morning  
#Levels: morning home2work mid_day work2home night
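
If the chron dependency is not wanted, roughly the same binning can be sketched in base R by converting the "HH:MM:SS" strings to seconds since midnight and calling cut on the numbers. The helper `to_seconds` and the variable names are hypothetical, and `right = FALSE` (intervals closed on the left, so "07:00:00" counts as home2work) is my choice here rather than something taken from the answer above.

```r
# Hypothetical helper: "HH:MM:SS" -> seconds since midnight
to_seconds <- function(hms) {
  parts <- strsplit(hms, ":", fixed = TRUE)
  vapply(parts, function(p) sum(as.numeric(p) * c(3600, 60, 1)), numeric(1))
}

times <- c("09:20:08", "10:05:22", "05:34:04", "09:35:05",
           "01:45:03", "02:45:27", "14:46:26", "04:03:30")

# Lower bound of each period, plus 24:00:00 as the final upper bound
starts <- c("00:00:00", "07:00:00", "10:00:00", "16:00:00", "19:00:00")
breaks <- c(to_seconds(starts), 24 * 3600)
period_labels <- c("morning", "home2work", "mid_day", "work2home", "night")

# right = FALSE gives intervals [start, end), so each boundary belongs to one bin
period <- cut(to_seconds(times), breaks = breaks,
              labels = period_labels, right = FALSE)
```

With the sample data this should give the same labels as the chron version; the two can differ only for times that fall exactly on a boundary.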

Or you could do:

indx3 <- max.col(t(Vectorize(function(x) x>=indx[-length(indx)] & 
                                 x<= indx[-1])(df$time)), 'first')
indx2[indx3]
# [1] "home2work" "mid_day"   "morning"   "home2work" "morning"   "morning"  
# [7] "mid_day"   "morning"
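
Note that this approach, like the "[" solution, compares the times as text. That works here because the strings are zero-padded to a fixed "HH:MM:SS" width, so text order and clock order agree. A small sketch of the pitfall, with a hypothetical helper `pad_hms` that adds a missing leading zero to the hour:

```r
# Zero-padded strings: text comparison agrees with clock order
padded_ok <- "09:20:08" < "10:05:22"

# Without the leading zero, "9" is compared with "1" and the order flips
unpadded_ok <- "9:20:08" < "10:05:22"

# Hypothetical helper: pad a single-digit hour to two digits
pad_hms <- function(x) sub("^([0-9]):", "0\\1:", x)
fixed_ok <- pad_hms("9:20:08") < pad_hms("10:05:22")
```

If the time column may contain unpadded values, padding it first (or converting to a numeric or time class) keeps the comparisons meaningful.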

Data

df <-  structure(list(date = c("2014-03-21", "2014-03-21", "2014-03-26", 
"2014-03-26", "2014-03-27", "2014-03-27", "2014-03-27", "2014-03-28"
), time = c("09:20:08", "10:05:22", "05:34:04", "09:35:05", "01:45:03", 
"02:45:27", "14:46:26", "04:03:30"), time.of.day = c("09:20:08", 
"10:05:22", "05:34:04", "09:35:05", "01:45:03", "02:45:27", "14:46:26", 
"04:03:30")), .Names = c("date", "time", "time.of.day"), class = "data.frame", 
row.names = c("1", "2", "3", "4", "5", "6", "7", "8"))
