Replacing values in a column between two given times
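I want to replace the values in a particular column that fall between two times. I know the min and max times for these values, and I would like to replace all data points between those two times with a specific label.
I have a large data set with many groups, so I will use a simple example here. Say I want to replace "almost peak" with "peak", and I know the min/max times at which those labels occur.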
points <- c(1,2,3,3,4,3,2,1,11,12,13,14,13,13,12,11)
Status <- c("base", "base", "almost peak", "almost peak", "peak", "almost peak", "base", "base", "base", "base", "almost peak", "peak", "almost peak", "almost peak", "base", "base")
DateTime <- seq(from = as.POSIXct("2021-10-16 11:37:23"), to = as.POSIXct("2021-10-16 11:37:38"), by = "sec")
Group <- c(1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2)
df <- data.frame(points, Status, DateTime, Group)
# Get the min and max times of the "almost peak" occurrences in each group
library(dplyr)
df.test <- df %>% group_by(Group) %>%
  filter(Status == "almost peak") %>%
  summarise(
    MinTime = min(DateTime),
    MaxTime = max(DateTime)
  )
> print(df)
   points      Status            DateTime Group
1       1        base 2021-10-16 11:37:23     1
2       2        base 2021-10-16 11:37:24     1
3       3 almost peak 2021-10-16 11:37:25     1
4       3 almost peak 2021-10-16 11:37:26     1
5       4        peak 2021-10-16 11:37:27     1
6       3 almost peak 2021-10-16 11:37:28     1
7       2        base 2021-10-16 11:37:29     1
8       1        base 2021-10-16 11:37:30     1
9      11        base 2021-10-16 11:37:31     2
10     12        base 2021-10-16 11:37:32     2
11     13 almost peak 2021-10-16 11:37:33     2
12     14        peak 2021-10-16 11:37:34     2
13     13 almost peak 2021-10-16 11:37:35     2
14     13 almost peak 2021-10-16 11:37:36     2
15     12        base 2021-10-16 11:37:37     2
16     11        base 2021-10-16 11:37:38     2
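Again, for each group I want to replace all data points between MinTime and MaxTime with "peak".
I tried using mutate() together with replace(), as shown below, but it does not seem to work. I think this is close, but not quite right.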
df.test.replace <- df %>%
  group_by(Group) %>%
  mutate(Status = replace(Status, DateTime >= df.test$MinTime & DateTime <= df.test$MaxTime, "peak"))
To clarify, this is my desired output. All Status labels between the min/max times have been replaced with "peak":
   points Status            DateTime Group
1       1   base 2021-10-16 11:37:23     1
2       2   base 2021-10-16 11:37:24     1
3       3   peak 2021-10-16 11:37:25     1
4       3   peak 2021-10-16 11:37:26     1
5       4   peak 2021-10-16 11:37:27     1
6       3   peak 2021-10-16 11:37:28     1
7       2   base 2021-10-16 11:37:29     1
8       1   base 2021-10-16 11:37:30     1
9      11   base 2021-10-16 11:37:31     2
10     12   base 2021-10-16 11:37:32     2
11     13   peak 2021-10-16 11:37:33     2
12     14   peak 2021-10-16 11:37:34     2
13     13   peak 2021-10-16 11:37:35     2
14     13   peak 2021-10-16 11:37:36     2
15     12   base 2021-10-16 11:37:37     2
16     11   base 2021-10-16 11:37:38     2
Any pointers would be appreciated. Thanks.
You need to index the correct values for the replacement to work. Try case_when() from dplyr:
library(dplyr)
df %>%
  group_by(Group) %>%
  mutate(Status = case_when(
    DateTime >= df.test$MinTime[1] &
      DateTime <= df.test$MaxTime[1] ~ "peak",
    DateTime >= df.test$MinTime[2] &
      DateTime <= df.test$MaxTime[2] ~ "peak",
    TRUE ~ as.character(Status)))
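To see why indexing matters, it may help to look at what the comparison in the original attempt does within a single group. The sketch below uses base R only; `times` and `min_times` are illustrative stand-ins (not names from the question) for Group 1's DateTime column and for df.test$MinTime. Because the second vector has one element per group, R recycles it along the rows instead of picking the value that belongs to the current group.

```r
# Illustrative stand-in for Group 1's DateTime column (8 consecutive seconds)
times <- seq(from = as.POSIXct("2021-10-16 11:37:23", tz = "UTC"),
             by = "sec", length.out = 8)

# Illustrative stand-in for df.test$MinTime: one entry per group
min_times <- as.POSIXct(c("2021-10-16 11:37:25", "2021-10-16 11:37:33"),
                        tz = "UTC")

# Length-2 vector recycled against length-8: odd rows are compared with
# min_times[1], even rows with min_times[2]
recycled <- times >= min_times
print(recycled)
# FALSE FALSE TRUE FALSE TRUE FALSE TRUE FALSE

# Comparing every row with the single value for this group
indexed <- times >= min_times[1]
print(indexed)
# FALSE FALSE TRUE TRUE TRUE TRUE TRUE TRUE
```

The recycled comparison produces no warning here because 8 is a multiple of 2, which is part of what makes the problem easy to miss.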
If you want to avoid indexing manually, put all the data in the same data frame:
df_all <- dplyr::left_join(df, df.test, by = "Group")
Then run the code using the variables "MinTime" and "MaxTime" from the same table, rather than calling them from another data frame:
df_all %>%
  mutate(Status = case_when(
    DateTime >= MinTime &
      DateTime <= MaxTime ~ "peak",
    TRUE ~ as.character(Status)))
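Put together, the join approach might look like the sketch below. It rebuilds the example data from the question so it can be run on its own. The final select() that drops the helper columns and the name `result` are additions for illustration, not part of the answer above.

```r
library(dplyr)

# Example data copied from the question
points <- c(1,2,3,3,4,3,2,1,11,12,13,14,13,13,12,11)
Status <- c("base", "base", "almost peak", "almost peak", "peak", "almost peak",
            "base", "base", "base", "base", "almost peak", "peak",
            "almost peak", "almost peak", "base", "base")
DateTime <- seq(from = as.POSIXct("2021-10-16 11:37:23"),
                to = as.POSIXct("2021-10-16 11:37:38"), by = "sec")
Group <- c(1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2)
df <- data.frame(points, Status, DateTime, Group)

# Per-group time window spanned by the "almost peak" rows
df.test <- df %>%
  filter(Status == "almost peak") %>%
  group_by(Group) %>%
  summarise(MinTime = min(DateTime), MaxTime = max(DateTime))

# Attach each group's window to its rows, relabel inside the window,
# then drop the helper columns (this last step is an addition)
result <- df %>%
  left_join(df.test, by = "Group") %>%
  mutate(Status = case_when(
    DateTime >= MinTime & DateTime <= MaxTime ~ "peak",
    TRUE ~ as.character(Status)
  )) %>%
  select(-MinTime, -MaxTime)

print(result)
```

A group with no "almost peak" rows would get NA for MinTime and MaxTime after the left join. case_when() treats an NA condition as not matched, so those rows should fall through to the TRUE branch and keep their original label.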
There is no need for case_when():
df %>%
  group_by(Group) %>%
  mutate(
    Status = ifelse(Status == "almost peak" & DateTime < max(DateTime) & DateTime > min(DateTime), "peak", Status)
  )
#> # A tibble: 16 × 4
#> # Groups:   Group [2]
#>    points Status DateTime            Group
#>     <dbl> <chr>  <dttm>              <dbl>
#>  1      1 base   2021-10-16 11:37:23     1
#>  2      2 base   2021-10-16 11:37:24     1
#>  3      3 peak   2021-10-16 11:37:25     1
#>  4      3 peak   2021-10-16 11:37:26     1
#>  5      4 peak   2021-10-16 11:37:27     1
#>  6      3 peak   2021-10-16 11:37:28     1
#>  7      2 base   2021-10-16 11:37:29     1
#>  8      1 base   2021-10-16 11:37:30     1
#>  9     11 base   2021-10-16 11:37:31     2
#> 10     12 base   2021-10-16 11:37:32     2
#> 11     13 peak   2021-10-16 11:37:33     2
#> 12     14 peak   2021-10-16 11:37:34     2
#> 13     13 peak   2021-10-16 11:37:35     2
#> 14     13 peak   2021-10-16 11:37:36     2
#> 15     12 base   2021-10-16 11:37:37     2
#> 16     11 base   2021-10-16 11:37:38     2
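Note that the ifelse() version relabels only rows already marked "almost peak". That matches the desired output here because every row inside each window is either "almost peak" or "peak". If some other label could fall inside the window, a variant that computes the window inside the grouped mutate() may be closer to what the question asks for. The sketch below is one possible way to do that, not the answerer's code: the data frame `demo` and the helper column `in_window` are made up for illustration, and it assumes every group has at least one "almost peak" row.

```r
library(dplyr)

# Hypothetical data: a "base" row sits between two "almost peak" rows
demo <- data.frame(
  Status = c("base", "almost peak", "base", "peak", "almost peak", "base"),
  DateTime = as.POSIXct("2021-10-16 11:37:23", tz = "UTC") + 0:5,
  Group = 1,
  stringsAsFactors = FALSE
)

relabelled <- demo %>%
  group_by(Group) %>%
  mutate(
    # TRUE for rows between the first and last "almost peak" time of the group;
    # computed from the original labels before Status is overwritten
    in_window = DateTime >= min(DateTime[Status == "almost peak"]) &
      DateTime <= max(DateTime[Status == "almost peak"]),
    Status = ifelse(in_window, "peak", Status)
  ) %>%
  ungroup() %>%
  select(-in_window)

print(relabelled$Status)
# "base" "peak" "peak" "peak" "peak" "base"
```

For a group without any "almost peak" row, min() and max() would be called on an empty vector and emit a warning, so such groups would need separate handling.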