Replacing values in a column between two given times

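I want to replace the values in a specific column between two times. I know the minimum and maximum times for these values, and I want to replace every data point between those two times with a specific label.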

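I have a large dataset with many groups, so I'll try to give a simple example here. Suppose I want to replace "almost peak" with "peak", and I know the min/max times at which these labels occur.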

points <- c(1,2,3,3,4,3,2,1,11,12,13,14,13,13,12,11)

Status <- c("base", "base", "almost peak", "almost peak", "peak", "almost peak", "base", "base", "base", "base", "almost peak", "peak", "almost peak", "almost peak", "base", "base")

DateTime <- seq(from = as.POSIXct("2021-10-16 11:37:23"), to = as.POSIXct("2021-10-16 11:37:38"), by = "sec")

Group <- c(1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2)

df <- data.frame(points, Status, DateTime, Group)

library(dplyr)

# Get the min and max times of the "almost peak" occurrences in each group
df.test <- df %>% group_by(Group) %>% 
  filter(Status == "almost peak") %>%
  summarise(
    MinTime = min(DateTime),
    MaxTime = max(DateTime)
  )
> print(df)
      points   Status       DateTime      Group
1       1        base 2021-10-16 11:37:23     1
2       2        base 2021-10-16 11:37:24     1
3       3 almost peak 2021-10-16 11:37:25     1
4       3 almost peak 2021-10-16 11:37:26     1
5       4        peak 2021-10-16 11:37:27     1
6       3 almost peak 2021-10-16 11:37:28     1
7       2        base 2021-10-16 11:37:29     1
8       1        base 2021-10-16 11:37:30     1
9      11        base 2021-10-16 11:37:31     2
10     12        base 2021-10-16 11:37:32     2
11     13 almost peak 2021-10-16 11:37:33     2
12     14        peak 2021-10-16 11:37:34     2
13     13 almost peak 2021-10-16 11:37:35     2
14     13 almost peak 2021-10-16 11:37:36     2
15     12        base 2021-10-16 11:37:37     2
16     11        base 2021-10-16 11:37:38     2

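Again, for each group I want to replace all data points between MinTime and MaxTime with "peak". I tried using mutate() together with replace(), as shown below, but it doesn't seem to work. I think it's close, but not quite right.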

df.test.replace <- df %>%
  group_by(Group) %>%
  mutate(Status = replace(Status, DateTime >= df.test$MinTime & DateTime <= df.test$MaxTime, "peak"))
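To see why the attempt above is close but not quite right, here is a minimal sketch (assuming dplyr is installed; the question's data are rebuilt so the block runs on its own). Inside each group, DateTime has 8 values, but df.test$MinTime and df.test$MaxTime have one value per group (2 values). R recycles the shorter vectors without a warning, so odd rows in every group are compared against group 1's window and even rows against group 2's window:

```r
library(dplyr)

# Example data from the question
df <- data.frame(
  points   = c(1, 2, 3, 3, 4, 3, 2, 1, 11, 12, 13, 14, 13, 13, 12, 11),
  Status   = c("base", "base", "almost peak", "almost peak", "peak", "almost peak", "base", "base",
               "base", "base", "almost peak", "peak", "almost peak", "almost peak", "base", "base"),
  DateTime = seq(from = as.POSIXct("2021-10-16 11:37:23"),
                 to   = as.POSIXct("2021-10-16 11:37:38"), by = "sec"),
  Group    = rep(c(1, 2), each = 8),
  stringsAsFactors = FALSE
)

# One row per group: first and last "almost peak" time
df.test <- df %>%
  group_by(Group) %>%
  filter(Status == "almost peak") %>%
  summarise(MinTime = min(DateTime), MaxTime = max(DateTime))

# The original attempt: within each group, the 8 DateTime values are compared
# element-wise against the recycled 2-element MinTime/MaxTime vectors
attempt <- df %>%
  group_by(Group) %>%
  mutate(Status = replace(Status,
                          DateTime >= df.test$MinTime & DateTime <= df.test$MaxTime,
                          "peak")) %>%
  ungroup()

print(attempt$Status)
```

Only some rows inside each window get relabelled (rows 4 and 6 in group 1 keep "almost peak"), which is why each group has to be compared against its own MinTime/MaxTime pair.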

To clarify, this is the output I want: every Status label between the min/max times has been replaced with "peak".

      points   Status       DateTime      Group
1       1        base 2021-10-16 11:37:23     1
2       2        base 2021-10-16 11:37:24     1
3       3        peak 2021-10-16 11:37:25     1
4       3        peak 2021-10-16 11:37:26     1
5       4        peak 2021-10-16 11:37:27     1
6       3        peak 2021-10-16 11:37:28     1
7       2        base 2021-10-16 11:37:29     1
8       1        base 2021-10-16 11:37:30     1
9      11        base 2021-10-16 11:37:31     2
10     12        base 2021-10-16 11:37:32     2
11     13        peak 2021-10-16 11:37:33     2
12     14        peak 2021-10-16 11:37:34     2
13     13        peak 2021-10-16 11:37:35     2
14     13        peak 2021-10-16 11:37:36     2
15     12        base 2021-10-16 11:37:37     2
16     11        base 2021-10-16 11:37:38     2

Any pointers would be appreciated. Thanks.

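df.test$MinTime and df.test$MaxTime hold one value per group, so you need to index the correct value for each replacement. Try case_when from dplyr: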

library(dplyr)
df %>%
    group_by(Group) %>%
    mutate(Status = case_when(
               DateTime >= df.test$MinTime[1] &
               DateTime <= df.test$MaxTime[1] ~ "peak",
               DateTime >= df.test$MinTime[2] &
               DateTime <= df.test$MaxTime[2] ~ "peak",
               TRUE ~ as.character(Status)))

If you want to avoid the manual indexing, put all the data in the same data frame:

df_all <- dplyr::left_join(df, df.test, by = "Group")

Then run the code using the MinTime and MaxTime columns from the same table, instead of pulling them from another data frame:

df_all %>%
    mutate(Status = case_when(
               DateTime >= MinTime &
               DateTime <= MaxTime ~ "peak",
               TRUE ~ as.character(Status)))
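A self-contained sketch of the join-based approach, assuming dplyr is installed (the data and df.test are rebuilt from the question). Because left_join() copies each group's MinTime/MaxTime onto every row of that group, no group_by() is needed for the comparison:

```r
library(dplyr)

# Example data from the question
df <- data.frame(
  points   = c(1, 2, 3, 3, 4, 3, 2, 1, 11, 12, 13, 14, 13, 13, 12, 11),
  Status   = c("base", "base", "almost peak", "almost peak", "peak", "almost peak", "base", "base",
               "base", "base", "almost peak", "peak", "almost peak", "almost peak", "base", "base"),
  DateTime = seq(from = as.POSIXct("2021-10-16 11:37:23"),
                 to   = as.POSIXct("2021-10-16 11:37:38"), by = "sec"),
  Group    = rep(c(1, 2), each = 8),
  stringsAsFactors = FALSE
)

# Per-group window of "almost peak" times
df.test <- df %>%
  group_by(Group) %>%
  filter(Status == "almost peak") %>%
  summarise(MinTime = min(DateTime), MaxTime = max(DateTime))

# Attach each group's window to its rows, relabel, then drop the helper columns
result <- left_join(df, df.test, by = "Group") %>%
  mutate(Status = case_when(
    DateTime >= MinTime & DateTime <= MaxTime ~ "peak",
    TRUE ~ Status
  )) %>%
  select(-MinTime, -MaxTime)

print(result)
```

If a group has no "almost peak" rows, it has no row in df.test, so the join leaves NA in MinTime/MaxTime for that group; case_when() treats an NA condition as not matched, so those rows fall through to the TRUE branch and keep their original label.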

You don't need case_when; ifelse is enough:

df  %>% 
  group_by(Group) %>% 
  mutate(
    Status = ifelse(Status == "almost peak" & DateTime < max(DateTime) & DateTime > min(DateTime), "peak", Status)
  )
#> # A tibble: 16 × 4
#> # Groups:   Group [2]
#>    points Status DateTime            Group
#>     <dbl> <chr>  <dttm>              <dbl>
#>  1      1 base   2021-10-16 11:37:23     1
#>  2      2 base   2021-10-16 11:37:24     1
#>  3      3 peak   2021-10-16 11:37:25     1
#>  4      3 peak   2021-10-16 11:37:26     1
#>  5      4 peak   2021-10-16 11:37:27     1
#>  6      3 peak   2021-10-16 11:37:28     1
#>  7      2 base   2021-10-16 11:37:29     1
#>  8      1 base   2021-10-16 11:37:30     1
#>  9     11 base   2021-10-16 11:37:31     2
#> 10     12 base   2021-10-16 11:37:32     2
#> 11     13 peak   2021-10-16 11:37:33     2
#> 12     14 peak   2021-10-16 11:37:34     2
#> 13     13 peak   2021-10-16 11:37:35     2
#> 14     13 peak   2021-10-16 11:37:36     2
#> 15     12 base   2021-10-16 11:37:37     2
#> 16     11 base   2021-10-16 11:37:38     2
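Note that the ifelse() version only relabels rows that are already "almost peak"; a row with a different label lying between the first and last "almost peak" times (say, a "base" reading) would keep its label. That makes no difference for this example, but if the goal is literally "replace everything between MinTime and MaxTime", one option is to compute the window inside the grouped mutate(), without a separate summary table. A sketch, assuming dplyr is installed and that every group contains at least one "almost peak" row (otherwise min()/max() on an empty vector return Inf/-Inf with a warning); label_window() and df2 are hypothetical names used only for illustration:

```r
library(dplyr)

# Example data from the question
df <- data.frame(
  points   = c(1, 2, 3, 3, 4, 3, 2, 1, 11, 12, 13, 14, 13, 13, 12, 11),
  Status   = c("base", "base", "almost peak", "almost peak", "peak", "almost peak", "base", "base",
               "base", "base", "almost peak", "peak", "almost peak", "almost peak", "base", "base"),
  DateTime = seq(from = as.POSIXct("2021-10-16 11:37:23"),
                 to   = as.POSIXct("2021-10-16 11:37:38"), by = "sec"),
  Group    = rep(c(1, 2), each = 8),
  stringsAsFactors = FALSE
)

# Hypothetical helper: relabel every row between the first and last
# "almost peak" time of its group, whatever its current label is
label_window <- function(data) {
  data %>%
    group_by(Group) %>%
    mutate(
      MinTime = min(DateTime[Status == "almost peak"]),
      MaxTime = max(DateTime[Status == "almost peak"]),
      Status  = if_else(DateTime >= MinTime & DateTime <= MaxTime, "peak", Status)
    ) %>%
    ungroup() %>%
    select(-MinTime, -MaxTime)
}

result <- label_window(df)

# Hypothetical variant: a "base" reading between two "almost peak" readings
df2 <- df
df2$Status[4] <- "base"
result2 <- label_window(df2)

print(result2$Status[3:6])
```

Here row 4 of df2 becomes "peak" because it falls inside group 1's window, whereas the ifelse() condition above would leave it as "base".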