How to calculate the total minutes when values were greater than 5?

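I have some time series data where a value is given every 15 minutes. For this dataset, we can assume the value does not change within each 15-minute interval; so if the value at 8:15am is 1 and the value at 8:30am is 2, we assume the value is 1 for 8:15-8:30 and 2 for 8:30-8:45.

Now I want to calculate the total minutes per month during which the value was greater than 5.

My data frame looks like this (except that it has 15+ columns of values).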

library(tibble)

# Create a, b, c, d variables: a holds the timestamps, b to d hold the values
a <- c(
  "06-25-2021 08:00:00 AM",
  "06-25-2021 08:15:00 AM",
  "06-25-2021 08:30:00 AM",
  "06-25-2021 08:45:00 AM",
  "07-25-2021 08:45:00 AM",
  "07-25-2021 09:00:00 AM",
  "08-25-2021 08:45:00 AM",
  "08-25-2021 09:00:00 AM",
  "09-25-2021 09:15:00 AM",
  "09-25-2021 09:30:00 AM"
)
b = c(4, 5, 8, NA, 4, 5, NA, 7, 7, 6)
c = c(6, 10, 8, NA, 8, 5, NA, 8, 7, 2)
d = c(1, 3, NA, 6, 4, 8, 2, 4, NA, 10)

df =
  tibble(a, b, c, d)

# The timestamps use a 12-hour clock: %I is the hour, %p the AM/PM marker
df$a = as.POSIXlt(df$a, format = "%m-%d-%Y %I:%M:%S %p", tz = 'EST')
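
A note on the format string, as a minimal sketch: because the timestamps use a 12-hour clock with an AM/PM suffix, parsing with `%I` and `%p` keeps afternoon readings on the right hour. The two timestamps below are made up for illustration (they are not part of my data), and matching `%p` assumes an English-style locale.

```r
# Two illustrative 12-hour timestamps (hypothetical, not from the data above)
stamps <- c("06-25-2021 08:15:00 AM", "06-25-2021 01:30:00 PM")

# %I = hour on the 12-hour clock, %p = AM/PM marker (locale-dependent)
parsed <- as.POSIXct(stamps, format = "%m-%d-%Y %I:%M:%S %p", tz = "EST")

# Hour and minute on the 24-hour clock
clock <- format(parsed, "%H:%M")
clock
# [1] "08:15" "13:30"

# Month number, which is what the monthly totals get grouped by
month_num <- as.integer(format(parsed, "%m"))
```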

But I would like it to look like this:

Name = c("b", "c", "d")
June = c(15, 45, 15 )
Jul = c(NA, 15, 15)
Aug = c(15, 15, NA)
Sept = c(30, 15, 15)

df_2 = tibble(Name, June, Jul, Aug, Sept)

I am not sure how to sum and filter when the data is a time series. Does anyone have any suggestions?

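Based on the description, maybe we can replace the values less than or equal to 5 in columns 'b' to 'd' with NA, reshape to 'long' format with pivot_longer, get the month from the 'a' column, and reshape back to 'wide' with pivot_wider.
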
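library(dplyr)
library(tidyr)
df %>% 
   # blank out the readings that are not above 5
   mutate(across(2:4, ~ replace(., . <= 5, NA))) %>% 
   # one row per remaining reading; the NA rows are dropped
   pivot_longer(cols = b:d, names_to = 'Name', values_drop_na = TRUE) %>%
   # abbreviated month name taken from the timestamp
   mutate(a = format(a, '%b')) %>%
   # each remaining reading stands for 15 minutes
   pivot_wider(names_from = a, values_from = value, 
    values_fill = 0, values_fn = list(value = function(x) length(x) * 15))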
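
As a quick cross-check on the counting rule, here is a minimal base-R sketch for column 'b' alone. It assumes each reading above 5 stands for one full 15-minute interval and that NA readings count as not above 5; the `month` vector is typed in by hand from the question's timestamps rather than parsed.

```r
# Month of each reading, typed in by hand from the question's timestamps
month <- factor(
  c("Jun", "Jun", "Jun", "Jun", "Jul", "Jul", "Aug", "Aug", "Sep", "Sep"),
  levels = c("Jun", "Jul", "Aug", "Sep")
)
b <- c(4, 5, 8, NA, 4, 5, NA, 7, 7, 6)

# Count the readings above 5 in each month and convert the count to minutes
minutes_b <- tapply(b, month, function(x) sum(x > 5, na.rm = TRUE) * 15)
minutes_b
# Jun = 15, Jul = 0, Aug = 15, Sep = 30
```

The same function can be applied to 'c' and 'd' to compare against the 'wide' table produced by the pipeline.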

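Using aggregate, count the values greater than 5 in each month and multiply by 15 to get minutes. The rest is reshaping. Note that `$mon` of a POSIXlt is zero-based, hence the `+ 1` when indexing month.abb, and that na.action=na.pass keeps rows where only some of the columns are NA.

aggregate(cbind(b, c, d) ~ a$mon, df, \(x) sum(x > 5, na.rm=TRUE) * 15,
          na.action=na.pass) |>
  t() |> as.data.frame() |>
  (\(x) setNames(x, month.abb[unlist(x[1, ]) + 1])[-1, ])() |>
  (\(x) cbind(Name=rownames(x), x))() |> `rownames<-`(NULL)  ## optional
#   Name Jun Jul Aug Sep
# 1    b  15   0  15  30
# 2    c  45  15  15  15
# 3    d  15  15   0  15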

Data

df <- structure(list(a = structure(list(sec = c(0, 0, 0, 0, 0, 0, 0, 
0, 0, 0), min = c(0L, 15L, 30L, 45L, 45L, 0L, 45L, 0L, 15L, 30L
), hour = c(8L, 8L, 8L, 8L, 8L, 9L, 8L, 9L, 9L, 9L), mday = c(25L, 
25L, 25L, 25L, 25L, 25L, 25L, 25L, 25L, 25L), mon = c(5L, 5L, 
5L, 5L, 6L, 6L, 7L, 7L, 8L, 8L), year = c(121L, 121L, 121L, 121L, 
121L, 121L, 121L, 121L, 121L, 121L), wday = c(5L, 5L, 5L, 5L, 
0L, 0L, 3L, 3L, 6L, 6L), yday = c(175L, 175L, 175L, 175L, 205L, 
205L, 236L, 236L, 267L, 267L), isdst = c(0L, 0L, 0L, 0L, 0L, 
0L, 0L, 0L, 0L, 0L), zone = c("EST", "EST", "EST", "EST", "EST", 
"EST", "EST", "EST", "EST", "EST"), gmtoff = c(-18000L, -18000L, 
-18000L, -18000L, -18000L, -18000L, -18000L, -18000L, -18000L, 
-18000L)), class = c("POSIXlt", "POSIXt"), tzone = c("EST", "EST", 
"EST")), b = c(4, 5, 8, NA, 4, 5, NA, 7, 7, 6), c = c(6, 10, 
8, NA, 8, 5, NA, 8, 7, 2), d = c(1, 3, NA, 6, 4, 8, 2, 4, NA, 
10)), row.names = c(NA, -10L), class = "data.frame")


# [1] "R version 4.1.2 (2021-11-01)"