How to calculate the total minutes when values were greater than 5?
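I have some time series data where a value is reported every 15 minutes. For this dataset, we can assume that the value does not change within each 15-minute interval; so if the value is 1 at 8:15am and 2 at 8:30am, we assume the value is 1 for 8:15-8:30 and 2 for 8:30-8:45.
Now I would like to calculate the total number of minutes per month during which the value was greater than 5.
My data frame looks like this (except with 15+ columns of values).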
library(tibble)

# Create a, b, c, d variables
a <- c(
  "06-25-2021 08:00:00 AM",
  "06-25-2021 08:15:00 AM",
  "06-25-2021 08:30:00 AM",
  "06-25-2021 08:45:00 AM",
  "07-25-2021 08:45:00 AM",
  "07-25-2021 09:00:00 AM",
  "08-25-2021 08:45:00 AM",
  "08-25-2021 09:00:00 AM",
  "09-25-2021 09:15:00 AM",
  "09-25-2021 09:30:00 AM"
)
b <- c(4, 5, 8, NA, 4, 5, NA, 7, 7, 6)
c <- c(6, 10, 8, NA, 8, 5, NA, 8, 7, 2)
d <- c(1, 3, NA, 6, 4, 8, 2, 4, NA, 10)
df <- tibble(a, b, c, d)
# Parse the timestamps: %I with %p reads the 12-hour clock and its AM/PM marker
# (%p is matched according to the current locale)
df$a <- as.POSIXlt(df$a, format = "%m-%d-%Y %I:%M:%S %p", tz = "EST")
But I would like it to look something like this:
Name = c("b", "c", "d")
June = c(15, 45, 15 )
Jul = c(NA, 15, 15)
Aug = c(15, 15, NA)
Sept = c(45, 30, 15)
df_2 = tibble (Name,June,Jul,Aug,Sept)
I am not sure how to sum and filter when the data is a time series. Does anyone have any suggestions?
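Based on the description, perhaps we can replace the values in columns 'b' to 'd' that are less than or equal to 5 with NA, reshape to 'long' format with pivot_longer, get the month from the 'a' column, and reshape back to 'wide' with pivot_wider.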
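library(dplyr)
library(tidyr)

df %>%
  # blank out readings that are not above the threshold
  mutate(across(2:4, ~ replace(., . <= 5, NA))) %>%
  # one row per timestamp and series; rows with NA values are dropped
  pivot_longer(cols = b:d, names_to = 'Name', values_drop_na = TRUE) %>%
  # abbreviated month name, e.g. "Jun"
  mutate(a = format(a, '%b')) %>%
  # each remaining row stands for one 15-minute interval
  pivot_wider(names_from = a, values_from = value,
    values_fill = 0, values_fn = list(value = function(x) length(x) * 15))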
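As a cross-check on the pipeline above, here is a minimal base R sketch of the same count. It assumes that each reading above 5 stands for one full 15-minute interval and that an NA counts as "not above". The names stamp, vals, above and minutes are illustrative and do not come from the original posts. Grouping by year and month, rather than by month alone, is an added assumption so that data spanning several years would not be pooled.

```r
# Self-contained copy of the question's sample data (24-hour strings, POSIXct)
stamp <- as.POSIXct(
  c("06-25-2021 08:00:00", "06-25-2021 08:15:00", "06-25-2021 08:30:00",
    "06-25-2021 08:45:00", "07-25-2021 08:45:00", "07-25-2021 09:00:00",
    "08-25-2021 08:45:00", "08-25-2021 09:00:00", "09-25-2021 09:15:00",
    "09-25-2021 09:30:00"),
  format = "%m-%d-%Y %H:%M:%S", tz = "EST"
)
vals <- data.frame(
  b = c(4, 5, 8, NA, 4, 5, NA, 7, 7, 6),
  c = c(6, 10, 8, NA, 8, 5, NA, 8, 7, 2),
  d = c(1, 3, NA, 6, 4, 8, 2, 4, NA, 10)
)

# Year-month key such as "2021-06"; sorts chronologically as plain text
ym <- format(stamp, "%Y-%m")

# TRUE where a reading is present and above the threshold
m <- as.matrix(vals)
above <- !is.na(m) & m > 5

# Sum 15 minutes per qualifying reading within each month,
# then transpose to get one row per series
minutes <- t(rowsum(above * 15, group = ym))
minutes
```

On the sample data this gives 15, 0, 15, 30 minutes for b, 45, 15, 15, 15 for c, and 15, 15, 0, 15 for d across June to September 2021. Because the comparison runs over the whole matrix, the same code should carry over to the 15+ value columns mentioned in the question.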
Using aggregate, replace all values less than or equal to 5 with zero and calculate the sum. The rest is reshaping.
aggregate(cbind(b, c, d) ~ a$mon, df, \(x) sum(replace(x, x <= 5, 0), na.rm=TRUE)) |>
  t() |> as.data.frame() |>
  # POSIXlt $mon is 0-based (0 = January), hence the + 1 when indexing month.abb
  (\(x) setNames(x, month.abb[unlist(x[1, ]) + 1])[-1, ])() |>
  (\(x) cbind(Name=rownames(x), x))() |> `rownames<-`(NULL) ## optional
#   Name Jun Jul Aug Sep
# 1    b   0   0   7   6
# 2    c  16   8   8   0
# 3    d   0   8   0  10
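Note that the aggregate call above sums the readings themselves, and the formula interface drops any row with an NA in b, c or d by default (na.action = na.omit). If minutes are wanted instead, one possible variant counts the readings above 5 and passes na.action = na.pass so that partially missing rows are kept. This is a sketch under the same 15-minutes-per-reading assumption. The data frame df_min and its plain integer mon column are stand-ins introduced here for illustration.

```r
# Sample data with the calendar month (1 = January) already extracted
df_min <- data.frame(
  mon = c(6, 6, 6, 6, 7, 7, 8, 8, 9, 9),
  b = c(4, 5, 8, NA, 4, 5, NA, 7, 7, 6),
  c = c(6, 10, 8, NA, 8, 5, NA, 8, 7, 2),
  d = c(1, 3, NA, 6, 4, 8, 2, 4, NA, 10)
)

# Count readings above 5 per month and series, 15 minutes each;
# na.pass keeps rows where only some of the series are missing
res <- aggregate(
  cbind(b, c, d) ~ mon, data = df_min,
  FUN = function(x) sum(x > 5, na.rm = TRUE) * 15,
  na.action = na.pass
)
res
```

The result has one row per month and one column per series. It can be transposed and relabelled with month.abb[res$mon] in the same way as the reshaping steps above.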
Data
df <- structure(list(a = structure(list(sec = c(0, 0, 0, 0, 0, 0, 0,
0, 0, 0), min = c(0L, 15L, 30L, 45L, 45L, 0L, 45L, 0L, 15L, 30L
), hour = c(8L, 8L, 8L, 8L, 8L, 9L, 8L, 9L, 9L, 9L), mday = c(25L,
25L, 25L, 25L, 25L, 25L, 25L, 25L, 25L, 25L), mon = c(5L, 5L,
5L, 5L, 6L, 6L, 7L, 7L, 8L, 8L), year = c(121L, 121L, 121L, 121L,
121L, 121L, 121L, 121L, 121L, 121L), wday = c(5L, 5L, 5L, 5L,
0L, 0L, 3L, 3L, 6L, 6L), yday = c(175L, 175L, 175L, 175L, 205L,
205L, 236L, 236L, 267L, 267L), isdst = c(0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 0L), zone = c("EST", "EST", "EST", "EST", "EST",
"EST", "EST", "EST", "EST", "EST"), gmtoff = c(-18000L, -18000L,
-18000L, -18000L, -18000L, -18000L, -18000L, -18000L, -18000L,
-18000L)), class = c("POSIXlt", "POSIXt"), tzone = c("EST", "EST",
"EST")), b = c(4, 5, 8, NA, 4, 5, NA, 7, 7, 6), c = c(6, 10,
8, NA, 8, 5, NA, 8, 7, 2), d = c(1, 3, NA, 6, 4, 8, 2, 4, NA,
10)), row.names = c(NA, -10L), class = "data.frame")
# [1] "R version 4.1.2 (2021-11-01)"