Calculating geometric mean every 10 min using dplyr or aggregate function
I am trying to calculate the geometric mean of a column for every 10-minute interval.
My sample data is:
TimeDate diam ratio
2016-05-11 8:25 134.491 1.83074
2016-05-11 8:25 117.777 1.34712
2016-05-11 8:25 104.27 0.927635
2016-05-11 8:25 204.085 1.43079
2016-05-11 8:25 96.8011 0.991716
2016-05-11 8:25 119.152 1.09884
2016-05-11 8:25 113.871 0.932493
2016-05-11 8:26 150.468 0.710525
2016-05-11 8:26 116.576 1.11207
2016-05-11 8:26 192.257 1.61558
2016-05-11 8:26 128.071 0.756608
2016-05-11 8:26 177.667 0.73309
2016-05-11 8:27 97.7377 0.862858
2016-05-11 8:27 98.3195 1.00681
2016-05-11 8:27 91.3603 0.95051
2016-05-11 8:27 152.95 0.842145
2016-05-11 8:27 133.125 1.28365
2016-05-11 8:27 95.2516 0.573588
I tried dplyr, but the code below does not produce a value for every 10 minutes; it returns a single geometric mean and a single geometric SD.
mydata$TimeDate <- as.POSIXct(strptime(mydata$TimeDate, format = "%Y-%m-%d %H:%M", tz = "GMT"))
mydata %>%
  group_by(by10 = cut(TimeDate, breaks = "10 min")) %>%
  summarize(Geo.Mean = exp(mean(log(diam))),
            Geo.SD = exp(sd(log(diam))))
The data format itself is fine, because the aggregate call below works, although it computes the median and not the geometric mean.
aggregate(mydata["diam"],
          list(TimeDate = cut(mydata$TimeDate, "10 mins")),
          median, na.rm = TRUE)
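One option is to use the lubridate::floor_date function to create a group for every 10 minutes on the clock. All data between minutes 20 and 30 of an hour is assigned to the minute-20 group, and so on.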
library(dplyr)
library(lubridate)
mydata %>%
  mutate(TimeDate = as.POSIXct(TimeDate, format = "%Y-%m-%d %H:%M")) %>%
  group_by(Diff_10 = floor_date(TimeDate, "10 minutes")) %>%
  summarise(Geo.Mean = exp(mean(log(diam))),
            Geo.SD = exp(sd(log(diam))))
# # A tibble: 1 x 3
# Diff_10 Geo.Mean Geo.SD
# <dttm> <dbl> <dbl>
# 1 2016-05-11 08:20:00 125 1.28
#Result with modified data
# # A tibble: 6 x 3
# Diff_10 Geo.Mean Geo.SD
# <dttm> <dbl> <dbl>
# 1 2016-05-11 08:20:00 118 1.14
# 2 2016-05-11 08:30:00 141 1.69
# 3 2016-05-11 08:40:00 127 1.16
# 4 2016-05-11 08:50:00 150 1.28
# 5 2016-05-11 09:10:00 98.0 1.00
# 6 2016-05-11 09:20:00 115 1.29
cut can be used to group the data into 10-minute intervals counted from the start time. In the OP, the groups would be 2016-05-11 08:25, 2016-05-11 08:35, and so on.
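To make the difference between the two binning schemes concrete, here is a minimal base R sketch. The four timestamps are picked from the modified data for illustration, and the clock-aligned bins are computed by flooring epoch seconds, which is an assumed stand-in for lubridate::floor_date and not the code used in the answer.

```r
# Four timestamps chosen for illustration
times <- as.POSIXct(c("2016-05-11 08:25", "2016-05-11 08:35",
                      "2016-05-11 08:42", "2016-05-11 08:46"),
                    format = "%Y-%m-%d %H:%M", tz = "GMT")

# cut(): bins are counted from the earliest timestamp (08:25, 08:35, 08:45, ...)
cut_bin <- cut(times, breaks = "10 min")

# Clock-aligned bins (08:20, 08:30, 08:40, ...): round the epoch seconds
# down to a multiple of 600 (assumed base R equivalent of floor_date)
floor_bin <- as.POSIXct(floor(as.numeric(times) / 600) * 600,
                        origin = "1970-01-01", tz = "GMT")

cmp <- data.frame(time      = format(times, "%H:%M"),
                  cut_bin   = as.integer(cut_bin),        # bin index from cut()
                  floor_bin = format(floor_bin, "%H:%M")) # start of clock-aligned bin
print(cmp)
```

With cut, 08:35 and 08:42 land in the same bin; with clock-aligned bins, 08:42 and 08:46 do. Which one is appropriate depends on whether the intervals should follow the clock or the first observation.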
Modified version of the OP's data:
mydata <- read.table(text =
"TimeDate diam ratio
'2016-05-11 8:25' 134.491 1.83074
'2016-05-11 8:25' 117.777 1.34712
'2016-05-11 8:25' 104.27 0.927635
'2016-05-11 8:35' 204.085 1.43079
'2016-05-11 8:35' 96.8011 0.991716
'2016-05-11 8:42' 119.152 1.09884
'2016-05-11 8:45' 113.871 0.932493
'2016-05-11 8:46' 150.468 0.710525
'2016-05-11 8:56' 116.576 1.11207
'2016-05-11 8:56' 192.257 1.61558
'2016-05-11 8:56' 128.071 0.756608
'2016-05-11 8:59' 177.667 0.73309
'2016-05-11 9:17' 97.7377 0.862858
'2016-05-11 9:17' 98.3195 1.00681
'2016-05-11 9:27' 91.3603 0.95051
'2016-05-11 9:27' 152.95 0.842145
'2016-05-11 9:27' 133.125 1.28365
'2016-05-11 9:27' 95.2516 0.573588",
header = TRUE, stringsAsFactors = FALSE)
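Since the question also mentions aggregate, the same summary can be sketched in base R without dplyr or lubridate. This is only a sketch under stated assumptions: geo_mean and geo_sd are hypothetical helper names, the bin column is built by flooring epoch seconds instead of calling floor_date, and only the first five rows of the modified data are used.

```r
# Hypothetical helpers (names are not from the original post)
geo_mean <- function(x) exp(mean(log(x)))
geo_sd   <- function(x) exp(sd(log(x)))

# First five rows of the modified data
mydata <- data.frame(
  TimeDate = c("2016-05-11 8:25", "2016-05-11 8:25", "2016-05-11 8:25",
               "2016-05-11 8:35", "2016-05-11 8:35"),
  diam = c(134.491, 117.777, 104.27, 204.085, 96.8011),
  stringsAsFactors = FALSE
)
mydata$TimeDate <- as.POSIXct(mydata$TimeDate,
                              format = "%Y-%m-%d %H:%M", tz = "GMT")

# Clock-aligned 10-minute bins: round epoch seconds down to a multiple of 600
mydata$bin <- as.POSIXct(floor(as.numeric(mydata$TimeDate) / 600) * 600,
                         origin = "1970-01-01", tz = "GMT")

# One aggregate() call per statistic; both use the same grouping,
# so the rows come back in the same order
res <- aggregate(mydata["diam"], list(bin = mydata$bin), geo_mean)
names(res)[2] <- "Geo.Mean"
res$Geo.SD <- aggregate(mydata["diam"], list(bin = mydata$bin), geo_sd)$diam
print(res)
```

For these five rows the two bins start at 08:20 and 08:30, and the values should agree with the first two rows of the dplyr result on the modified data (about 118 and 141 for Geo.Mean).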