How to add a confidence interval to a Circular Histogram (von Mises distribution)
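I have time-of-day data and want to plot the frequency per hour on a 24-hour clock.
The data are converted to class circular, and estimates of the periodic mean mu and the concentration kappa are computed with mle.vonmises().
The plot is produced with ggplot2, using geom_histogram() and coord_polar(). The periodic mean is drawn on the plot with a simple call to geom_vline().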
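To make the term periodic mean concrete, here is a minimal base-R sketch of how a circular mean can be computed by hand. The function name circular_mean_hours and the sample values are made up for illustration; the code follows the usual definition of the mean direction and is not meant to reproduce the internals of mle.vonmises().

```r
# Hypothetical helper: mean direction of clock times given in hours (0-24).
circular_mean_hours <- function(hrs) {
  theta <- 2 * pi * hrs / 24          # map hours onto angles in radians
  # average the unit vectors, then take the angle of the resulting vector
  mean_angle <- atan2(mean(sin(theta)), mean(cos(theta)))
  (mean_angle * 24 / (2 * pi)) %% 24  # back to hours, wrapped into [0, 24)
}

# Illustrative values straddling midnight (not from the question's data)
example_hrs <- c(23, 1, 2)
mean(example_hrs)                 # arithmetic mean lands mid-morning
circular_mean_hours(example_hrs)  # circular mean stays close to midnight
```

The arithmetic mean is misleading for times that straddle midnight, which is why the data are treated as circular before estimating mu.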
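Question
I would like to draw a 95% confidence interval around the mean. I would then like to check visually whether a given timestamp (for example "22:00:00") falls inside the CI.
How can this be done with the von Mises distribution and ggplot2?
The code below shows how far I have got.
Data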
timestamps <- c("08:43:48", "09:17:52", "12:56:22", "12:27:32", "10:59:23",
"07:22:45", "11:13:59", "10:13:26", "10:07:01", "06:09:56",
"12:43:17", "07:07:35", "09:36:44", "10:45:00", "08:27:36",
"07:55:35", "11:32:56", "13:18:35", "11:09:51", "09:46:33",
"06:59:12", "10:19:36", "09:39:47", "09:39:46", "18:23:54")
Code
library(lubridate)
library(circular)
library(ggplot2)
## Convert from char to hours
timestamps_hrs <- as.numeric(hms(timestamps)) / 3600
## Convert to class circular
timestamps_hrs_circ <- circular(timestamps_hrs, units = "hours", template = "clock24")
## Estimate the periodic mean and the concentration
## from the von Mises distribution
estimates <- mle.vonmises(timestamps_hrs_circ)
periodic_mean <- estimates$mu %% 24
concentration <- estimates$kappa
## Clock plot // Circular Histogram
clock01 <- ggplot(data.frame(timestamps_hrs_circ), aes(x = timestamps_hrs_circ)) +
geom_histogram(breaks = seq(0, 24), colour = "blue", fill = "lightblue") +
coord_polar() +
scale_x_continuous("", limits = c(0, 24), breaks = seq(0, 24), minor_breaks = NULL) +
theme_light()
clock01
## Add the periodic_mean
clock01 +
geom_vline(xintercept = as.numeric(periodic_mean), color = "red", linetype = 3, size = 1.25)
This produces the following plot:
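I think I have found an approximate solution. Since we know the parameters mu and kappa (the periodic mean and the concentration, respectively), we know the distribution. That in turn means we know the density at any given timestamp, and we can compute the density cutoff that corresponds to the 95% confidence level.
Once we have that, we can generate a timestamp for every minute of the day, transform the timestamps as needed, compute their densities, and compare them with the cutoff.
That tells us, at 1-minute resolution, whether we are inside the confidence interval.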
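Written out, the reasoning above looks roughly as follows. This is a sketch in angular terms, assuming hours t are mapped to angles θ = 2πt/24 and that the cumulative probability is measured starting from the point opposite μ. The symbols c (the density cutoff) and q_{0.025} (the 2.5% quantile) are introduced here for illustration, and I_0 is the modified Bessel function of the first kind of order 0.

```latex
f(\theta \mid \mu, \kappa) = \frac{\exp\{\kappa \cos(\theta - \mu)\}}{2\pi I_0(\kappa)},
\qquad \theta = \frac{2\pi t}{24}

c = f(q_{0.025} \mid \mu, \kappa),
\qquad
\text{interval} = \{\theta : f(\theta \mid \mu, \kappa) \ge c\}
```

Because the density is symmetric about μ and falls off on both sides (for κ > 0), this set is a single arc centred on μ that holds about 95% of the fitted probability mass.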
Code
(assuming the code from the question has already been run)
library(dplyr)  # needed for %>%, filter() and summarise() below
quantile <- qvonmises((1 - 0.95)/2, mu = periodic_mean, kappa = concentration)
cutoff <- dvonmises(quantile, mu = periodic_mean, kappa = concentration)
## generate a timestamp for every minute in a day
## then the transformations needed
ts_1min <- format(seq.POSIXt(as.POSIXct(Sys.Date()),
as.POSIXct(Sys.Date()+1),
by = "1 min"),
"%H:%M:%S", tz = "GMT")
ts_1min_hrs <- as.numeric(hms(ts_1min)) / 3600
ts_1min_hrs_circ <- circular(ts_1min_hrs, units = "hours", template = "clock24")
## generate densities to compare with the cutoff
dens_1min <- dvonmises(ts_1min_hrs_circ, mu = periodic_mean, kappa = concentration)
## compare: vector of FALSE/TRUE
feat_1min <- dens_1min >= cutoff
df_1min_feat <- data.frame(ts = ts_1min_hrs_circ,
feature = feat_1min)
## get the min and max time of the CI
## (min()/max() only give the endpoints if the interval does not wrap past midnight)
CI <- df_1min_feat %>%
filter(feature == TRUE) %>%
summarise(min = min(ts), max= max(ts))
CI
# min max
# 5.283333 14.91667
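With the two bounds in hand, the check asked for in the question (is "22:00:00" inside the interval?) can also be done numerically. Below is a minimal base-R sketch; the helper name in_interval is hypothetical, and the bounds are copied from the output above. It also covers the case where an interval wraps past midnight, which does not occur for these data.

```r
# Hypothetical helper: is a "HH:MM:SS" time inside the interval [lower, upper] (hours)?
in_interval <- function(ts, lower, upper) {
  parts <- as.numeric(strsplit(ts, ":", fixed = TRUE)[[1]])
  hrs <- sum(parts * c(1, 1 / 60, 1 / 3600))   # convert to decimal hours
  if (lower <= upper) {
    hrs >= lower && hrs <= upper               # ordinary interval
  } else {
    hrs >= lower || hrs <= upper               # interval wraps past midnight
  }
}

# Bounds copied from the CI output above
in_interval("22:00:00", lower = 5.283333, upper = 14.91667)  # FALSE
in_interval("10:30:00", lower = 5.283333, upper = 14.91667)  # TRUE
```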
With the information above and geom_rect(), we can get what we want:
ggplot(data.frame(timestamps_hrs_circ), aes(x = timestamps_hrs_circ)) +
coord_polar() +
scale_x_continuous("", limits = c(0, 24), breaks = seq(0, 24), minor_breaks = NULL) +
geom_vline(xintercept = as.numeric(CI), color = "darkgreen", linetype = 1, size = 1.5) +
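## annotate() draws the band once; geom_rect() with fixed values would redraw it for every data row
annotate("rect", xmin = as.numeric(CI$min), xmax = as.numeric(CI$max), ymin = 0, ymax = 5, alpha = .5, fill = "lightgreen") +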
ggtitle(label = "Circular Histogram", subtitle = "periodic mean in red,\n95%-CI in green" ) +
geom_histogram(breaks = seq(0, 24), colour = "blue", fill = "lightblue") +
geom_vline(xintercept = as.numeric(periodic_mean), color = "red", linetype = 2, size = 1.5) +
theme_light()
This produces the following plot:
I hope someone else benefits from this too.