Smoothed grouped proportion plot
I have the following dataset:
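set.seed(10)
start_date <- as.Date('2000-01-01')
end_date <- as.Date('2000-01-10')
# Shorter columns are recycled to the 10,000 rows implied by `id`.
Data <- data.frame(
  id = rep(1:1000, 10),
  group = rep(c("A", "B"), 25),
  x = sample(1:100),
  y = sample(c("1", "0"), 10, replace = TRUE),
  # as.numeric() on a Date counts days since 1970-01-01, so that origin is
  # needed for the sampled dates to fall between start_date and end_date.
  date = as.Date(
    sample(as.numeric(start_date):as.numeric(end_date), 1000, replace = TRUE),
    origin = '1970-01-01'))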
From this, I created the following plot:
library(tidyverse)
Data %>%
  mutate(treated = factor(group)) %>%
  mutate(date = as.POSIXct(date)) %>%        # convert Date to POSIXct
  group_by(treated, date) %>%                # one row per group and day
  summarise(prop = sum(y == "1") / n()) %>%  # proportion of y == "1"
  ggplot() +
  theme_classic() +
  geom_line(aes(x = date, y = prop, color = treated)) +
  geom_point(aes(x = date, y = prop, color = treated)) +
  geom_vline(xintercept = as.POSIXct("2000-01-05 12:00 GMT"), color = 'black', lwd = 1)
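Unfortunately, the plot is pretty 'jumpy' and I would like to smooth it. I tried geom_smooth() but could not get it to work. Other questions about smoothing did not help me because they lack the grouping aspect and therefore have a different structure. The example dataset is actually part of a larger dataset, so I need to stick to this code.
[EDIT: the geom_smooth() code I tried is geom_smooth(method = 'auto', formula = y ~ x)]
Can anyone point me in the right direction?
Many thanks and all the best.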
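Is this the kind of smoothed line you want? You call geom_smooth with the same aesthetics, instead of combining it with geom_line. You can choose among different smoothing methods, but the default, loess for small numbers of observations, is usually what people want. As an aside, I don't think this necessarily looks better than the geom_line version, and it is actually slightly less readable. geom_smooth works best when each x has many y observations, which makes the pattern hard to see; geom_line is good for the 1-1 case.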
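To make the difference between the two cases concrete, here is a minimal base-R sketch. It is not the code from the question: the data frame `dat` and its columns `group`, `day`, and `y` are made-up names for illustration. It computes one proportion per group and day (the 1-1 case that geom_line connects) and then fits a separate loess curve per group, which is roughly what geom_smooth does once `color` is mapped to the group.

```r
# Hypothetical data: one binary outcome per row, two groups, ten days.
set.seed(1)
n <- 2000
dat <- data.frame(
  group = sample(c("A", "B"), n, replace = TRUE),
  day   = sample(1:10, n, replace = TRUE),
  y     = rbinom(n, size = 1, prob = 0.5)
)

# One proportion per group and day: the points geom_line() would connect.
props <- aggregate(y ~ group + day, data = dat, FUN = mean)

# Fit one loess curve per group on those proportions.
smoothed <- do.call(rbind, lapply(split(props, props$group), function(d) {
  fit <- loess(y ~ day, data = d)
  data.frame(group = d$group, day = d$day, prop = d$y, smooth = predict(fit))
}))

head(smoothed)
```

Each loess fit here sees only ten points per group, which is why the smoothed version of the aggregated plot adds little over simply connecting the points.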
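EDIT: After looking more closely at what you are doing, I added a second plot that does not compute the treatment-date means first and instead applies geom_smooth directly to the raw 0/1 outcome. This gives you a more reasonable confidence interval, instead of having to remove it as in the first plot.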
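set.seed(10)
start_date <- as.Date('2000-01-01')
end_date <- as.Date('2000-01-10')
# Shorter columns are recycled to the 10,000 rows implied by `id`.
Data <- data.frame(
  id = rep(1:1000, 10),
  group = rep(c("A", "B"), 25),
  x = sample(1:100),
  y = sample(c("1", "0"), 10, replace = TRUE),
  # as.numeric() on a Date counts days since 1970-01-01, so that origin is
  # needed for the sampled dates to fall between start_date and end_date.
  date = as.Date(
    sample(as.numeric(start_date):as.numeric(end_date), 1000, replace = TRUE),
    origin = '1970-01-01'))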
library(tidyverse)
# Plot 1: smooth the per-day proportions, one loess curve per group.
Data %>%
  mutate(treated = factor(group)) %>%
  mutate(date = as.POSIXct(date)) %>%        # convert Date to POSIXct
  group_by(treated, date) %>%                # one row per group and day
  summarise(prop = sum(y == "1") / n()) %>%  # proportion of y == "1"
  ggplot() +
  theme_classic() +
  # se = FALSE drops the confidence band, which is not meaningful here
  geom_smooth(aes(x = date, y = prop, color = treated), se = FALSE) +
  geom_point(aes(x = date, y = prop, color = treated)) +
  geom_vline(xintercept = as.POSIXct("2000-01-05 12:00 GMT"), color = 'black', lwd = 1)
#> `geom_smooth()` using method = 'loess' and formula 'y ~ x'
# Plot 2: smooth the raw 0/1 outcome directly, keeping the confidence band.
Data %>%
  mutate(treated = factor(group)) %>%
  mutate(y = ifelse(y == "0", 0, 1)) %>%  # recode y from character to numeric 0/1
  mutate(date = as.POSIXct(date)) %>%     # convert Date to POSIXct
  ggplot() +
  theme_classic() +
  geom_smooth(aes(x = date, y = y, color = treated), method = "loess") +
  geom_vline(xintercept = as.POSIXct("2000-01-05 12:00 GMT"), color = 'black', lwd = 1)
Created on 2018-03-27 by the reprex package (v0.2.0).
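One caveat about smoothing a 0/1 outcome with loess is that the curve and its band are not constrained to stay between 0 and 1. As a possible alternative, and not what the plots above use, the sketch below fits a logistic regression with a natural spline in time and builds the interval on the link scale before transforming back. The data frame `dat`, its column names, and the choice of `df = 3` are all assumptions made for illustration.

```r
library(splines)  # ships with R; provides ns()

# Hypothetical data: one binary outcome per row, two groups, ten days.
set.seed(1)
n <- 2000
dat <- data.frame(
  group = factor(sample(c("A", "B"), n, replace = TRUE)),
  day   = sample(1:10, n, replace = TRUE),
  y     = rbinom(n, size = 1, prob = 0.5)
)

# Logistic regression with a separate smooth time trend per group.
fit <- glm(y ~ group * ns(day, df = 3), family = binomial, data = dat)

# Predict on a grid, on the link (log-odds) scale, with standard errors.
grid <- expand.grid(group = levels(dat$group), day = seq(1, 10, by = 0.5))
pred <- predict(fit, newdata = grid, type = "link", se.fit = TRUE)

# Transform back to proportions; the interval stays inside (0, 1).
grid$prop  <- plogis(pred$fit)
grid$lower <- plogis(pred$fit - 1.96 * pred$se.fit)
grid$upper <- plogis(pred$fit + 1.96 * pred$se.fit)

head(grid)
```

Within ggplot2, the closest built-in equivalent would be geom_smooth(method = "glm", method.args = list(family = "binomial")), which by default fits a plain logistic curve per group rather than a spline.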