Visualising the distribution for different subgroups
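I am using the d.pizza data. It has a variable called delivery_min, which is the delivery time in minutes, and a variable called area, which takes one of three values (Camden, Westminster and Brent).
I would like to draw a density plot to visualise the distribution of delivery times in each of these three areas.
I tried: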
plot.ecdf(d.pizza$delivery_min)
This code works, but how can I run it for each area separately?
head(d.pizza)
index date week weekday area count rabate price operator driver delivery_min
1 1 1 01.03.2014 9 6 Camden 5 TRUE 65.655 Rhonda Taylor 20.0
2 2 2 01.03.2014 9 6 Westminster 2 FALSE 26.980 Rhonda Butcher 19.6
3 3 3 01.03.2014 9 6 Westminster 3 FALSE 40.970 Allanah Butcher 17.8
4 4 4 01.03.2014 9 6 Brent 2 FALSE 25.980 Allanah Taylor 37.3
5 5 5 01.03.2014 9 6 Brent 5 TRUE 57.555 Rhonda Carter 21.8
6 6 6 01.03.2014 9 6 Camden 1 FALSE 13.990 Allanah Taylor 48.7
temperature wine_ordered wine_delivered wrongpizza quality
1 53.0 0 0 FALSE medium
2 56.4 0 0 FALSE high
3 36.5 0 0 FALSE <NA>
4 NA 0 0 FALSE <NA>
5 50.0 0 0 FALSE medium
6 27.0 0 0 FALSE low
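library(DescTools)
data(d.pizza)
summary(d.pizza$delivery_min)   # check the range before choosing xlim
# empty canvas; the ECDF curves are added to it one by one
plot(NULL, ylab = '', xlab = '', xlim = c(5, 66), ylim = 0:1)
# one ECDF per area, coloured 2:4 to match the legend
for (A in 1:3) {
  plot.ecdf(d.pizza$delivery_min[d.pizza$area == levels(d.pizza$area)[A]],
            pch = 20, col = A + 1, add = TRUE)
}
legend("bottomright", legend = levels(d.pizza$area),
       bty = 'n', pch = 20, col = 2:4)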
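The question asks for a density plot, while the loop above overlays ECDFs. The same loop idea can be sketched for kernel densities in base R. This is only an illustration: plot_density_by_group is a hypothetical helper name, and the data frame sim is simulated stand-in data, not the real d.pizza.

```r
# Hypothetical helper (not part of DescTools): one kernel density curve per group.
plot_density_by_group <- function(values, groups, xlab = "delivery_min") {
  groups <- droplevels(as.factor(groups))
  # split() drops rows whose group is NA; na.rm = TRUE drops NA values
  dens <- lapply(split(values, groups), density, na.rm = TRUE)
  # common axis limits so that no curve is clipped
  xlim <- range(unlist(lapply(dens, function(d) d$x)))
  ylim <- c(0, max(unlist(lapply(dens, function(d) d$y))))
  plot(NULL, xlim = xlim, ylim = ylim, xlab = xlab, ylab = "Density")
  cols <- seq_along(dens) + 1
  for (i in seq_along(dens)) lines(dens[[i]], col = cols[i], lwd = 2)
  legend("topright", legend = names(dens), col = cols, lwd = 2, bty = "n")
  invisible(dens)
}

# Simulated stand-in data (assumption: same two columns as d.pizza)
set.seed(1)
sim <- data.frame(
  delivery_min = c(rnorm(89, mean = 25, sd = 8), NA),
  area = rep(c("Camden", "Westminster", "Brent"), 30)
)
dens <- plot_density_by_group(sim$delivery_min, sim$area)
```

With the real data loaded, the equivalent call would presumably be plot_density_by_group(d.pizza$delivery_min, d.pizza$area).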
You can do it like this:
library(DescTools)
data(d.pizza)
plot.ecdf(subset(d.pizza, area == "Camden")$delivery_min,
          col = "red", main = "ECDF for pizza deliveries")
plot.ecdf(subset(d.pizza, area == "Westminster")$delivery_min,
          add = TRUE, col = "blue")
plot.ecdf(subset(d.pizza, area == "Brent")$delivery_min,
          add = TRUE, col = "green")
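To make clear what each curve encodes, here is a small sketch that builds one ECDF function per area and evaluates it at 30 minutes. The toy data frame is made up for illustration and is not taken from d.pizza.

```r
# Toy data (made up for illustration, not taken from d.pizza)
toy <- data.frame(
  delivery_min = c(10, 20, 30, 40, 15, 25, 35, 45),
  area = rep(c("Camden", "Brent"), each = 4)
)
# One ECDF function per area; split() orders the groups alphabetically
ecdfs <- lapply(split(toy$delivery_min, toy$area), ecdf)
# Share of deliveries that took at most 30 minutes in each area
share_30 <- sapply(ecdfs, function(f) f(30))
print(share_30)
#  Brent Camden
#   0.50   0.75
```

The height of a curve at x is the share of that area's deliveries completed within x minutes, so a curve lying further to the left indicates faster deliveries.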
I recommend the ggplot2 library for data visualisation in R. Below is some ggplot2 code that draws overlaid density plots for the three groups:
library(ggplot2)
# make example dataframe
d.pizza <- data.frame(delivery_min = rnorm(n=30), area = rep(c("Camden", "Westminster", "Brent"), 10))
# plot data in ggplot2
ggplot(d.pizza, aes(x = delivery_min, fill = area, color = area)) + geom_density(alpha = 0.5)
If you would rather have a histogram, that works too:
ggplot(d.pizza, aes(x = delivery_min, fill = area, color = area)) + geom_histogram(alpha = 0.5, position = 'identity')
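If the ECDF from the question is preferred over a density, ggplot2 also provides stat_ecdf(). A minimal sketch follows, again on simulated stand-in data (sim is an assumed name, and the mean and standard deviation are arbitrary):

```r
library(ggplot2)

# Simulated stand-in data (assumption: same two columns as d.pizza)
set.seed(42)
sim <- data.frame(
  delivery_min = rnorm(90, mean = 25, sd = 8),
  area = rep(c("Camden", "Westminster", "Brent"), 30)
)

# One step-function ECDF per area on shared axes
p <- ggplot(sim, aes(x = delivery_min, color = area)) +
  stat_ecdf(geom = "step") +
  labs(x = "delivery_min", y = "ECDF")
print(p)
```

Because all groups share the same axes, no xlim has to be chosen by hand as in the base R versions.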