stat_sum and stat_identity give weird results
I have the following code, including randomly generated demo data:
n <- 10
group <- rep(1:4, n)
mass.means <- c(10, 20, 15, 30)
mass.sigma <- 4
score.means <- c(5, 5, 7, 4)
score.sigma <- 3
mass <- as.vector(model.matrix(~0+factor(group)) %*% mass.means) +
rnorm(n*4, 0, mass.sigma)
score <- as.vector(model.matrix(~0+factor(group)) %*% score.means) +
rnorm(n*4, 0, score.sigma)
data <- data.frame(id = 1:(n*4), group, mass, score)
head(data)
which gives:
id group mass score
1 1 1 12.643603 5.015746
2 2 2 21.458750 5.590619
3 3 3 15.757938 8.777318
4 4 4 32.658551 6.365853
5 5 1 6.636169 5.885747
6 6 2 13.467437 6.390785
Then I want to plot the sum of "score" in a bar chart, grouped by "group":
plot <- ggplot(data = data, aes(x = group, y = score)) +
geom_bar(stat="sum")
plot
This gives me:
Oddly, using stat_identity seems to give the result I am looking for:
plot <- ggplot(data = data, aes(x = group, y = score)) +
geom_bar(stat="identity")
plot
Is this a bug? I am using ggplot2 1.0.0 on R:
platform x86_64-pc-linux-gnu
arch x86_64
os linux-gnu
system x86_64, linux-gnu
status
major 3
minor 1.2
year 2014
month 10
day 31
svn rev 66913
language R
version.string R version 3.1.2 (2014-10-31)
nickname Pumpkin Helmet
Or am I doing something wrong?
plot <- ggplot(data = data, aes(x = group, y = score)) +
stat_summary(fun.y = "sum", geom = "bar", position = "identity")
plot
aggregate(score ~ group, data=data, FUN=sum)
# group score
#1 1 51.71279
#2 2 58.94611
#3 3 67.52100
#4 4 39.24484
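Edit:
stat_sum does not work because it does not simply return the sum. It returns the "number of observations at position" and the "percent of points in that panel at that position". It was designed for a different purpose: the documentation says it is "useful for overplotting on scatterplots."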
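To make this concrete, here is a minimal base-R sketch of the kind of quantity stat_sum is described as computing: a count of observations at each distinct (x, y) position, plus that count's share of all points. It is an approximation of the idea, not ggplot2's actual implementation. The seed, the simplified data, and the names d, counts, n and prop are assumptions made for illustration only.

```r
# Simplified stand-in for the demo data (assumption: seed and values are arbitrary)
set.seed(1)
group <- rep(1:4, 10)
score <- rnorm(40, mean = 5, sd = 3)
d <- data.frame(group, score)

# Count how many observations share each distinct (group, score) position
counts <- aggregate(n ~ group + score, data = transform(d, n = 1), FUN = sum)

# Share of all points (one panel assumed) that sit at each position
counts$prop <- counts$n / sum(counts$n)

# With continuous scores essentially every position is unique,
# so the counts carry no information about the per-group totals
table(counts$n)
```

Because the scores are continuous, each position holds a single observation, so a count-based statistic has nothing to say about the sum of "score" within a group.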
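stat_identity (kind of) works because geom_bar stacks the bars by default. You get many bars stacked on top of each other, in contrast to my solution, which gives you just one bar per group. Look at this: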
plot <- ggplot(data = data, aes(x = group, y = score)) +
geom_bar(stat="identity", color = "red")
plot
Also consider the warning:
Warning message:
Stacking not well defined when ymin != 0
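The stacking argument can be checked without ggplot2. The sketch below, in base R only, treats each observation as one bar segment and stacks the segments within a group with a running sum. It is a simplified model of stacking, not the library's code, and the seed and the names stack_tops, final_top and group_sums are assumptions for illustration.

```r
# Simplified stand-in for the demo data (assumption: seed and values are arbitrary)
set.seed(1)
group <- rep(1:4, 10)
score <- rnorm(40, mean = 5, sd = 3)

# Stacking puts each segment on top of the previous ones in the same group,
# so the upper edge of each segment is the cumulative sum within the group
stack_tops <- tapply(score, group, cumsum)

# Upper edge of the last segment in each group
final_top <- sapply(stack_tops, function(v) v[length(v)])

# Per-group totals, computed directly
group_sums <- tapply(score, group, sum)

# The top of each stack coincides with the group sum
all.equal(as.vector(final_top), as.vector(group_sums))

# Negative scores produce segments that start away from zero or point downward,
# which is the situation the warning refers to
any(score < 0)
```

This is why the stat="identity" plot looks like a plot of sums while actually containing one segment per observation. When some scores are negative the stacked picture is less trustworthy, so summarising first (as with stat_summary or aggregate above) is the safer route.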