How to change the colour of bins in ggplot (geom_bin2d) to reflect the difference between the density in that area and the average density across a dataset?
Say I have some data that looks a bit like this:
library(ggplot2)
library(dplyr)
employee <- c('John','Dave','Paul','Ringo','George','Tom','Jim','Harry','Jamie','Adrian')
quality <- c('good', 'bad')
x <- runif(4000, 0, 100)
y <- runif(4000, 0, 100)
# employee and quality are recycled by data.frame() to match the 4000 rows of x and y
employ.data <- data.frame(employee, quality, x, y)
I'm working with a geom_bin2d plot that looks like this:
ggplot(employ.data, aes(x, y)) +
  geom_bin2d(binwidth = c(20, 20)) +
  scale_fill_gradient2(low = "darkred", high = "darkgreen")
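How can I change the colour of the bins to reflect the percentage of x/y points that are 'bad', compared with the overall average for that area across the whole dataset? That is, if the average share of 'bad' points in the bottom-left bin is some value, and John's average in that area is lower, how can I make the bin colour darker to show that his count is lower?
I think this would work to create the averages: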
df2 <- employ.data
# assign each point to a 20 x 20 bin
df2$xbin <- cut(df2$x, breaks = seq(0, 100, by = 20))
df2$ybin <- cut(df2$y, breaks = seq(0, 100, by = 20))
# share of 'bad' points in each bin, across all employees
df2 <- df2 %>% group_by(xbin, ybin) %>% mutate(ave_pct = mean(quality == "bad"))
# share of 'bad' points in each bin, per employee
df2 <- df2 %>% group_by(employee, xbin, ybin) %>% mutate(person_pct = mean(quality == "bad"))
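But then I don't know how to plot it.
So, if I understand you correctly, you want to colour the bins according to how the percentage of 'bad' employees in each bin compares with the overall percentage of 'bad' employees. To do that, I changed the calculation to: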
df <- employ.data %>%
  mutate(xbin = cut(x, breaks = seq(0, 100, by = 20)),
         ybin = cut(y, breaks = seq(0, 100, by = 20)),
         overall_ave = mean(quality == "bad")) %>%   # overall share of 'bad'
  group_by(xbin, ybin) %>%
  mutate(bin_ave = mean(quality == "bad")) %>%        # share of 'bad' within each bin
  ungroup() %>%
  # overall minus bin, so bins with fewer 'bad' points than average come out positive
  mutate(bin_quality = overall_ave - bin_ave)
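This creates the bins, then finds the overall percentage of employees with 'bad' quality. It then groups by bin and finds the percentage of 'bad' employees in each bin, and finally compares each bin average with the overall average. This gives a positive bin_quality for bins with a higher percentage of 'good' employees, and a negative value for bins with a higher percentage of 'bad' employees.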
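To check the per-bin numbers before plotting, it can help to collapse the data to one row per bin. The following is a minimal sketch, assuming the same data construction as in the question (rebuilt here so the block runs on its own). The names `bin_summary`, `n_points`, `pct_bad` and `diff_from_overall` are illustrative and not part of the original code, and the seed is only there for reproducibility. Here `diff_from_overall` is the bin's share of 'bad' minus the overall share, so a positive value means more 'bad' than average; flip the subtraction for the opposite convention.

```r
library(dplyr)

set.seed(1)  # assumption: fixed seed, only so the output is reproducible

# Same construction as in the question: employee and quality are recycled
employee <- c('John','Dave','Paul','Ringo','George','Tom','Jim','Harry','Jamie','Adrian')
quality <- c('good', 'bad')
x <- runif(4000, 0, 100)
y <- runif(4000, 0, 100)
employ.data <- data.frame(employee, quality, x, y)

# Overall share of 'bad' rows in the whole dataset
overall_bad <- mean(employ.data$quality == "bad")

# One row per 20 x 20 bin: point count, share of 'bad', and difference from overall
bin_summary <- employ.data %>%
  mutate(xbin = cut(x, breaks = seq(0, 100, by = 20), include.lowest = TRUE),
         ybin = cut(y, breaks = seq(0, 100, by = 20), include.lowest = TRUE)) %>%
  group_by(xbin, ybin) %>%
  summarise(n_points = n(),
            pct_bad = mean(quality == "bad"),
            .groups = "drop") %>%
  mutate(diff_from_overall = pct_bad - overall_bad)

print(head(bin_summary))
```

Weighted by `n_points`, the differences sum to zero, which is a quick way to confirm the bins cover every row.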
You can then plot it by adding the fill = bin_quality and group = bin_quality arguments to the aes() call inside ggplot. You also need to add aes(group = bin_quality) to the geom_bin2d call. It looks like this:
ggplot(df, aes(x, y, fill = bin_quality, group = bin_quality)) +
  geom_bin2d(aes(group = bin_quality), binwidth = c(20, 20)) +
  scale_fill_gradient2(low = "darkred", high = "darkgreen")
This gives you a plot in which each 20 x 20 bin is coloured by its bin_quality.
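A possible alternative, not from the answer above, is to let ggplot2 compute the per-bin statistic itself with `stat_summary_2d()`, which avoids the precomputed column and the `group` aesthetic. This is a sketch under the same data assumptions as the question; `overall_bad` and `p` are illustrative names. The fill here is the overall share of 'bad' minus the bin's share, so green bins have fewer 'bad' points than average and red bins have more.

```r
library(ggplot2)

set.seed(1)  # assumption: fixed seed, only so the plot is reproducible

# Same construction as in the question: employee and quality are recycled
employee <- c('John','Dave','Paul','Ringo','George','Tom','Jim','Harry','Jamie','Adrian')
quality <- c('good', 'bad')
x <- runif(4000, 0, 100)
y <- runif(4000, 0, 100)
employ.data <- data.frame(employee, quality, x, y)

# Overall share of 'bad' rows, used as the reference level
overall_bad <- mean(employ.data$quality == "bad")

# z is 1 for 'bad' and 0 for 'good'; the summary function receives the z values
# falling in each bin and returns (overall share) - (bin share)
p <- ggplot(employ.data, aes(x, y, z = as.numeric(quality == "bad"))) +
  stat_summary_2d(fun = function(z) overall_bad - mean(z),
                  binwidth = c(20, 20)) +
  scale_fill_gradient2(low = "darkred", mid = "white", high = "darkgreen",
                       midpoint = 0, name = "overall - bin")

print(p)
```

To compare individual employees, as the question asks about John, adding `facet_wrap(~ employee)` to `p` would draw one panel per employee against the same overall reference.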