Easily reorder factor levels after tidying or melting

Question

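I'm trying to efficiently plot a series of bivariate bar charts. Each plot should show the frequency of cases for one of a series of demographic variables, broken down by gender. The code below works fine, but when the tidied column value is created, its levels are all the levels of the different demographic variables combined. Because it is a new factor, R orders the levels alphabetically. As you can see from the levels of value and from the resulting plot, they are not in a meaningful order: the income categories and the education levels are out of order.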

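In my real dataset there are many more factor levels, so simply releveling value by hand is possible but not really feasible. One option I considered was not melting the variables into value at all and instead trying some version of summarise_each(), but I couldn't get that to work.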

Thanks for your help.

library(dplyr)
library(tidyr)
library(ggplot2)

# age variable
age <- sample(c('18 to 24', '25 to 45', '45+'), size = 100, replace = TRUE)
# gender variable
gender <- sample(c('M', 'F'), size = 100, replace = TRUE)
# income variable
income <- sample(c(10, 20, 30, 40, 50, 60, 70, 80, 100, 110), size = 100, replace = TRUE)
# education variable
education <- sample(c('High School', 'College', 'Elementary'), size = 100, replace = TRUE)
# tie together in df
df <- data.frame(age, gender, income, education)

# begin tidying
df %>%
  # tidy every column except gender
  gather(variable, value, -c(gender)) %>%
  # group by value, variable, then gender
  group_by(value, variable, gender) %>%
  # summarise to obtain table cell frequencies
  summarise(freq = n()) %>%
  # plot: value (categories) on the x-axis, frequency on the y-axis,
  # gender as the grouping variable, original variable as the facet
  ggplot(aes(x = value, y = freq, group = gender)) +
  geom_bar(aes(fill = gender), stat = 'identity', position = 'dodge') +
  facet_wrap(~variable, scales = 'free_x')

Answer 1

Data

df$education <- factor(df$education, c("Elementary", "High School", 
                        "College"))
ddf <- df %>% 
       gather(variable, value, -gender) %>%
       group_by(value, variable, gender)  %>%
       summarise(freq = n())

Code

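# Collect the levels of every column except gender (column 2), in column order
lvl <- unlist(lapply(df[, -2], function(x) levels(as.factor(x))))
# Turn the gathered character column into a factor that uses those levels
ddf$value <- factor(ddf$value, levels = lvl)
ddf %>% ggplot(aes(x = value, y = freq, group = gender)) + 
        geom_bar(aes(fill = gender), stat = 'identity', 
                 position = 'dodge') + 
        facet_wrap(~variable, scales='free_x')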

Explanation

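gather converts the values in education, income and age into a single character vector. ggplot then uses the canonical order of these values, i.e. alphabetical order. If you want them in a specific order, you should first convert the column to a factor and then assign the levels in the order you like (as you mentioned). I simply took the levels from the original columns (and silently converted the numeric income to a factor, which may need some adaptation in your code). It shows that you do not have to hard-code any levels yourself, assuming the levels are already in the right order in the original dataset.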
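To make the ordering behaviour concrete, here is a minimal base R sketch. The toy vectors and the names value, edu_order and the *_levels variables are made up for illustration and are not taken from the question's data. It only contrasts the default level order with an explicitly supplied one.

```r
# Hypothetical toy values standing in for part of the gathered `value` column
value <- c("High School", "College", "Elementary", "College")

# Default: factor() sorts the unique values, so character levels come out alphabetically
default_levels <- levels(factor(value))

# Explicit: the levels follow the order passed in
edu_order <- c("Elementary", "High School", "College")
explicit_levels <- levels(factor(value, levels = edu_order))

# A numeric vector is sorted numerically before its levels become strings,
# which is why taking levels from the original income column keeps 100 after 20
numeric_levels <- levels(as.factor(c(20, 100, 10)))

print(default_levels)
print(explicit_levels)
print(numeric_levels)
```

Once income has been gathered into a character column, that numeric ordering is no longer available, which is one reason to read the levels from the original columns rather than from the long data.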

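So in your real case, what you should do is:

Convert the character vector value to a factor
Assign the levels in the order you want them to appear in the ggplot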
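The two steps above might look like the following sketch when the desired order is written down once per variable. The data frame long and the list level_order are hypothetical stand-ins (long plays the role of the gathered data), and the approach assumes no two variables share a level label, since factor() rejects duplicated levels.

```r
# Toy long-format data standing in for the gathered data (hypothetical values)
long <- data.frame(
  variable = c("education", "education", "income", "income", "income"),
  value    = c("College", "Elementary", "100", "20", "10"),
  stringsAsFactors = FALSE
)

# Desired order, stated once per original variable (an assumption of this sketch)
level_order <- list(
  education = c("Elementary", "High School", "College"),
  income    = c("10", "20", "100")
)

# Steps 1 and 2 in one call: convert `value` to a factor whose levels follow that order
long$value <- factor(long$value, levels = unlist(level_order, use.names = FALSE))

# Sorting by the factor now follows the chosen order, not the alphabet
sorted_values <- as.character(sort(long$value))
print(levels(long$value))
print(sorted_values)
```

With scales = 'free_x' in facet_wrap, each facet should only show the levels that occur for its own variable, so a single combined level vector is enough.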


r

ggplot2

tidyr