Reorder factor levels by pattern
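I have a factor that identifies strata in a survey dataset. I want to reorder the factor levels so that certain character patterns come before others.
For example, I have this jumbled factor representing gender, age, and education level: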
my_factor <- factor(levels=c(1:8),
labels=c("Male-18_34-HS","Female-35_49-HS",
"Male-18_34-CG", "Female-18_34-CG",
"Male-35_49-HS", "Male-35_49-CG",
"Female-18_34-HS", "Female-35_49-CG"),
ordered=TRUE)
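I would like all the female categories to come first, then the age categories in the correct order, then the education categories in the correct order. I can do most of this with forcats::fct_relevel: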
forcats::fct_relevel(my_factor, sort)
ordered(0)
8 Levels: Female-18_34-CG < Female-18_34-HS < Female-35_49-CG < Female-35_49-HS < Male-18_34-CG < Male-18_34-HS < ... < Male-35_49-HS
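But the education categories are in the wrong order. Is there a way to make sure "HS" comes before "CG" while keeping the gender and age group order unchanged?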
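One option is to split each label into its three parts and order on them, giving the education part as a factor so that "HS" sorts before "CG":
library(stringr)
dft <- c("Male-18_34-HS", "Female-35_49-HS", "Male-18_34-CG", "Female-18_34-CG",
         "Male-35_49-HS", "Male-35_49-CG", "Female-18_34-HS", "Female-35_49-CG")
# split each label into gender, age and education
gender <- unlist(lapply(dft, FUN = function(x) str_split(x, '-')[[1]][1]))
age    <- unlist(lapply(dft, FUN = function(x) str_split(x, '-')[[1]][2]))
ed     <- unlist(lapply(dft, FUN = function(x) str_split(x, '-')[[1]][3]))
# order by gender, then age, then education with "HS" before "CG"
order_f <- order(gender, age, factor(ed, levels = c("HS", "CG")))
my_factor <- factor(levels = c(1:8),
                    labels = dft[order_f],
                    ordered = TRUE)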
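To get "HS" before "CG" without relying on alphabetical order, order() can be given a factor: it sorts a factor by its level codes, so the levels argument sets the desired education order. A small demonstration with a made-up vector ed:

```r
ed <- c("CG", "HS", "CG", "HS")

# Character vector: sorted alphabetically, so "CG" comes before "HS"
order(ed)                                   # 1 3 2 4

# Factor: sorted by level codes, so "HS" (code 1) comes before "CG" (code 2)
order(factor(ed, levels = c("HS", "CG")))   # 2 4 1 3
```

The same trick works inside a multi-key call such as order(gender, age, factor(ed, levels = c("HS", "CG"))), where ties in the earlier keys are broken by the factor.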
You can build a reference table and arrange the factor levels by its columns:
library(dplyr)
library(tidyr)
ref <- tibble(key = c("Male-18_34-HS","Female-35_49-HS",
"Male-18_34-CG", "Female-18_34-CG",
"Male-35_49-HS", "Male-35_49-CG",
"Female-18_34-HS", "Female-35_49-CG"))
ref <- separate(ref, key, into = c("gender", "age", "education"), sep = "-", remove = FALSE) %>%
  # encode the desired order of each component as factor levels
  mutate(gender    = factor(gender,    levels = c("Female", "Male")),
         age       = factor(age,       levels = c("18_34", "35_49")),
         education = factor(education, levels = c("HS", "CG"))) %>%
  arrange(gender, age, education)
Then use the sorted keys as the levels of your factor:
my_factor <- factor(my_factor, levels = ref$key)
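For comparison, a minimal base R sketch of the same reference-table idea applied to actual observations. It assumes a hypothetical vector d of strata labels (one per respondent, not from the question); building the table from unique(d) means only strata that occur in the data become levels:

```r
# Hypothetical survey data: one stratum label per respondent
d <- c("Male-18_34-CG", "Female-35_49-HS", "Male-18_34-CG",
       "Female-18_34-HS", "Male-35_49-HS", "Female-18_34-HS")

# Reference table: one row per distinct label, one factor column per component
key   <- unique(d)
parts <- do.call(rbind, strsplit(key, "-", fixed = TRUE))
ref <- data.frame(key       = key,
                  gender    = factor(parts[, 1], levels = c("Female", "Male")),
                  age       = factor(parts[, 2], levels = c("18_34", "35_49")),
                  education = factor(parts[, 3], levels = c("HS", "CG")),
                  stringsAsFactors = FALSE)

# Sort the rows by the factor columns, then use the sorted keys as levels
ref   <- ref[order(ref$gender, ref$age, ref$education), ]
d_fac <- factor(d, levels = ref$key)
table(d_fac)
```

The observations themselves are unchanged; only the order of the levels (and therefore of the table() counts) follows the reference table.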
You can create the desired factor levels programmatically:
lvls <- do.call(paste, c(tidyr::expand_grid(
c('Female', 'Male'), c('18_34', '35_49'), c('HS', 'CG')), sep = '-'))
lvls
#[1] "Female-18_34-HS" "Female-18_34-CG" "Female-35_49-HS" "Female-35_49-CG"
#[5] "Male-18_34-HS" "Male-18_34-CG" "Male-35_49-HS" "Male-35_49-CG"
You can then use lvls as the levels argument in your factor() call.
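If you prefer to avoid tidyr, base R's expand.grid() can build the same level vector. Unlike tidyr::expand_grid(), expand.grid() varies its first argument fastest, so this sketch lists the components in reverse order and then pastes them back in gender-age-education order:

```r
# expand.grid() varies its first argument fastest, so education goes first
# and gender last (gender is then the slowest-varying component)
grid <- expand.grid(education = c("HS", "CG"),
                    age       = c("18_34", "35_49"),
                    gender    = c("Female", "Male"),
                    stringsAsFactors = FALSE)
lvls <- paste(grid$gender, grid$age, grid$education, sep = "-")

# Apply to the (empty) ordered factor from the question
my_factor <- factor(levels = 1:8,
                    labels = c("Male-18_34-HS", "Female-35_49-HS",
                               "Male-18_34-CG", "Female-18_34-CG",
                               "Male-35_49-HS", "Male-35_49-CG",
                               "Female-18_34-HS", "Female-35_49-CG"),
                    ordered = TRUE)
my_factor <- factor(my_factor, levels = lvls)
levels(my_factor)
```

With either approach, any label in the data that is missing from lvls becomes NA, and combinations that never occur in the data remain as empty levels.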
You can split the labels with str_split, sort the resulting matrix, and rebuild the levels accordingly:
lvl <- do.call(rbind, stringr::str_split(levels(my_factor), '-'))
# order by gender, then age, then education with "HS" before "CG"
lvl <- apply(lvl[order(lvl[, 1], lvl[, 2], factor(lvl[, 3], levels = c("HS", "CG"))), ],
             1, paste0, collapse = '-')
my_factor <- factor(my_factor, levels = lvl)
levels(my_factor)
#> [1] "Female-18_34-HS" "Female-18_34-CG" "Female-35_49-HS" "Female-35_49-CG"
#> [5] "Male-18_34-HS" "Male-18_34-CG" "Male-35_49-HS" "Male-35_49-CG"
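The split-and-order logic can be wrapped in a small helper so that the desired order of each component is passed in explicitly. reorder_by_parts() below is a hypothetical name, not a function from base R, stringr, or forcats; it is a sketch that assumes every level has the same number of sep-separated parts:

```r
# Hypothetical helper: reorder the levels of f by splitting each level on
# `sep` and sorting on the parts, using one character vector of desired
# level order per component in `part_levels`.
reorder_by_parts <- function(f, part_levels, sep = "-") {
  lv    <- levels(f)
  parts <- do.call(rbind, strsplit(lv, sep, fixed = TRUE))
  # Turn each component into a factor so order() uses the requested order
  keys  <- lapply(seq_along(part_levels),
                  function(i) factor(parts[, i], levels = part_levels[[i]]))
  # factor() keeps the ordered class of f when given new levels
  factor(f, levels = lv[do.call(order, keys)])
}

my_factor <- factor(levels = 1:8,
                    labels = c("Male-18_34-HS", "Female-35_49-HS",
                               "Male-18_34-CG", "Female-18_34-CG",
                               "Male-35_49-HS", "Male-35_49-CG",
                               "Female-18_34-HS", "Female-35_49-CG"),
                    ordered = TRUE)
my_factor <- reorder_by_parts(my_factor,
                              list(c("Female", "Male"),
                                   c("18_34", "35_49"),
                                   c("HS", "CG")))
levels(my_factor)
```

A part value that is not listed in part_levels becomes NA in its key and is sorted last. Because forcats::fct_relevel() also accepts a function of the current levels (as the question does with sort), the same sorting logic could be passed there instead.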