How to recode factor levels from a vector in tidyverse?
Consider the following data set, which has a factor No with 34 levels. I want to recode these levels according to newLvl.
MWE
df <- structure(list(No = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L,
2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L,
4L, 4L, 4L, 4L, 5L, 5L, 5L, 5L, 5L, 5L, 6L, 6L, 6L, 6L, 6L, 6L,
6L, 6L, 7L, 7L, 7L, 7L, 7L, 7L, 8L, 8L, 8L, 8L, 8L, 8L, 9L, 9L,
9L, 9L, 9L, 9L, 9L, 9L, 10L, 10L, 10L, 10L, 10L, 10L, 11L, 11L,
11L, 11L, 11L, 11L, 12L, 12L, 12L, 12L, 12L, 12L, 12L, 12L, 13L,
13L, 13L, 13L, 13L, 13L, 13L, 13L, 14L, 14L, 14L, 14L, 14L, 14L,
15L, 15L, 15L, 15L, 15L, 15L, 16L, 16L, 16L, 16L, 16L, 16L, 17L,
17L, 17L, 17L, 17L, 17L, 18L, 18L, 18L, 18L, 18L, 18L, 19L, 19L,
19L, 19L, 19L, 19L, 20L, 20L, 20L, 20L, 20L, 20L, 21L, 21L, 21L,
21L, 21L, 21L, 22L, 22L, 22L, 22L, 22L, 22L, 23L, 23L, 23L, 23L,
23L, 23L, 24L, 24L, 24L, 24L, 24L, 24L, 25L, 25L, 25L, 25L, 25L,
25L, 26L, 26L, 26L, 26L, 26L, 26L, 26L, 26L, 27L, 27L, 27L, 27L,
27L, 27L, 28L, 28L, 28L, 28L, 29L, 29L, 29L, 29L, 29L, 29L, 30L,
30L, 30L, 30L, 30L, 30L, 31L, 31L, 31L, 31L, 31L, 31L, 31L, 31L,
32L, 32L, 32L, 32L, 32L, 32L, 33L, 33L, 33L, 33L, 33L, 33L, 34L,
34L, 34L, 34L, 34L, 34L), .Label = c("1", "2", "3", "4", "5",
"6", "7", "8", "10", "13", "14", "15", "16", "18", "19", "21",
"22", "23", "24", "25", "27", "28", "29", "30", "31", "34", "38",
"39", "40", "42", "47", "48", "49", "53"), class = "factor"),
Gender = structure(c(2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = c("Female",
"Male"), class = "factor"), Age = c(23, 23, 23, 23, 23, 23,
39, 39, 39, 39, 39, 39, 30, 30, 30, 30, 30, 30, 30, 30, 24,
24, 24, 24, 24, 24, 24, 24, 18, 18, 18, 18, 18, 18, 23, 23,
23, 23, 23, 23, 23, 23, 26, 26, 26, 26, 26, 26, 23, 23, 23,
23, 23, 23, 23, 23, 23, 23, 23, 23, 23, 23, 30, 30, 30, 30,
30, 30, 20, 20, 20, 20, 20, 20, 25, 25, 25, 25, 25, 25, 25,
25, 23, 23, 23, 23, 23, 23, 23, 23, 38, 38, 38, 38, 38, 38,
22, 22, 22, 22, 22, 22, 29, 29, 29, 29, 29, 29, 21, 21, 21,
21, 21, 21, 23, 23, 23, 23, 23, 23, 25, 25, 25, 25, 25, 25,
24, 24, 24, 24, 24, 24, 21, 21, 21, 21, 21, 21, 27, 27, 27,
27, 27, 27, 24, 24, 24, 24, 24, 24, 21, 21, 21, 21, 21, 21,
21, 21, 21, 21, 21, 21, 23, 23, 23, 23, 23, 23, 23, 23, 23,
23, 23, 23, 23, 23, 21, 21, 21, 21, 27, 27, 27, 27, 27, 27,
34, 34, 34, 34, 34, 34, 26, 26, 26, 26, 26, 26, 26, 26, 28,
28, 28, 28, 28, 28, 39, 39, 39, 39, 39, 39, 26, 26, 26, 26,
26, 26)), class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA,
-218L))
Vectors
oldLvl <- c(23, 24, 48, 47, 3, 15, 49, 16, 21, 42, 39, 29, 28, 8, 38, 7, 53, 2, 19, 10, 5, 22, 6, 18, 14, 31, 27, 34, 25, 13, 40, 30, 4, 1)
newLvl <- c(1:34)
Attempt 1
dplyr::mutate(Nbr = forcats::fct_recode(Nbr, 23 = "1", 24 = "2", 48 = "3", 47 = "4", 3 = "5", 15 = "6", 49 = "7", 16 = "8", 21 = "9", 42 = "10", 39 = "11", 29 = "12", 28 = "13", 8 = "14", 38 = "15", 7 = "16", 2 = "17", 53 = "18", 19 = "19", 10 = "20", 5 = "21", 22 = "22", 6 = "23", 18 = "24", 14 = "25", 31 = "26", 27 = "27", 34 = "28", 25 = "29", 13 = "30", 40 = "31", 30 = "32", 4 = "33", 1 = "34"))
Attempt 2
df1 <- df %>%
dplyr::mutate(Nbr = No) %>%
dplyr::mutate(Nbr = forcats::fct_recode(Nbr, "23" = "1", "24" = "2", "48" = "3", "47" = "4", "3" = "5", "15" = "6", "49" = "7", "16" = "8", "21" = "9", "42" = "10", "39" = "11", "29" = "12", "28" = "13", "8" = "14", "38" = "15", "7" = "16", "2" = "17", "53" = "18", "19" = "19", "10" = "20", "5" = "21", "22" = "22", "6" = "23", "18" = "24", "14" = "25", "31" = "26", "27" = "27", "34" = "28", "25" = "29", "13" = "30", "40" = "31", "30" = "32", "4" = "33", "1" = "34"))
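Question
Neither of the two attempts above works. How can I use the fct_* family to recode the old levels to the new levels in a new variable, e.g. Nbr?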
We can use !!! after creating a named vector or list:
library(dplyr)
df1 %>%
mutate(Nbr = forcats::fct_recode(No,
!!! setNames(as.character(oldLvl), newLvl)))
# A tibble: 218 x 4
# No Gender Age Nbr
# <fct> <fct> <dbl> <fct>
# 1 1 Male 23 34
# 2 1 Male 23 34
# 3 1 Male 23 34
# 4 1 Male 23 34
# 5 1 Male 23 34
# 6 1 Male 23 34
# 7 2 Male 39 18
# 8 2 Male 39 18
# 9 2 Male 39 18
#10 2 Male 39 18
# … with 208 more rows
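To make the splicing step easier to follow, here is a minimal sketch on a toy factor. The names f, old, new and lookup are made up for illustration and are not part of the question's data. The idea is that fct_recode() takes new = "old" pairs, so the named vector carries the new labels as names and the old labels as values, and !!! spreads it into separate arguments.

```r
library(forcats)

# Hypothetical toy factor standing in for `No`
f <- factor(c("23", "24", "48", "23"))

# Old labels and the new labels they should become (same positions)
old <- c(23, 24, 48)
new <- 1:3

# Names are the NEW labels, values are the OLD labels as character
lookup <- setNames(as.character(old), new)

# !!! splices the named vector, so this call is equivalent to
# fct_recode(f, "1" = "23", "2" = "24", "3" = "48")
recoded <- fct_recode(f, !!!lookup)

as.character(recoded)
levels(recoded)
```

With the real data the same pattern applies, with oldLvl and newLvl in place of the toy vectors.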
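Alternatively, if I follow the syntax of the fct_recode function, I can do it as shown below. The key is fct_recode(.f, "new" = "old"), not the reversed order I used in the second failed attempt in my post.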
df1 <- df %>%
dplyr::mutate(Nbr = forcats::fct_recode(No, "1" = "23",
"2" = "24",
"3" = "48",
"4" = "47",
"5" = "3",
"6" = "15",
"7" = "49",
"8" = "16",
"9" = "21",
"10" = "42",
"11" = "39",
"12" = "29",
"13" = "28",
"14" = "8",
"15" = "38",
"16" = "7",
"17" = "2",
"18" = "53",
"19" = "19", # switching is not needed
"20" = "10",
"21" = "5",
"22" = "22", # switching is not needed
"23" = "6",
"24" = "18",
"25" = "14",
"26" = "31",
"27" = "27", # switching is not needed
"28" = "34",
"29" = "25",
"30" = "13",
"31" = "40",
"32" = "30",
"33" = "4",
"34" = "1"))
# A tibble: 218 x 4
# No Gender Age Nbr
# <fct> <fct> <dbl> <fct>
# 1 1 Male 23 34
# 2 1 Male 23 34
# 3 1 Male 23 34
# 4 1 Male 23 34
# 5 1 Male 23 34
# 6 1 Male 23 34
# 7 2 Male 39 17
# 8 2 Male 39 17
# 9 2 Male 39 17
# 10 2 Male 39 17
# … with 208 more rows
Interestingly, no warning was issued in this case.
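One plausible reason for the missing warning, as far as I can tell: fct_recode() warns only when an "old" label in the mapping is not a level of the factor, and here every old label exists. The sketch below uses a hypothetical toy factor f to contrast the two directions; the variable names are illustrative only.

```r
library(forcats)

# Hypothetical toy factor with levels "1" and "2"
f <- factor(c("1", "2", "2"))

# Correct direction, new = "old": both old labels exist, so no warning
ok <- fct_recode(f, "34" = "1", "17" = "2")

# Reversed direction: "34" and "17" are not levels of f, so fct_recode()
# warns about unknown levels and leaves the factor as it was
warned <- FALSE
reversed <- withCallingHandlers(
  fct_recode(f, "1" = "34", "2" = "17"),
  warning = function(w) {
    warned <<- TRUE                 # record that a warning was raised
    invokeRestart("muffleWarning")  # keep the console quiet
  }
)

as.character(ok)
warned
levels(reversed)
```

This would also explain a warning in Attempt 2: labels such as "9" or "11" are given there as old values but are not levels of No.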