How to create a factor variable based on the levels of the same variable in a different data frame

I have two data frames: main_df is the master table, and addl_df is a smaller table.

Goal: convert all character variables in addl_df to factors that have the same levels as the character variables of the same name in main_df.

main_df <- data.frame(id=c(1, 2, 3, 4, 5), age=c(10, 20, 30, 40, 45), gender=c("F","F","M","M","F"), city=c("A","B","C","D","D"))
addl_df <- data.frame(id=c(7,8), age=c( 40, 45), gender=c("F","F"), city=c("C","D"))

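With the code below, city in addl_df becomes a factor with 2 levels ("C" and "D"). What I want is a factor with 4 levels, "A", "B", "C", "D", in which "C" has the integer code 3 (the same definition as in main_df).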

Is it possible to do this automatically (rather than defining the variables manually one by one)? Thanks!

main_df[sapply(main_df, is.character)] <- lapply(main_df[sapply(main_df, is.character)], as.factor) 
addl_df[sapply(addl_df, is.character)] <- lapply(addl_df[sapply(addl_df, is.character)], as.factor)

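One option is to bind the datasets with bind_rows while creating a data identifier ('grp'), convert the character columns to factor, do a group_split by 'grp' into a list of data.frames, then set the names of the list with setNames and update the original objects with list2env. Because the factors are built on the combined data, both data frames end up with the same levels.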

library(dplyr)
bind_rows(main_df, addl_df, .id = 'grp') %>% 
    mutate(across(where(is.character), factor)) %>%
    group_split(grp, .keep = FALSE) %>%
    setNames(c('main_df', 'addl_df')) %>%
    list2env(.GlobalEnv)

-output

> str(main_df)
tibble [5 × 4] (S3: tbl_df/tbl/data.frame)
 $ id    : num [1:5] 1 2 3 4 5
 $ age   : num [1:5] 10 20 30 40 45
 $ gender: Factor w/ 2 levels "F","M": 1 1 2 2 1
 $ city  : Factor w/ 4 levels "A","B","C","D": 1 2 3 4 4
> str(addl_df)
tibble [2 × 4] (S3: tbl_df/tbl/data.frame)
 $ id    : num [1:2] 7 8
 $ age   : num [1:2] 40 45
 $ gender: Factor w/ 2 levels "F","M": 1 1
 $ city  : Factor w/ 4 levels "A","B","C","D": 3 4
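
If the dplyr round trip is not wanted (note that it also turns both objects into tibbles), a base R alternative might look like the sketch below. It is not part of the answer above; it assumes that main_df should define the levels and that the columns to convert are matched by name. The helper name shared is purely illustrative. Any value in addl_df that does not occur in main_df would become NA.

```r
# Same example data as in the question
main_df <- data.frame(id = c(1, 2, 3, 4, 5), age = c(10, 20, 30, 40, 45),
                      gender = c("F", "F", "M", "M", "F"),
                      city = c("A", "B", "C", "D", "D"))
addl_df <- data.frame(id = c(7, 8), age = c(40, 45),
                      gender = c("F", "F"), city = c("C", "D"))

# Convert the character columns of the master table to factors first
main_df[] <- lapply(main_df, function(x) if (is.character(x)) factor(x) else x)

# Names of factor columns in main_df that also exist in addl_df
# ('shared' is an illustrative name, not from the original post)
shared <- intersect(names(Filter(is.factor, main_df)), names(addl_df))

# Rebuild each shared column of addl_df using the master table's levels
addl_df[shared] <- Map(function(x, ref) factor(x, levels = levels(ref)),
                       addl_df[shared], main_df[shared])

levels(addl_df$city)      # "A" "B" "C" "D"
as.integer(addl_df$city)  # 3 4
```

Passing levels explicitly to factor() is what keeps the integer codes aligned with main_df, so "C" is coded as 3 in both data frames.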