How to create a factor variable based on the levels of the same variable in a different data frame
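I have two data frames: main_df is the master table and addl_df is a smaller table.
Goal: convert all character variables in addl_df to factors that have the same levels as the character variables of the same name in main_df.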
main_df <- data.frame(id=c(1, 2, 3, 4, 5), age=c(10, 20, 30, 40, 45), gender=c("F","F","M","M","F"), city=c("A","B","C","D","D"))
addl_df <- data.frame(id=c(7,8), age=c( 40, 45), gender=c("F","F"), city=c("C","D"))
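With the code below, city in addl_df becomes a factor with only 2 levels ("C" and "D"). What I want is a factor with 4 levels ("A", "B", "C", "D") in which "C" has the value 3, the same as it is defined in main_df.
Is it possible to do this automatically, rather than defining each variable manually one by one? Thanks!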
main_df[sapply(main_df, is.character)] <- lapply(main_df[sapply(main_df, is.character)], as.factor)
addl_df[sapply(addl_df, is.character)] <- lapply(addl_df[sapply(addl_df, is.character)], as.factor)
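One option is to bind the datasets with bind_rows while creating a source identifier ('grp'), convert the character columns to factor, use group_split by 'grp' to get a list of data.frames, set the names of the list with setNames, and update the original objects with list2env.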
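library(dplyr)

# Stack both data frames; .id = 'grp' records the source ("1" = main_df, "2" = addl_df)
bind_rows(main_df, addl_df, .id = 'grp') %>%
  # Convert character columns to factors so the levels cover both sources
  mutate(across(where(is.character), factor)) %>%
  # Split back by source and drop the helper column
  group_split(grp, .keep = FALSE) %>%
  setNames(c('main_df', 'addl_df')) %>%
  # Overwrite the original objects in the global environment
  list2env(.GlobalEnv)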
-output
> str(main_df)
tibble [5 × 4] (S3: tbl_df/tbl/data.frame)
$ id : num [1:5] 1 2 3 4 5
$ age : num [1:5] 10 20 30 40 45
$ gender: Factor w/ 2 levels "F","M": 1 1 2 2 1
$ city : Factor w/ 4 levels "A","B","C","D": 1 2 3 4 4
> str(addl_df)
tibble [2 × 4] (S3: tbl_df/tbl/data.frame)
$ id : num [1:2] 7 8
$ age : num [1:2] 40 45
$ gender: Factor w/ 2 levels "F","M": 1 1
$ city : Factor w/ 4 levels "A","B","C","D": 3 4
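If you would rather avoid binding and splitting, a base R alternative is to pass the levels from main_df explicitly to factor(). The following is only a sketch under a few assumptions: the columns are characters when the data frames are created (stringsAsFactors = FALSE is set explicitly for that reason), and the names chr_cols and shared are illustrative helpers that do not appear in the question. Unlike the bind_rows approach, the levels here come from main_df alone, so any value in addl_df that does not occur in main_df would become NA.

```r
# Example data copied from the question; stringsAsFactors = FALSE keeps the
# text columns as characters on R versions before 4.0 as well.
main_df <- data.frame(id = c(1, 2, 3, 4, 5), age = c(10, 20, 30, 40, 45),
                      gender = c("F", "F", "M", "M", "F"),
                      city = c("A", "B", "C", "D", "D"),
                      stringsAsFactors = FALSE)
addl_df <- data.frame(id = c(7, 8), age = c(40, 45),
                      gender = c("F", "F"), city = c("C", "D"),
                      stringsAsFactors = FALSE)

# Convert the character columns of the master table to factors first.
chr_cols <- names(main_df)[sapply(main_df, is.character)]
main_df[chr_cols] <- lapply(main_df[chr_cols], factor)

# Only touch columns that exist in both tables (illustrative helper name).
shared <- intersect(chr_cols, names(addl_df))

# Rebuild each shared column as a factor using the master table's levels.
addl_df[shared] <- lapply(shared, function(nm) {
  factor(addl_df[[nm]], levels = levels(main_df[[nm]]))
})

str(addl_df)
```

Both data frames stay plain data.frames with this approach, rather than being converted to tibbles.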