Trim factor labels for multiple variables at the same time
Here is my toy dataset:
library(tidyverse)
dat <- tibble (x1 = c("False - very long label specific to x1", "False - very long label specific to x1", "True - very long label specific to x1", "True - very long label specific to x1"),
x2 = c("False - very long label specific to x2", "False - very long label specific to x2", "False - very long label specific to x2", "True - very long label specific to x2"),
y = c(10, 5, 12, 4)) %>% mutate_at(vars(x1:x2), factor)
head(dat)
#> # A tibble: 4 x 3
#> x1 x2 y
#> <fct> <fct> <dbl>
#> 1 False - very long label specific~ False - very long label specific~ 10
#> 2 False - very long label specific~ False - very long label specific~ 5
#> 3 True - very long label specific ~ False - very long label specific~ 12
#> 4 True - very long label specific ~ True - very long label specific ~ 4
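I want to trim the very long factor labels, which have two things in common:
- they all start with True or False
- they include the column name (i.e. the factor labels are unique to each column)
I would like to simplify this so that each factor column contains only True and False. Here is my desired output: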
#> # A tibble: 4 x 3
#> x1 x2 y
#> <fct> <fct> <dbl>
#> 1 False False 10
#> 2 False False 5
#> 3 True False 12
#> 4 True True 4
I think it should work with something like mutate_at, fct_relabel and str_trunc, but I can't figure it out.
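We can use trimws with its whitespace argument (available in R >= 3.6.0):
library(dplyr)
dat %>%
  # treat everything from the first " -" onwards as the "whitespace" to trim
  mutate_if(is.factor, ~ factor(trimws(., whitespace = "\\s*-.*")))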
# A tibble: 4 x 3
# x1 x2 y
# <fct> <fct> <dbl>
#1 False False 10
#2 False False 5
#3 True False 12
#4 True True 4
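To see what the pattern does on its own, here is a minimal base R sketch. The `labels` vector and the "Semi-true" string are made up for illustration; they are not part of the question's data. The pattern `\\s*-.*` removes everything from the first hyphen (plus any whitespace before it) to the end of the string, so it assumes the part you want to keep contains no hyphen.

```r
# Hypothetical labels in the same shape as the question's data
labels <- c("False - very long label specific to x1",
            "True - very long label specific to x1")

# Drop optional whitespace, the first hyphen, and everything after it
trimmed <- sub("\\s*-.*", "", labels)
print(trimmed)

# The same substitution applied to the levels of a factor
f <- factor(labels)
levels(f) <- sub("\\s*-.*", "", levels(f))
print(levels(f))

# Caveat: a hyphen inside the part you want to keep cuts the label short
hyphenated <- sub("\\s*-.*", "", "Semi-true - some label")

# Requiring whitespace on both sides of the hyphen is one way around that
stricter <- sub("\\s+-\\s.*", "", "Semi-true - some label")
print(c(hyphenated, stricter))
```

If your real labels can contain hyphens before the separator, the stricter pattern is probably the safer choice.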
Or fct_relabel with str_remove:
library(forcats)
library(stringr)
dat %>%
  # relabel the levels only; the underlying factor codes are left alone
  mutate_if(is.factor, ~ fct_relabel(., ~ str_remove(., "\\s*-.*")))
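In dplyr 1.0.0 and later, `mutate_if()` and `mutate_at()` are superseded by `across()`. The following is a sketch of the same idea in that style, assuming dplyr >= 1.0.0 is installed. `trim_label` and `out` are names I made up for the example, and the data is rebuilt here so the snippet runs on its own.

```r
library(dplyr)
library(forcats)
library(stringr)

# Rebuild the toy data from the question
dat <- tibble(
  x1 = factor(rep(c("False - very long label specific to x1",
                    "True - very long label specific to x1"), each = 2)),
  x2 = factor(rep(c("False - very long label specific to x2",
                    "True - very long label specific to x2"), times = c(3, 1))),
  y  = c(10, 5, 12, 4)
)

# Hypothetical helper: keep only the text before the first hyphen
trim_label <- function(x) str_remove(x, "\\s*-.*")

# Apply the relabelling to every factor column; y is left untouched
out <- dat %>%
  mutate(across(where(is.factor), function(f) fct_relabel(f, trim_label)))

print(out)
```

Using a named helper instead of nested `~` lambdas is only a readability choice; the nested form from the answer should work as well.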
Or using data.table:
library(data.table)
# names of the factor columns
nm1 <- names(which(sapply(dat, is.factor)))
# setDT() converts dat to a data.table by reference
setDT(dat)[, (nm1) := lapply(.SD, function(x)
  factor(sub("\\s*-.*", "", x))), .SDcols = nm1]
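Note that `setDT()` converts `dat` in place. If you would rather keep the original object as it is, one option is to work on a separate table created with `as.data.table()`. This is only a sketch: `dt` is a name chosen for the example, and it rewrites the levels directly instead of rebuilding each factor, which is a small variation on the code above.

```r
library(data.table)

# Rebuild the toy data from the question as a plain data.frame
dat <- data.frame(
  x1 = factor(rep(c("False - very long label specific to x1",
                    "True - very long label specific to x1"), each = 2)),
  x2 = factor(rep(c("False - very long label specific to x2",
                    "True - very long label specific to x2"), times = c(3, 1))),
  y  = c(10, 5, 12, 4)
)

# Work on a separate data.table so that dat keeps its long labels
dt <- as.data.table(dat)

# Names of the factor columns
nm1 <- names(which(sapply(dt, is.factor)))

# Rewrite the levels of each factor column, assigning by reference within dt
dt[, (nm1) := lapply(.SD, function(x) {
  levels(x) <- sub("\\s*-.*", "", levels(x))
  x
}), .SDcols = nm1]

print(dt)
```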