Convert multiple columns to factor and give them numerical values

Question

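I have a data frame with about 100 columns named like some_text_microorganism_growth. They are character, but they are really ordered factors (NG<SG<LG<MG<HG) with equivalent numeric values (0, 2.5, 12, 40, 100). I can convert the columns one at a time, but I need to convert all of them at once using contains("growth"). Any ideas?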

Edited data:

df <- data.frame(ABC_growth = rep(c("MG", "LG", "NG"), each = 5),
                 ZFG_growth = rep(c("GG", "LG", "SG"), each = 5),
                 OtherCol   = rep(c("AB*", "CD;", "other+"), each = 5))

# Note: not every level appears in every column, but the levels are shared across all columns. The full set is: (NG<SG<LG<MG<HG)
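Because the full level set is the same for every column, it can be defined once together with its numeric equivalents and reused. A minimal base-R sketch, assuming the mapping used in the asker's code and in the answers (NG = 0, SG = 2.5, LG = 12, MG = 40, HG = 100); growth_values is a hypothetical name:

```r
# Hypothetical lookup table: level name -> numeric value, in increasing order
growth_values <- c(NG = 0, SG = 2.5, LG = 12, MG = 40, HG = 100)

# Passing the full level set keeps all five levels,
# even when a column only contains some of them
x <- c("MG", "LG", "NG")
f <- ordered(x, levels = names(growth_values))

print(levels(f))  # all five levels, in order
print(f < "MG")   # ordered comparison: FALSE TRUE TRUE
```

Using the same levels vector everywhere also guarantees that comparisons such as f < "MG" mean the same thing in every column.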

For one column I did:

df$ABC_growth <- factor(df$ABC_growth)  # convert to factor
df$ABC_growth <- ordered(df$ABC_growth, levels = c("SG", "LG", "MG", "HG"))  # order
levels(df$ABC_growth) <- c("2.5", "12", "40", "100")

Any thoughts?
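Labels set this way are still text, not numbers. A small base-R sketch of recovering the numeric values, using a hypothetical vector f built with the full level set; note that calling as.numeric() directly on a factor returns the internal level codes instead:

```r
# Ordered factor whose labels are the numeric values, stored as text
f <- ordered(c("MG", "LG", "NG"),
             levels = c("NG", "SG", "LG", "MG", "HG"),
             labels = c("0", "2.5", "12", "40", "100"))

# Pitfall: this returns the integer codes of the levels (4 3 1)
codes <- as.numeric(f)

# Going through the labels gives the intended values (40 12 0)
values <- as.numeric(as.character(f))

print(codes)
print(values)
```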

Answer 1

We can use mutate with across (dplyr):

library(dplyr)

df <- df %>% 
  mutate(across(contains('growth'), ~ ordered(.,
      levels = c("NG", "SG", "LG", "MG", "HG"), 
      labels = c('0', '2.5', '12', '40', '100'))))
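If actual numbers are wanted alongside the original columns, across() can also write new columns through its .names argument (available in dplyr 1.0.0 and later). A sketch under that assumption; the _num suffix and the growth_values lookup vector are hypothetical choices, not part of the answer above:

```r
library(dplyr)

# Hypothetical lookup: level name -> numeric value
growth_values <- c(NG = 0, SG = 2.5, LG = 12, MG = 40, HG = 100)

df <- data.frame(ABC_growth = rep(c("MG", "LG", "NG"), each = 2),
                 OtherCol   = rep(c("AB*", "CD;", "other+"), each = 2),
                 stringsAsFactors = FALSE)

df2 <- df %>%
  mutate(across(contains("growth"),
                # Index the named vector by the text value; unname() drops names
                ~ unname(growth_values[.x]),
                .names = "{.col}_num"))

print(df2)
```

The original character column is kept, and each growth column gets a numeric companion column.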

Or with lapply in base R:

nm1 <- grep('growth', names(df), value = TRUE)
df[nm1] <- lapply(df[nm1], function(x)  ordered(x, 
   levels = c("NG", "SG", "LG", "MG", "HG"), 
       labels = c('0', '2.5', '12', '40', '100')))
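If only the numeric values are needed (no factor at all), base R's match() can map the codes straight to numbers. A minimal sketch; levs and vals are hypothetical names, and codes outside the level set (such as "GG") become NA:

```r
df <- data.frame(ABC_growth = rep(c("MG", "LG", "NG"), each = 2),
                 ZFG_growth = rep(c("GG", "LG", "SG"), each = 2),
                 OtherCol   = rep(c("AB*", "CD;", "other+"), each = 2),
                 stringsAsFactors = FALSE)

levs <- c("NG", "SG", "LG", "MG", "HG")  # ordered level names
vals <- c(0, 2.5, 12, 40, 100)           # numeric value for each level

nm1 <- grep("growth", names(df), value = TRUE)
# match() gives each value's position in levs (NA if absent); index vals by it
df[nm1] <- lapply(df[nm1], function(x) vals[match(x, levs)])

print(df)
```

Non-matching columns such as OtherCol are left untouched.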

Or this can also be done with ftransform (ftransformv for multiple columns) from collapse:

library(collapse)
f1 <- function(x)  {
      ordered(x, levels = c("NG", "SG", "LG", "MG", "HG"), 
         labels = c('0', '2.5', '12', '40', '100'))
 }

i1 <- grep('growth', names(df))
ftransformv(df, i1, f1)

-output

#   ABC_growth ZFG_growth
#1          40       <NA>
#2          40       <NA>
#3          40       <NA>
#4          40       <NA>
#5          40       <NA>
#6          12         12
#7          12         12
#8          12         12
#9          12         12
#10         12         12
#11          0        2.5
#12          0        2.5
#13          0        2.5
#14          0        2.5
#15          0        2.5
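The <NA> values in ZFG_growth come from "GG", which is not one of the five levels, so ordered() silently turns it into NA. A small base-R check (a sketch; levs is a hypothetical name) can list such unexpected codes before converting:

```r
df <- data.frame(ABC_growth = rep(c("MG", "LG", "NG"), each = 5),
                 ZFG_growth = rep(c("GG", "LG", "SG"), each = 5),
                 stringsAsFactors = FALSE)

levs <- c("NG", "SG", "LG", "MG", "HG")
nm1  <- grep("growth", names(df), value = TRUE)

# Values in the growth columns that are not valid levels
unexpected <- setdiff(unique(unlist(df[nm1])), levs)
print(unexpected)
```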

Answer 2

Here is a data.table approach using lapply, which calls the factor function once per column. The levels and labels arguments set the unique factor values.


df <- data.frame(ABC_growth=rep(c("MG","LG","NG"), each=5),
                 ZFG_growth=rep(c("GG","LG","SG"),each=5),
                 test = rep(c("GG","LG","SG"),each=5))

library(data.table)

# Coerce data.frame to data.table object

setDT(df)

# Original with all variables including new variable named test

print(df)

#>     ABC_growth ZFG_growth test
#>  1:         MG         GG   GG
#>  2:         MG         GG   GG
#>  3:         MG         GG   GG
#>  4:         MG         GG   GG
#>  5:         MG         GG   GG
#>  6:         LG         LG   LG
#>  7:         LG         LG   LG
#>  8:         LG         LG   LG
#>  9:         LG         LG   LG
#> 10:         LG         LG   LG
#> 11:         NG         SG   SG
#> 12:         NG         SG   SG
#> 13:         NG         SG   SG
#> 14:         NG         SG   SG
#> 15:         NG         SG   SG

# Use grep to extract the variable names that match the provided pattern

cols <- grep('growth', names(df))

df[, lapply(.SD, function(x) factor(x,
  levels = c("NG", "SG", "LG", "MG", "HG"),
  labels = c('0', '2.5', '12', '40', '100')
)), .SDcols = cols]

#>     ABC_growth ZFG_growth
#>  1:         40       <NA>
#>  2:         40       <NA>
#>  3:         40       <NA>
#>  4:         40       <NA>
#>  5:         40       <NA>
#>  6:         12         12
#>  7:         12         12
#>  8:         12         12
#>  9:         12         12
#> 10:         12         12
#> 11:          0        2.5
#> 12:          0        2.5
#> 13:          0        2.5
#> 14:          0        2.5
#> 15:          0        2.5

Created on 2021-03-16 by the reprex package (v0.3.0)
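The data.table expression in this answer returns a new table and leaves df unchanged. To update the matching columns by reference instead, data.table's := can be combined with .SDcols; a sketch under the assumption that modifying df in place is what is wanted:

```r
library(data.table)

df <- data.table(ABC_growth = rep(c("MG", "LG", "NG"), each = 2),
                 ZFG_growth = rep(c("GG", "LG", "SG"), each = 2),
                 test       = rep(c("GG", "LG", "SG"), each = 2))

cols <- grep("growth", names(df), value = TRUE)

# Overwrite only the growth columns, by reference; `test` stays character
df[, (cols) := lapply(.SD, function(x) factor(x,
       levels = c("NG", "SG", "LG", "MG", "HG"),
       labels = c("0", "2.5", "12", "40", "100"))),
   .SDcols = cols]

print(df)
```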
