Convert multiple columns to factor and give them numerical values
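I have a data frame with about 100 columns named like some_text_microorganism_growth. They are character, but they are really ordered factors (NG<SG<LG<MG<HG) with equivalent numerical values (0,2.5,6,12,25,40). I can convert these columns one by one, but I need to convert all of them at once using contains("growth"). Any ideas?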
Edited data:
df <- data.frame(ABC_growth = rep(c("MG","LG","NG"), each = 5),
                 ZFG_growth = rep(c("GG","LG","SG"), each = 5),
                 OtherCol = rep(c("AB*","CD;","other+"), each = 5))
# Note: not all levels appear in every column, but they are common to all columns. The full set is: (NG<SG<LG<MG<HG)
For one column, I did:
df$ABC_growth <- factor(df$ABC_growth) # convert to factor
df$ABC_growth <- ordered(df$ABC_growth, levels = c("SG","LG","MG","HG")) # order
levels(df$ABC_growth) <- c("2.5","12","40","100") # relabel levels with their numeric values
Any thoughts?
We can use mutate with across from dplyr
library(dplyr)
df <- df %>%
  mutate(across(contains('growth'), ~ ordered(.,
    levels = c("NG", "SG", "LG", "MG", "HG"),
    labels = c('0', '2.5', '12', '40', '100'))))
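The labels above are stored as factor labels (text), not as numbers. If plain numeric columns are needed instead, one possible sketch, assuming dplyr >= 1.0 for across(), goes through as.character() before as.numeric(); calling as.numeric() directly on a factor would return the internal codes 1 to 5. The names lvls, vals and df_num are illustrative only.

```r
library(dplyr)

# Example data from the question; OtherCol should stay untouched
df <- data.frame(ABC_growth = rep(c("MG", "LG", "NG"), each = 5),
                 ZFG_growth = rep(c("GG", "LG", "SG"), each = 5),
                 OtherCol   = rep(c("AB*", "CD;", "other+"), each = 5))

# Level order and the numeric value attached to each level (illustrative names)
lvls <- c("NG", "SG", "LG", "MG", "HG")
vals <- c(0, 2.5, 12, 40, 100)

df_num <- df %>%
  mutate(across(contains("growth"),
                # ordered() maps each code to its label; as.character() then
                # as.numeric() turns the label into the number it represents
                ~ as.numeric(as.character(ordered(.x, levels = lvls, labels = vals)))))

str(df_num)
```

Codes outside lvls (such as GG) end up as NA in the numeric columns.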
Or with lapply in base R
nm1 <- grep('growth', names(df), value = TRUE)
df[nm1] <- lapply(df[nm1], function(x) ordered(x,
levels = c("NG", "SG", "LG", "MG", "HG"),
labels = c('0', '2.5', '12', '40', '100')))
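If numbers rather than ordered factors are the end goal, a named lookup vector can skip the factor step entirely. This is a swapped-in technique (indexing a named vector by the character codes), offered as a sketch rather than as the answer's method; growth_vals is an illustrative name. Codes that are not in the lookup, such as GG, come back as NA.

```r
df <- data.frame(ABC_growth = rep(c("MG", "LG", "NG"), each = 5),
                 ZFG_growth = rep(c("GG", "LG", "SG"), each = 5),
                 OtherCol   = rep(c("AB*", "CD;", "other+"), each = 5),
                 stringsAsFactors = FALSE)

# Named vector: names are the codes, values are the numbers (illustrative)
growth_vals <- c(NG = 0, SG = 2.5, LG = 12, MG = 40, HG = 100)

nm1 <- grep("growth", names(df), value = TRUE)
# Index by name (as.character() guards against factor input, whose integer
# codes would otherwise be used as positions); unname() drops the names
df[nm1] <- lapply(df[nm1], function(x) unname(growth_vals[as.character(x)]))

head(df, 3)
```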
Or this can also be done with ftransform (ftransformv for multiple columns) from collapse
library(collapse)
f1 <- function(x) {
ordered(x, levels = c("NG", "SG", "LG", "MG", "HG"),
labels = c('0', '2.5', '12', '40', '100'))
}
i1 <- grep('growth', names(df))
ftransformv(df, i1, f1)
-output
# ABC_growth ZFG_growth
#1 40 <NA>
#2 40 <NA>
#3 40 <NA>
#4 40 <NA>
#5 40 <NA>
#6 12 12
#7 12 12
#8 12 12
#9 12 12
#10 12 12
#11 0 2.5
#12 0 2.5
#13 0 2.5
#14 0 2.5
#15 0 2.5
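The <NA> values in ZFG_growth come from the code GG, which is not one of the five levels, so ordered() silently maps it to NA. A minimal base R check, run before converting, can list such unexpected codes and count them per column; lvls, unexpected and counts are illustrative names.

```r
df <- data.frame(ABC_growth = rep(c("MG", "LG", "NG"), each = 5),
                 ZFG_growth = rep(c("GG", "LG", "SG"), each = 5),
                 OtherCol   = rep(c("AB*", "CD;", "other+"), each = 5),
                 stringsAsFactors = FALSE)

lvls <- c("NG", "SG", "LG", "MG", "HG")
i1 <- grep("growth", names(df))

# Codes in the growth columns that are not valid levels (they would become NA)
unexpected <- setdiff(unlist(df[i1], use.names = FALSE), c(lvls, NA))
unexpected

# Per-column count of non-missing values that are not valid levels
counts <- sapply(df[i1], function(x) sum(!is.na(x) & !(x %in% lvls)))
counts
```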
Here is a data.table approach using lapply, which calls the factor function once for each column of .SD. The levels and labels arguments set the unique factor values.
df <- data.frame(ABC_growth=rep(c("MG","LG","NG"), each=5),
ZFG_growth=rep(c("GG","LG","SG"),each=5),
test = rep(c("GG","LG","SG"),each=5))
library(data.table)
# Coerce data.frame to data.table object
setDT(df)
# Original with all variables including new variable named test
print(df)
#> ABC_growth ZFG_growth test
#> 1: MG GG GG
#> 2: MG GG GG
#> 3: MG GG GG
#> 4: MG GG GG
#> 5: MG GG GG
#> 6: LG LG LG
#> 7: LG LG LG
#> 8: LG LG LG
#> 9: LG LG LG
#> 10: LG LG LG
#> 11: NG SG SG
#> 12: NG SG SG
#> 13: NG SG SG
#> 14: NG SG SG
#> 15: NG SG SG
# Use grep to extract the variable names that match the provided pattern
cols <- grep('growth', names(df))
# Convert only the matching columns by restricting .SD with .SDcols
df[, lapply(.SD, function(x) factor(x,
  levels = c("NG", "SG", "LG", "MG", "HG"),
  labels = c('0', '2.5', '12', '40', '100')
)), .SDcols = cols]
#> ABC_growth ZFG_growth
#> 1: 40 <NA>
#> 2: 40 <NA>
#> 3: 40 <NA>
#> 4: 40 <NA>
#> 5: 40 <NA>
#> 6: 12 12
#> 7: 12 12
#> 8: 12 12
#> 9: 12 12
#> 10: 12 12
#> 11: 0 2.5
#> 12: 0 2.5
#> 13: 0 2.5
#> 14: 0 2.5
#> 15: 0 2.5
Created on 2021-03-16 by the reprex package (v0.3.0)
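The data.table code above returns a new table and leaves df itself unchanged. To update the matching columns by reference instead, a sketch using data.table's := together with .SDcols could look like this; it uses ordered() so the result keeps the NG < SG < LG < MG < HG ordering the question asks for. The names dt, lvls and labs are illustrative.

```r
library(data.table)

dt <- data.table(ABC_growth = rep(c("MG", "LG", "NG"), each = 5),
                 ZFG_growth = rep(c("GG", "LG", "SG"), each = 5),
                 test       = rep(c("GG", "LG", "SG"), each = 5))

lvls <- c("NG", "SG", "LG", "MG", "HG")
labs <- c("0", "2.5", "12", "40", "100")

cols <- grep("growth", names(dt), value = TRUE)
# (cols) := writes the converted columns back into dt in place;
# extra arguments after the function are passed on to ordered()
dt[, (cols) := lapply(.SD, ordered, levels = lvls, labels = labs), .SDcols = cols]

str(dt)
```

Because the columns are ordered factors, comparisons such as dt$ABC_growth[1] > dt$ABC_growth[6] follow the level order rather than alphabetical order, while the test column stays character.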