Converting several factor columns with thousand-separator values to integers using tidyverse
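I couldn't find an exact solution on SO, and in any case I'd like the most concise version that uses the tidyverse set of R packages. I want every column except the first to be an integer, and the solution has to scale to many more columns in real life.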
df <- structure(list(col_1 = structure(1:3, .Label = c("a", "b", "c"
), class = "factor"), col_2 = structure(c(1L, 3L, 2L), .Label = c("1,234",
"23", "4,567"), class = "factor"), col_3 = structure(1:3, .Label = c("1234",
"46", "6,789"), class = "factor")), .Names = c("col_1", "col_2",
"col_3"), row.names = c(NA, -3L), class = "data.frame")
TIA
Look for "," in each column and, if it is present, convert that column to numeric:
df1 = lapply(df, function(x) {if(any(grepl(",", x))){x<-as.numeric(gsub(",", "", x))};x})
# as.data.frame(df1)
#   col_1 col_2 col_3
# 1     a  1234  1234
# 2     b  4567    46
# 3     c    23  6789
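Note that the lapply() call above returns a plain list with numeric (double) columns, and it only touches columns that contain at least one comma. A minimal base R sketch that keeps the data frame and converts every column except the first to integer could look like this; the helper name to_int is hypothetical and the data is rebuilt from the question:

```r
# Rebuild the example data from the question (factor columns, some with comma grouping).
df <- data.frame(
  col_1 = factor(c("a", "b", "c")),
  col_2 = factor(c("1,234", "4,567", "23")),
  col_3 = factor(c("1234", "46", "6,789"))
)

# Hypothetical helper: go through character so factor labels (not codes) are used,
# strip the grouping commas, then convert to integer.
to_int <- function(x) as.integer(gsub(",", "", as.character(x), fixed = TRUE))

# Assigning into df[-1] keeps the data.frame class and leaves col_1 untouched.
df[-1] <- lapply(df[-1], to_int)
str(df)
```

Because the conversion is applied by position (df[-1]), it works unchanged for any number of columns after the first.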
You can use mutate_at, excluding the first column, and use gsub to strip the commas before converting to integer:
library(tidyverse)
df %>% mutate_at(.cols = -1, funs(as.integer(gsub(",", "", .))))
#   col_1 col_2 col_3
# 1     a  1234  1234
# 2     b  4567    46
# 3     c    23  6789
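In newer dplyr releases funs() is deprecated and the scoped verbs such as mutate_at() are superseded by across(). As a rough equivalent of the call above, assuming dplyr 1.0.0 or later, the same conversion might be written like this (the data is rebuilt from the question, and df_int is just an illustrative name):

```r
library(dplyr)

# Example data from the question.
df <- data.frame(
  col_1 = factor(c("a", "b", "c")),
  col_2 = factor(c("1,234", "4,567", "23")),
  col_3 = factor(c("1234", "46", "6,789"))
)

# across(-1, ...) selects every column except the first;
# as.character() makes sure the factor labels are used, not the integer codes.
df_int <- df %>%
  mutate(across(-1, ~ as.integer(gsub(",", "", as.character(.x), fixed = TRUE))))

str(df_int)
```

Selecting by -1 rather than by name means the call does not need to change when more columns are added.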
Another option is parse_number (from readr), but it gives numeric (double) columns rather than integer ones:
df %>% mutate_at(.cols = -1, funs(parse_number))
#   col_1 col_2 col_3
# 1     a  1234  1234
# 2     b  4567    46
# 3     c    23  6789
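If integer columns are needed with parse_number, one possibility is to wrap it in as.integer(). The sketch below makes two assumptions: dplyr 1.0.0 or later for across(), and that parse_number() wants character input (as far as I know, recent readr versions reject factors), hence the explicit as.character(). The name df_parsed is illustrative only.

```r
library(dplyr)
library(readr)

# Example data from the question.
df <- data.frame(
  col_1 = factor(c("a", "b", "c")),
  col_2 = factor(c("1,234", "4,567", "23")),
  col_3 = factor(c("1234", "46", "6,789"))
)

# parse_number() treats "," as the grouping mark in the default locale and returns doubles;
# as.integer() then gives the integer columns the question asks for.
df_parsed <- df %>%
  mutate(across(-1, ~ as.integer(parse_number(as.character(.x)))))

str(df_parsed)
```

parse_number() also drops other non-numeric characters around the number (currency symbols, for example), so it is more permissive than the gsub approach.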
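Here is a version with data.table. Convert the 'data.frame' to a 'data.table' (setDT(df)), specify the columns of interest in .SDcols, loop through them with lapply, replace the , with an empty string using gsub, convert to integer, and assign (:=) the result back to the columns: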
library(data.table)
setDT(df)[, (2:3) := lapply(.SD, function(x)
as.integer(gsub(",", "", x))), .SDcols = 2:3]
df
#    col_1 col_2 col_3
# 1:     a  1234  1234
# 2:     b  4567    46
# 3:     c    23  6789
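Since the real data has more columns, hard-coding 2:3 may not be convenient. A small variation, offered as a sketch, derives the target columns from the names instead; the variable names dt and cols are illustrative:

```r
library(data.table)

# Example data from the question, built directly as a data.table.
dt <- data.table(
  col_1 = factor(c("a", "b", "c")),
  col_2 = factor(c("1,234", "4,567", "23")),
  col_3 = factor(c("1234", "46", "6,789"))
)

# Every column except the first, however many there are.
cols <- names(dt)[-1]

# Strip the commas, convert to integer, and assign back by reference.
dt[, (cols) := lapply(.SD, function(x) {
  as.integer(gsub(",", "", as.character(x), fixed = TRUE))
}), .SDcols = cols]

str(dt)
```

As with setDT(), the := assignment modifies the table in place, so no copy of the data is made.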