Recode a string column into integers using dplyr
How can I create a new integer column recode that recodes the existing column y in the data frame df, using dplyr?
# Generates Random data
df <- data.frame(x = sample(1:100, 50),
                 y = sample(LETTERS, 50, replace = TRUE),
                 stringsAsFactors = FALSE)
# Structure of the data
str(df)
# 'data.frame': 50 obs. of 2 variables:
# $ x: int 90 4 33 85 30 19 78 77 7 10 ...
# $ y: chr "N" "B" "P" "W" ...
# Convert the character vector to a factor
df$y <- factor(df$y)
# Structure of the data after the factor conversion
str(df)
# 'data.frame': 50 obs. of 2 variables:
# $ x: int 90 4 33 85 30 19 78 77 7 10 ...
# $ y: Factor w/ 23 levels "A","B","C","E",..: 12 2 14 21 12 22 7 1 6 17 ...
# Collect the levels of the factor variable
labs <- levels(df$y)
# Recode the levels to sequential integers
recode <- seq_along(labs)
# Create the recode lookup data frame
dfrecode <- data.frame(labs, recode)
# Mapping the recodes to the original data
df$recode <- dfrecode[match(df$y, dfrecode$labs), 'recode']
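This code works as expected, but I would like to replace it with dplyr or some other efficient method. If I knew all the values in advance, I could get the same result using this approach. However, I want to do this without inspecting or explicitly listing the values present in the column.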
The trick here is that calling as.numeric() on a factor returns the integer codes of its levels rather than the labels. So, try this:
df <- data.frame(x = sample(1:100, 50),
                 y = sample(LETTERS, 50, replace = TRUE),
                 stringsAsFactors = FALSE)
library(dplyr)
dfrecode <- df %>%
  mutate(recode = as.numeric(factor(y)))
str(dfrecode)
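One caveat: as.numeric() stores the codes as doubles, so str() will report the new column as num rather than int. If a true integer column is needed, a minimal variation (a sketch, not part of the original answer) is to use as.integer() instead. The set.seed() call and the expected vector below are assumptions added only to make the example reproducible and to cross-check it against the lookup-table approach from the question.

```r
library(dplyr)

set.seed(42)  # assumption: fixed seed, only so the example is reproducible

df <- data.frame(x = sample(1:100, 50),
                 y = sample(LETTERS, 50, replace = TRUE),
                 stringsAsFactors = FALSE)

# as.integer() on a factor returns the level codes with integer storage
dfrecode <- df %>%
  mutate(recode = as.integer(factor(y)))

# Cross-check: position of each value among the sorted unique values,
# which is what the match()-based lookup in the question computes
expected <- match(df$y, sort(unique(df$y)))

is.integer(dfrecode$recode)
identical(dfrecode$recode, expected)
```

Both checks should come back TRUE when y has no missing values, because factor() by default takes its levels from the sorted unique values of the column.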