More efficient method of recoding one column in a data.frame conditional on other column entries

Question

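I'm looking for a more efficient way to recode the entries of a column in a data frame, where the recoding is conditional on the entries in other columns.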

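Take this simple example, which demonstrates my current process: I create a new column for the recoded data, convert it to character, and then recode the data by subsetting with square brackets (is there a formal name for this process?).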

## example data frame
df = data.frame( id = seq( 1 , 100 , by=1 ) ,
                 x = rep( c("W", "Z") , each=50),
                 y = c( rep( c("A","B","C","D") , 25 ) ) )

# add a new column based on column y; convert to character 
df$newY = as.character( df$y ) 

# change newY entries to numbers based on conditions in other columns
df$newY[ df$x == "W" & df$newY == "B" ] <- 1
df$newY[ df$x == "Z" & df$newY == "D" ] <- 3

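This process works for recoding a variable with a small number of conditions, but it becomes cumbersome when there are many conditions or many different variables to recode.

Can anyone help me find a more efficient method?

Thanks!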

Answer 1

A few approaches:

df <- data.frame(id = seq( 1 , 100 , by=1 ) ,
                 x = rep( c("W", "Z") , each=50),
                 y = c( rep( c("A","B","C","D") , 25)))

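# Take the product (my preference)
# Convert to factor first: since R 4.0.0, data.frame() keeps strings as
# character, and as.numeric() on character columns would return NA.
# Note that products are not unique: W/B (1 * 2) and Z/A (2 * 1) both give 2.
as.numeric(factor(df$x)) * as.numeric(factor(df$y))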

# Create new factor based on x and y and convert to numeric
as.numeric(as.factor(paste0(df$x, df$y)))
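
Neither approach above leaves the other x/y combinations unchanged, as the code in the question does. One possible alternative is a lookup table combined with match(). The sketch below is not taken from the original answer. The names recode_map, code, key, map_key and idx are made up for illustration, and it assumes the new codes should be stored as character, as they end up in the question's code.

```r
# Example data from the question
df <- data.frame(id = 1:100,
                 x = rep(c("W", "Z"), each = 50),
                 y = rep(c("A", "B", "C", "D"), 25),
                 stringsAsFactors = FALSE)

# Hypothetical lookup table: one row per (x, y) combination to recode
recode_map <- data.frame(x = c("W", "Z"),
                         y = c("B", "D"),
                         code = c("1", "3"),
                         stringsAsFactors = FALSE)

# Build a combined key and find each row's position in the lookup table
key <- paste(df$x, df$y, sep = "_")
map_key <- paste(recode_map$x, recode_map$y, sep = "_")
idx <- match(key, map_key)   # NA where no recoding rule applies

# Keep the original y where no rule matches, otherwise use the new code
df$newY <- ifelse(is.na(idx), df$y, recode_map$code[idx])
```

Adding a new condition then means adding a row to recode_map rather than writing another subsetting line.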

r

recode