Recode observations in one column depending on a different column

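I have a dataset called 'survey' with one row per individual ID and columns for many questions. I need to recode the values in one column to NA and move those observations to another column.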

For example:

ID    Food    Vegetable 
aaa   NA       NA 
bbb   NA       lemon
ccc   NA       sprout
ddd   fruit    NA
eee   fruit    NA
fff   NA       watermelon

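I want to take the observations lemon and watermelon, which belong to IDs bbb and fff, move them into the Food column and rename them fruit (the survey respondents put them in the wrong column), and leave NA in the Vegetable column.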

So that it looks like:

ID    Food     Vegetable
aaa   NA       NA
bbb   fruit    NA
ccc   NA       sprout
ddd   fruit    NA
eee   fruit    NA
fff   fruit    NA

I used:

survey<- survey %>%
    mutate(food = if_else(str_detect(Vegetable,"(lemon)|(watermelon)"),"fruit", Food))
 

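This converts NA to fruit in the food column, but it does not set the corresponding value in the vegetable column to NA, and it also turns all the other fruits in the food column into NA!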
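A likely explanation for both symptoms: str_detect() returns NA wherever Vegetable is NA, and if_else() returns NA wherever its condition is NA, so rows such as ddd and eee lose their existing fruit. The attempt also assigns to a new lowercase food column rather than Food, and never touches Vegetable at all. The base R sketch below reproduces the NA propagation with ifelse(), which treats an NA condition the same way, and shows that grepl(), which returns FALSE (not NA) for NA input, keeps the existing values. The vectors are copied from the question's data; the anchored regex is an illustrative choice.

```r
# Food and Vegetable columns from the question's survey data
veg  <- c(NA, "lemon", "sprout", NA, NA, "watermelon")
food <- c(NA, NA, NA, "fruit", "fruit", NA)

# A condition that is NA wherever Vegetable is NA, as str_detect() returns
cond_na <- ifelse(is.na(veg), NA, veg %in% c("lemon", "watermelon"))
attempt <- ifelse(cond_na, "fruit", food)
attempt
# rows 4 and 5 (ddd, eee) lose their existing "fruit"

# grepl() gives FALSE for NA input, so existing Food values are kept
cond  <- grepl("^(lemon|watermelon)$", veg)
fixed <- ifelse(cond, "fruit", food)
fixed
```

This only fixes the Food column; the misfiled values still have to be cleared from Vegetable in a separate step.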

Data:

survey <- structure(list(ID = c("aaa", "bbb", "ccc", "ddd", "eee", "fff"),
                         Food = c(NA, NA, NA, "fruit", "fruit", NA),
                         Vegetable = c(NA, "lemon", "sprout", NA, NA, "watermelon")),
                    class = "data.frame", row.names = c(NA, -6L))

P.S.: This is a follow-up to an earlier answer. It is not exactly the same as the previous question, which is why I started a new one.

dplyr version 1.0.2

One option is to update Food and Vegetable depending on whether the Vegetable value is %in% a given list, not_vegetables:

library(dplyr)

not_vegetables <- c("grape", "tomato")

df %>%
  mutate(Food = if_else(Vegetable %in% not_vegetables, "fruit", Food),
         Vegetable = if_else(Vegetable %in% not_vegetables, NA_character_, Vegetable))
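Applied to the question's own survey data, with not_vegetables set to the misfiled values from the question (lemon and watermelon) instead of grape and tomato, the same mutate() call should reproduce the desired table. A sketch, assuming dplyr 1.0.x is installed:

```r
library(dplyr)

# The question's data, as given in its structure() call
survey <- structure(list(ID = c("aaa", "bbb", "ccc", "ddd", "eee", "fff"),
                         Food = c(NA, NA, NA, "fruit", "fruit", NA),
                         Vegetable = c(NA, "lemon", "sprout", NA, NA, "watermelon")),
                    class = "data.frame", row.names = c(NA, -6L))

# Values respondents entered under Vegetable by mistake
not_vegetables <- c("lemon", "watermelon")

survey <- survey %>%
  # %in% never returns NA, so rows with NA in Vegetable keep their Food value;
  # the Vegetable line still sees the original Vegetable column
  mutate(Food      = if_else(Vegetable %in% not_vegetables, "fruit", Food),
         Vegetable = if_else(Vegetable %in% not_vegetables, NA_character_, Vegetable))

survey
```

Using %in% rather than str_detect() is what avoids the NA problem from the question: its result is always TRUE or FALSE.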

Another approach is to call replace() across() both columns and do the if_else() inside it:

df %>%
  mutate(across(
    c(Food, Vegetable), 
    ~replace(., 
             Vegetable %in% not_vegetables, 
             if_else(cur_column() == "Food", 'fruit', NA_character_))
    ))

With base R, you could try this:

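# Values that were entered in the wrong column
values <- c('grape','tomato')
# Update Food first: this must run before Vegetable is overwritten
df$Food <- ifelse(df$Vegetable %in% values,'fruit',df$Food)
df$Vegetable <- ifelse(df$Vegetable %in% values,NA,df$Vegetable)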
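The two ifelse() lines above depend on their order: if Vegetable were cleared first, the Food update would no longer see the misfiled values. A variant that computes the row mask once and uses subset assignment avoids that dependency. move_misfiled() and its label argument are hypothetical names for illustration, not part of the answer; the sketch is run on the question's survey data.

```r
# Hypothetical helper: move misfiled Vegetable values into Food as `label`
move_misfiled <- function(data, values, label = "fruit") {
  # Compute the row mask once, before either column is modified
  misfiled <- data$Vegetable %in% values
  data$Food[misfiled] <- label
  data$Vegetable[misfiled] <- NA
  data
}

# The question's data, as given in its structure() call
survey <- structure(list(ID = c("aaa", "bbb", "ccc", "ddd", "eee", "fff"),
                         Food = c(NA, NA, NA, "fruit", "fruit", NA),
                         Vegetable = c(NA, "lemon", "sprout", NA, NA, "watermelon")),
                    class = "data.frame", row.names = c(NA, -6L))

survey <- move_misfiled(survey, c("lemon", "watermelon"))
survey
```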

Output:

df
   ID  Food Vegetable
1 aaa fruit      <NA>
2 bbb fruit      <NA>
3 ccc fruit      <NA>
4 ddd fruit      <NA>

Data:

df <- structure(list(ID = c("aaa", "bbb", "ccc", "ddd"), Food = c(NA, 
NA, "fruit", "fruit"), Vegetable = c("grape", "tomato", NA, NA
)), class = "data.frame", row.names = c(NA, -4L))