How to replace a specific NA in a column with a certain string

Question

This is probably simple, but I can't figure it out.

df<-structure(list(Besti = c("Friend", "myfriend", "yourbest", "allbest"
), Friend = c("Friend", NA, "Friend", "Toofriend"), Val1 = c(0L, 
0L, 0L, 0L), Val2 = c(0L, 0L, 0L, 0L), Val3 = c(0L, 1L, 0L, 0L
), Val4 = c(0L, 0L, 0L, 0L), Val5 = c(0L, 0L, 0L, 0L)), class = "data.frame", row.names = c(NA, 
-4L))

My data looks like this. I want to replace an NA with a string when the values directly above and below it are the same string.

I can see that there is one NA:

sum(is.na(df$Friend))
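sum(is.na()) only gives the count. To see where the NA sits and what surrounds it, which() returns the row positions. A minimal base-R sketch on a reduced copy of the data (the Val columns are omitted; the names na_rows, padded and neighbors are illustrative, not from the question):

```r
# Reduced copy of the example data (Val columns omitted)
df <- data.frame(
  Besti  = c("Friend", "myfriend", "yourbest", "allbest"),
  Friend = c("Friend", NA, "Friend", "Toofriend")
)

# Row positions of the missing values
na_rows <- which(is.na(df$Friend))

# Pad with NA at both ends so the first and last rows also have a neighbor
padded <- c(NA, df$Friend, NA)

# For original row i, padded[i] is the row above and padded[i + 2] the row below
neighbors <- data.frame(
  row   = na_rows,
  above = padded[na_rows],
  below = padded[na_rows + 2]
)
neighbors
```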

If the value one row above is Friend and the value one row below is Friend, I want to replace the NA with Friend.

So the output would look like this:

df_out<-structure(list(Besti = c("Friend", "myfriend", "yourbest", "allbest"
), Friend = c("Friend", "Friend", "Friend", "Toofriend"), Val1 = c(0L, 
0L, 0L, 0L), Val2 = c(0L, 0L, 0L, 0L), Val3 = c(0L, 1L, 0L, 0L
), Val4 = c(0L, 0L, 0L, 0L), Val5 = c(0L, 0L, 0L, 0L)), class = "data.frame", row.names = c(NA, 
-4L))
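The rule can also be sketched in base R with plain vector indexing, assuming each NA is isolated (its immediate neighbors are not NA). The helper names above, below and to_fill are illustrative:

```r
df <- data.frame(
  Besti  = c("Friend", "myfriend", "yourbest", "allbest"),
  Friend = c("Friend", NA, "Friend", "Toofriend")
)

x <- df$Friend
n <- length(x)
above <- c(NA, x[-n])  # value one row up (same idea as dplyr::lag)
below <- c(x[-1], NA)  # value one row down (same idea as dplyr::lead)

# Fill only where x is NA and both neighbors are present and equal
to_fill <- is.na(x) & !is.na(above) & !is.na(below) & above == below
x[to_fill] <- above[to_fill]
df$Friend <- x
```

The explicit !is.na() checks keep to_fill free of NA, so the logical subsetting is safe.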

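Imagine I have 100 NAs, or many more, in no particular order: the value before an NA might itself be NA, or the value after it might be NA, while the two values after that are Friend or some other string.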

If I simply wanted to replace every NA with Friend, I could do this:

library(tidyr)  # provides replace_na() and re-exports the %>% pipe
df$Friend <- df$Friend %>% replace_na('Friend')
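replace_na() fills every NA unconditionally. For the scenario with many NAs whose neighbors may themselves be NA, one possible interpretation (an assumption; the question does not pin it down) is: fill a run of NAs only when the nearest non-missing value above equals the nearest non-missing value below. A sketch using tidyr::fill() to carry those values into helper columns; prev_val, next_val and the test data are hypothetical:

```r
library(dplyr)
library(tidyr)

# Made-up data with runs of NAs
df <- data.frame(
  Besti  = c("a", "b", "c", "d", "e", "f", "g"),
  Friend = c("Friend", NA, NA, "Friend", NA, "Toofriend", NA)
)

out <- df |>
  mutate(prev_val = Friend, next_val = Friend) |>
  fill(prev_val, .direction = "down") |>  # nearest non-NA value above
  fill(next_val, .direction = "up") |>    # nearest non-NA value below
  mutate(
    # Use the value only when both sides agree; an NA comparison stays NA
    Friend = coalesce(
      Friend,
      if_else(prev_val == next_val, prev_val, NA_character_)
    )
  ) |>
  select(-prev_val, -next_val)
```

Here the two-NA run between Friend and Friend is filled, the NA between Friend and Toofriend is left alone, and the trailing NA (nothing below it) stays NA.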

Answer 1

library(dplyr)
df |>
  mutate(
    upper = lag(Friend),
    lower = lead(Friend),
    replacement = ifelse(upper == lower, upper, NA),
    Friend = coalesce(Friend, replacement)
  )
#>      Besti    Friend Val1 Val2 Val3 Val4 Val5  upper     lower replacement
#> 1   Friend    Friend    0    0    0    0    0   <NA>      <NA>        <NA>
#> 2 myfriend    Friend    0    0    1    0    0 Friend    Friend      Friend
#> 3 yourbest    Friend    0    0    0    0    0   <NA> Toofriend        <NA>
#> 4  allbest Toofriend    0    0    0    0    0 Friend      <NA>        <NA>

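dplyr::lag() and dplyr::lead() shift the Friend vector down and up by one position. We can then test whether the two shifted values are equal and, if they are, use that value as the replacement. dplyr::coalesce() then replaces each NA in Friend with the replacement value at the same position. This can be simplified to: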
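The same steps on the bare Friend vector make the shifting concrete; the intermediate values are written as comments:

```r
library(dplyr)

x <- c("Friend", NA, "Friend", "Toofriend")

up   <- lag(x)   # NA, "Friend", NA, "Friend"
down <- lead(x)  # NA, "Friend", "Toofriend", NA

# Comparisons involving NA give NA, so only row 2 gets a replacement
replacement <- ifelse(up == down, up, NA)  # NA, "Friend", NA, NA

filled <- coalesce(x, replacement)  # "Friend", "Friend", "Friend", "Toofriend"
```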

df |>
  mutate(
    replacement = ifelse(lag(Friend) == lead(Friend), lag(Friend), NA),
    Friend = coalesce(Friend, replacement)
  )
#>      Besti    Friend Val1 Val2 Val3 Val4 Val5 replacement
#> 1   Friend    Friend    0    0    0    0    0        <NA>
#> 2 myfriend    Friend    0    0    1    0    0      Friend
#> 3 yourbest    Friend    0    0    0    0    0        <NA>
#> 4  allbest Toofriend    0    0    0    0    0        <NA>
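If several character columns need the same treatment, the lag()/lead() rule can be wrapped in a function and applied with across(). A sketch; fill_if_neighbors_match and the extra column Best are hypothetical, and NA_character_ assumes the columns are character:

```r
library(dplyr)

# Hypothetical helper: fill an NA only when the values directly above and
# below it are equal (the same rule as the answer above)
fill_if_neighbors_match <- function(x) {
  replacement <- if_else(lag(x) == lead(x), lag(x), NA_character_)
  coalesce(x, replacement)
}

# Made-up data with an NA in a second column
df <- data.frame(
  Friend = c("Friend", NA, "Friend", "Toofriend"),
  Best   = c("mate", "mate", NA, "mate")
)

out <- df |>
  mutate(across(c(Friend, Best), fill_if_neighbors_match))
```

Listing the columns explicitly in across() keeps the rule away from columns where neighbor-based filling would not make sense.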

Answer 2

Here is another approach. I add the Friend values from the row after and the row before each observation to the data frame:

library(dplyr)

df$after <- lead(df$Friend)
df$before <- lag(df$Friend)

df

Output:

     Besti    Friend Val1 Val2 Val3 Val4 Val5     after before
1   Friend    Friend    0    0    0    0    0      <NA>   <NA>
2 myfriend      <NA>    0    0    1    0    0    Friend Friend
3 yourbest    Friend    0    0    0    0    0 Toofriend   <NA>
4  allbest Toofriend    0    0    0    0    0      <NA> Friend

Now we can derive a new version of the Friend variable with ifelse():

df$Friend <- ifelse(
  is.na(df$Friend) & 
  df$after == "Friend" & 
  df$before == "Friend", "Friend", df$Friend
)

df[, -c(8, 9)]  # drop the helper columns 'after' and 'before'

Output:

     Besti    Friend Val1 Val2 Val3 Val4 Val5
1   Friend    Friend    0    0    0    0    0
2 myfriend    Friend    0    0    1    0    0
3 yourbest    Friend    0    0    0    0    0
4  allbest Toofriend    0    0    0    0    0
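The ifelse() above fills an NA only when both neighbors are literally "Friend", and df[, -c(8,9)] depends on column positions. A variant that compares the neighbors with each other (so any matching string is used) and removes the helpers by name, sketched on the same data without the Val columns; same_neighbors is an illustrative name:

```r
library(dplyr)

df <- data.frame(
  Besti  = c("Friend", "myfriend", "yourbest", "allbest"),
  Friend = c("Friend", NA, "Friend", "Toofriend")
)

df$after  <- lead(df$Friend)
df$before <- lag(df$Friend)

# Neighbors must both be present and equal to each other
same_neighbors <- !is.na(df$after) & !is.na(df$before) & df$after == df$before
df$Friend <- ifelse(is.na(df$Friend) & same_neighbors, df$before, df$Friend)

# Drop the helper columns by name rather than by position
df$after  <- NULL
df$before <- NULL
```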


Tags: r, missing-data, dplyr