How to replace a specific NA in a column with a certain string

This is probably simple, but I can't figure it out.

df<-structure(list(Besti = c("Friend", "myfriend", "yourbest", "allbest"
), Friend = c("Friend", NA, "Friend", "Toofriend"), Val1 = c(0L, 
0L, 0L, 0L), Val2 = c(0L, 0L, 0L, 0L), Val3 = c(0L, 1L, 0L, 0L
), Val4 = c(0L, 0L, 0L, 0L), Val5 = c(0L, 0L, 0L, 0L)), class = "data.frame", row.names = c(NA, 
-4L))

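My data looks like this. I want to know how to replace an NA with a string when the strings directly above and below it are the same.

So I can see that there is one NA: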

sum(is.na(df$Friend))
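sum() only counts the NAs. As a small aside not in the original question, base R's which() also reports their row positions. The vector below is simply the Friend column copied out of df:

```r
# The Friend column from the question's df
friend <- c("Friend", NA, "Friend", "Toofriend")

n_missing   <- sum(is.na(friend))    # how many NAs: 1
missing_row <- which(is.na(friend))  # which rows: 2
```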

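If the value one row above is Friend and the value one row below is Friend, I want to replace the NA with Friend.

So the desired output looks like this: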

df_out<-structure(list(Besti = c("Friend", "myfriend", "yourbest", "allbest"
), Friend = c("Friend", "Friend", "Friend", "Toofriend"), Val1 = c(0L, 
0L, 0L, 0L), Val2 = c(0L, 0L, 0L, 0L), Val3 = c(0L, 1L, 0L, 0L
), Val4 = c(0L, 0L, 0L, 0L), Val5 = c(0L, 0L, 0L, 0L)), class = "data.frame", row.names = c(NA, 
-4L))

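Imagine I have 100 NAs, or many more, in no particular order: the value before an NA might itself be NA, or the value after it might be NA while the two after that are Friend or some other string.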

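If I simply wanted to replace every NA with Friend, I could do this (replace_na() comes from tidyr), but it ignores the neighbouring rows:

library(tidyr)
df$Friend <- replace_na(df$Friend, "Friend")

One answer uses dplyr to compare the rows above and below:

library(dplyr)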
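df |>
  mutate(
    upper = lag(Friend),   # Friend value one row above
    lower = lead(Friend),  # Friend value one row below
    # NA unless both neighbours exist and are equal
    replacement = ifelse(upper == lower, upper, NA),
    # fill only the NAs in Friend; all other values are kept
    Friend = coalesce(Friend, replacement)
  )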
#>      Besti    Friend Val1 Val2 Val3 Val4 Val5  upper     lower replacement
#> 1   Friend    Friend    0    0    0    0    0   <NA>      <NA>        <NA>
#> 2 myfriend    Friend    0    0    1    0    0 Friend    Friend      Friend
#> 3 yourbest    Friend    0    0    0    0    0   <NA> Toofriend        <NA>
#> 4  allbest Toofriend    0    0    0    0    0 Friend      <NA>        <NA>

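dplyr::lag() and dplyr::lead() shift the vector Friend down and up by one position. We can then test whether the two shifted values are equal; if they are, we use that value as the replacement value. dplyr::coalesce() then replaces each NA in Friend with the replacement value at the same position. This can be simplified to: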

df |>
  mutate(
    replacement = ifelse(lag(Friend) == lead(Friend), lag(Friend), NA),
    Friend = coalesce(Friend, replacement)
  )
#>      Besti    Friend Val1 Val2 Val3 Val4 Val5 replacement
#> 1   Friend    Friend    0    0    0    0    0        <NA>
#> 2 myfriend    Friend    0    0    1    0    0      Friend
#> 3 yourbest    Friend    0    0    0    0    0        <NA>
#> 4  allbest Toofriend    0    0    0    0    0        <NA>
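To see what each helper contributes, here is a minimal sketch that runs the same steps on the bare Friend vector outside mutate(). The names friend, up, down and replacement are illustrative only:

```r
library(dplyr)

friend <- c("Friend", NA, "Friend", "Toofriend")

up   <- lag(friend)   # value one row above: NA, "Friend", NA, "Friend"
down <- lead(friend)  # value one row below: NA, "Friend", "Toofriend", NA

# up == down is NA whenever either neighbour is NA, and ifelse() returns NA
# for an NA test, so only row 2 (both neighbours "Friend") gets a value
replacement <- ifelse(up == down, up, NA)

# take friend where it is present, otherwise the replacement
result <- coalesce(friend, replacement)
# result is c("Friend", "Friend", "Friend", "Toofriend")
```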

Here is another approach. In the data frame, I added the Friend values from the row after and the row before each observation:

library(dplyr)

df$after <- lead(df$Friend)
df$before <- lag(df$Friend)

df

Output:

     Besti    Friend Val1 Val2 Val3 Val4 Val5     after before
1   Friend    Friend    0    0    0    0    0      <NA>   <NA>
2 myfriend      <NA>    0    0    1    0    0    Friend Friend
3 yourbest    Friend    0    0    0    0    0 Toofriend   <NA>
4  allbest Toofriend    0    0    0    0    0      <NA> Friend

Now we can use ifelse() to derive a new version of the Friend variable:

df$Friend <- ifelse(
  is.na(df$Friend) & 
  df$after == "Friend" & 
  df$before == "Friend", "Friend", df$Friend
)

df[, -c(8,9)]

Output:

     Besti    Friend Val1 Val2 Val3 Val4 Val5
1   Friend    Friend    0    0    0    0    0
2 myfriend    Friend    0    0    1    0    0
3 yourbest    Friend    0    0    0    0    0
4  allbest Toofriend    0    0    0    0    0
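The question also mentions runs of several NAs in no particular order. Both answers compare only the immediate neighbours, which are themselves NA inside such a run, so those rows stay NA. A minimal base-R sketch, assuming the intended rule is "fill a run of NAs only when the nearest non-NA values above and below it are equal" (any string, not just Friend). fill_between_equal() is a hypothetical helper, not a package function:

```r
# Hypothetical helper (not from any package): fill each NA when the nearest
# non-NA value above it equals the nearest non-NA value below it.
fill_between_equal <- function(x) {
  n   <- length(x)
  pos <- seq_len(n)
  ok  <- !is.na(x)

  # position of the nearest non-NA value at or above each row (0 = none)
  prev_pos <- cummax(ifelse(ok, pos, 0L))
  # position of the nearest non-NA value at or below each row (n + 1 = none)
  next_pos <- rev(cummin(rev(ifelse(ok, pos, n + 1L))))

  # look those values up; a missing neighbour becomes NA
  before <- x[ifelse(prev_pos == 0L, NA, prev_pos)]
  after  <- x[ifelse(next_pos > n, NA, next_pos)]

  # fill only NAs whose two neighbours both exist and agree
  fill <- !ok & !is.na(before) & !is.na(after) & before == after
  x[fill] <- before[fill]
  x
}

# The question's column: the single NA sits between two "Friend"s
friend_filled <- fill_between_equal(c("Friend", NA, "Friend", "Toofriend"))

# A run of NAs between two "A"s is filled; NAs between "A" and "B",
# or with no value below them, are left alone
runs_filled <- fill_between_equal(c("A", NA, NA, "A", NA, "B", NA))
```

On the question's data this would be applied as df$Friend <- fill_between_equal(df$Friend). The cummax()/cummin() trick locates the nearest non-NA position in each direction without a loop.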