How to replace specific NA in a column with certain string character
This is probably simple, but I can't figure it out.
df<-structure(list(Besti = c("Friend", "myfriend", "yourbest", "allbest"
), Friend = c("Friend", NA, "Friend", "Toofriend"), Val1 = c(0L,
0L, 0L, 0L), Val2 = c(0L, 0L, 0L, 0L), Val3 = c(0L, 1L, 0L, 0L
), Val4 = c(0L, 0L, 0L, 0L), Val5 = c(0L, 0L, 0L, 0L)), class = "data.frame", row.names = c(NA,
-4L))
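My data looks like this. I want to know how to replace an NA with a string when the value in the row above and the value in the row below are the same string.
I can see that there is one NA: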
sum(is.na(df$Friend))
If the row above is Friend and the row below is Friend, I want to replace the NA with Friend.
So the output should look like this:
df_out<-structure(list(Besti = c("Friend", "myfriend", "yourbest", "allbest"
), Friend = c("Friend", "Friend", "Friend", "Toofriend"), Val1 = c(0L,
0L, 0L, 0L), Val2 = c(0L, 0L, 0L, 0L), Val3 = c(0L, 1L, 0L, 0L
), Val4 = c(0L, 0L, 0L, 0L), Val5 = c(0L, 0L, 0L, 0L)), class = "data.frame", row.names = c(NA,
-4L))
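Imagine I have 100 NAs or more, in no particular order: the previous value might be NA, or the next one might be NA while the one two rows down is Friend or some other string.
If I wanted to replace every NA with Friend, I could do this, but that is not what I need:
df$Friend <- tidyr::replace_na(df$Friend, "Friend")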
library(dplyr)
df |>
  mutate(
    upper = lag(Friend),   # value from the row above
    lower = lead(Friend),  # value from the row below
    replacement = ifelse(upper == lower, upper, NA),
    Friend = coalesce(Friend, replacement)
  )
#>      Besti    Friend Val1 Val2 Val3 Val4 Val5  upper     lower replacement
#> 1   Friend    Friend    0    0    0    0    0   <NA>      <NA>        <NA>
#> 2 myfriend    Friend    0    0    1    0    0 Friend    Friend      Friend
#> 3 yourbest    Friend    0    0    0    0    0   <NA> Toofriend        <NA>
#> 4  allbest Toofriend    0    0    0    0    0 Friend      <NA>        <NA>
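dplyr::lag() and dplyr::lead() shift the vector Friend down and up by one position. We can then test whether the two shifted vectors hold the same value at a given row and, if they do, use that value as the replacement. dplyr::coalesce() replaces each NA in Friend with the replacement value at the same position.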
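To make the shifting concrete, here is a small sketch on a bare vector rather than the data frame. The vector x mirrors the Friend column from the question, and the second argument to coalesce() is made-up filler used only to show which positions get replaced.

```r
library(dplyr)

# Same values as the Friend column in the question
x <- c("Friend", NA, "Friend", "Toofriend")

lag(x)   # value from the row above
#> [1] NA       "Friend" NA       "Friend"

lead(x)  # value from the row below
#> [1] NA          "Friend"    "Toofriend" NA

# coalesce() takes, position by position, the first non-missing value.
# The letters are illustrative filler, not part of the original data.
coalesce(x, c("a", "b", "c", "d"))
#> [1] "Friend"    "b"         "Friend"    "Toofriend"
```

Only position 2 is NA in x, so only that position picks up a value from the second vector.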
This can be simplified to:
df |>
  mutate(
    # compare the row above with the row below
    replacement = ifelse(lag(Friend) == lead(Friend), lag(Friend), NA),
    Friend = coalesce(Friend, replacement)
  )
#>      Besti    Friend Val1 Val2 Val3 Val4 Val5 replacement
#> 1   Friend    Friend    0    0    0    0    0        <NA>
#> 2 myfriend    Friend    0    0    1    0    0      Friend
#> 3 yourbest    Friend    0    0    0    0    0        <NA>
#> 4  allbest Toofriend    0    0    0    0    0        <NA>
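If the helper column is not wanted in the result, the comparison can probably be nested directly inside coalesce(). This is only a sketch of that variation, using a cut-down copy of the question's data (the Val columns are omitted here for brevity) and dplyr::if_else() so the replacement stays a character vector.

```r
library(dplyr)

# Reduced copy of the question's data: only the two character columns
df <- data.frame(
  Besti  = c("Friend", "myfriend", "yourbest", "allbest"),
  Friend = c("Friend", NA, "Friend", "Toofriend")
)

out <- df |>
  mutate(
    Friend = coalesce(
      Friend,
      # NA comparisons yield NA, so rows without two equal neighbours stay NA
      if_else(lag(Friend) == lead(Friend), lag(Friend), NA_character_)
    )
  )

out$Friend
#> [1] "Friend"    "Friend"    "Friend"    "Toofriend"
```

No extra columns are created, so nothing has to be dropped afterwards.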
Here is another approach. In the data frame, I add the Friend value that comes before and after each observation:
library(dplyr)
df$after <- lead(df$Friend)
df$before <- lag(df$Friend)
df
Output:
     Besti    Friend Val1 Val2 Val3 Val4 Val5     after before
1   Friend    Friend    0    0    0    0    0      <NA>   <NA>
2 myfriend      <NA>    0    0    1    0    0    Friend Friend
3 yourbest    Friend    0    0    0    0    0 Toofriend   <NA>
4  allbest Toofriend    0    0    0    0    0      <NA> Friend
Now we can use ifelse() to derive a new version of the Friend variable:
df$Friend <- ifelse(
  is.na(df$Friend) &
    df$after == "Friend" &
    df$before == "Friend", "Friend", df$Friend
)
# drop the helper columns after and before (columns 8 and 9)
df[, -c(8,9)]
Output:
     Besti    Friend Val1 Val2 Val3 Val4 Val5
1   Friend    Friend    0    0    0    0    0
2 myfriend    Friend    0    0    1    0    0
3 yourbest    Friend    0    0    0    0    0
4  allbest Toofriend    0    0    0    0    0
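The question also mentions data where a neighbouring value may itself be NA, which the one-row lag/lead comparison does not cover. Below is a minimal base R sketch for that case, assuming the intended rule is "fill an NA when the nearest non-NA value above and the nearest non-NA value below are identical". The helper name fill_between_equal and the test vector are hypothetical and not part of the original answers.

```r
# Hypothetical helper: fill runs of NA that sit between two equal values.
fill_between_equal <- function(x) {
  n <- length(x)
  idx <- seq_along(x)
  known <- !is.na(x)

  # Index of the nearest non-NA at or above each position (0 if none)
  above <- cummax(ifelse(known, idx, 0L))
  # Index of the nearest non-NA at or below each position (n + 1 if none)
  below <- rev(cummin(rev(ifelse(known, idx, n + 1L))))

  # Pad with NA on both ends so "none found" maps to NA
  padded <- c(NA, x, NA)
  up <- padded[above + 1L]
  down <- padded[below + 1L]

  # Fill only where both neighbours exist and agree
  fill <- is.na(x) & !is.na(up) & !is.na(down) & up == down
  x[fill] <- up[fill]
  x
}

# Made-up vector with a run of two NAs between equal values
x <- c("Friend", NA, NA, "Friend", NA, "Toofriend", NA)
fill_between_equal(x)
#> [1] "Friend"    "Friend"    "Friend"    "Friend"    NA          "Toofriend" NA
```

On the question's data, df$Friend <- fill_between_equal(df$Friend) should give the same result as the approaches above, since the single NA there sits between two Friend values. An NA between two different values, or at the end of the vector, is left untouched.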