Coalesce two string columns with alternating missing values to one

Question

I have a data frame with two columns, "a" and "b", with alternating missing values (NA):

a      b
dog    <NA>
mouse  <NA>
<NA>   cat
bird   <NA>

I would like to "merge" / combine them into a new column c that looks like this, i.e. the non-NA element in each row is selected:

c
dog
mouse
cat
bird

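I tried merge and join, but neither did what I wanted. Maybe that is because I do not have an id to merge on? For integers I would work around this by simply adding the two columns, but how do I do it in my case?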

Answer 1

I wrote a coalesce() function for this type of task; it works much like the SQL COALESCE function. You would use it like this:

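# Read the example data; the literal NA entries are parsed as missing values
dd <- read.table(text="a      b
dog    NA
mouse  NA
NA   cat
bird   NA", header=TRUE)

# coalesce() is the custom helper mentioned above, not a base R function
dd$c <- with(dd, coalesce(a, b))
dd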
#       a    b     c
# 1   dog <NA>   dog
# 2 mouse <NA> mouse
# 3  <NA>  cat   cat
# 4  bird <NA>  bird
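
The answer does not include the definition of its coalesce() helper. As a rough illustration only, here is a minimal base R sketch of what such a function could look like. It is a hypothetical stand-in, not the answerer's actual code, and it assumes all arguments are vectors of the same length and type.

```r
# Hypothetical stand-in for the coalesce() helper used above (not the
# answerer's code). For each position it returns the first non-NA value
# found among the arguments, scanning them from left to right.
coalesce <- function(...) {
  args <- list(...)
  out <- args[[1]]
  for (v in args[-1]) {
    gap <- is.na(out)   # positions that are still missing
    out[gap] <- v[gap]  # fill them from the next argument
  }
  out
}

dd <- data.frame(a = c("dog", "mouse", NA, "bird"),
                 b = c(NA, NA, "cat", NA),
                 stringsAsFactors = FALSE)
dd$c <- with(dd, coalesce(a, b))
dd$c
```

Because the loop walks over every argument, the same sketch should also accept more than two vectors.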

Answer 2

Here is my attempt (modified by @MrFlick):

df$c <- apply(df, 1, function(x) na.omit(x)[1])
df
#       a    b     c
# 1   dog <NA>   dog
# 2 mouse <NA> mouse
# 3  <NA>  cat   cat
# 4  bird <NA>  bird

Answer 3

You can use a simple apply (this assumes exactly one non-NA value in each row):

df$c <- apply(df, 1, function(x) x[!is.na(x)])

> df
      a    b     c
1   dog <NA>   dog
2 mouse <NA> mouse
3  <NA>  cat   cat
4  bird <NA>  bird

Answer 4

You could try pmax:

df$c <- pmax(df$a, df$b)
df
#       a    b     c
# 1   dog <NA>   dog
# 2 mouse <NA> mouse
# 3  <NA>  cat   cat
# 4  bird <NA>  bird

...or ifelse:

df$c <- ifelse(is.na(df$a), df$b, df$a)

For a more general solution when there are more than two columns, you can find several ways to implement coalesce in R here.
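
As one possible way to extend the ifelse idea to more than two columns, the pairwise step can be folded over all columns with Reduce. This is only a sketch: the data frame df3 and its column d are made up for illustration, and it assumes all columns are character vectors.

```r
# Hypothetical three-column example; the last row is entirely missing
df3 <- data.frame(a = c("dog", NA, NA, NA),
                  b = c(NA, "cat", NA, NA),
                  d = c(NA, NA, "bird", NA),
                  stringsAsFactors = FALSE)

# Fold the two-column ifelse step over the columns from left to right:
# keep x where it is present, otherwise fall back to the next column y
df3$c <- Reduce(function(x, y) ifelse(is.na(x), y, x), df3)
df3$c
```

A row with no values in any column simply stays NA in the result.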

Answer 5

Another option is to use which with arr.ind=TRUE:

indx <- which(!is.na(df), arr.ind=TRUE)
df$c <-  df[indx][order(indx[,1])]
df
#       a    b     c
# 1   dog <NA>   dog
# 2 mouse <NA> mouse
# 3  <NA>  cat   cat
# 4  bird <NA>  bird

Or

df$c <- df[cbind(1:nrow(df),max.col(!is.na(df)))]
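
By default max.col breaks ties at random, which does not matter when each row has exactly one non-NA value. If some rows could contain several non-NA values, a variant along the following lines should pick the leftmost one instead. The data frame df2 is a made-up example, not data from the question.

```r
# Hypothetical data: row 2 has non-NA values in both columns
df2 <- data.frame(a = c("dog", "mouse", NA),
                  b = c(NA, "rat", "cat"),
                  stringsAsFactors = FALSE)

# Column index of the first non-NA entry in each row;
# ties.method = "first" makes the choice deterministic
first_col <- max.col(!is.na(df2), ties.method = "first")

# Index with a two-column (row, column) matrix to pull one value per row
res <- df2[cbind(seq_len(nrow(df2)), first_col)]
res
```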

Answer 6

dplyr has exactly the function you are looking for, coalesce():

library(dplyr)

a<-c("dog","mouse",NA,"bird")
b<-c(NA,NA,"cat",NA)

coalesce(a,b)

[1] "dog"   "mouse" "cat"   "bird"

Answer 7

Using if/else logic:

a<-c("dog","mouse",NA,"bird")
b<-c(NA,NA,"cat",NA)

test.df <-data.frame(a,b, stringsAsFactors = FALSE)
test.df$c <- ifelse(is.na(test.df$a), test.df$b, test.df$a)

test.df

      a    b     c
1   dog <NA>   dog
2 mouse <NA> mouse
3  <NA>  cat   cat
4  bird <NA>  bird

r

missing-data

na