Dealing with Spaces and NAs when Uniting Multiple Columns with tidyr

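Using the simple data frame below, I want to create a new column that contains all of the days for each person, separated by semicolons.

For example, for Doug it should look like: Monday; Wednesday; Friday

I would like to use tidyr's unite function for this, but when I use it I get Monday;;Wednesday;;Friday, because the missing values (which may be NA or blank spaces) are pasted in as well. Sometimes there are also semicolons at the beginning and end. So I am hoping there is a way to keep using unite, enhanced with a regular expression, so that each day of the week is separated by a single semicolon, with no semicolon at the start or end.

I would also like to stick with tidyr, dplyr, stringr, etc.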

library(tidyr)  # provides unite() and re-exports the %>% pipe

Names <- c("Doug", "Ken", "Erin", "Yuki", "John")
Monday <- c("Monday", " ", " ", "Monday", "Monday")
Tuesday <- c(" ", "Tuesday", "Tuesday", " ", "Tuesday")
Wednesday <- c(" ", "Wednesday", "Wednesday", "Wednesday", " ")
Thursday <- c(" ", " ", " ", " ", "Thursday")
Friday <- c(" ", " ", " ", " ", "Friday")

Days <- data.frame(Monday, Tuesday, Wednesday, Thursday, Friday)

Days <- Days %>% unite(BestDays, Monday, Tuesday, Wednesday, Thursday, Friday, sep = "; ", remove = FALSE)

Judging from getAnywhere("unite_.data.frame"), unite calls do.call("paste", c(data[from], list(sep = sep))) under the hood, and as far as I know paste offers no way to omit NAs unless you implement that manually.
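To see why the call above cannot skip missing values, here is a minimal base R sketch. The data frame df and its columns a and b are made up for illustration; the do.call line mirrors the call quoted above.

```r
# paste() converts NA to the literal string "NA" instead of skipping it
paste("Monday", NA, "Friday", sep = "; ")

# Hypothetical two-column data frame, united the same way as the quoted call
df <- data.frame(a = c("Monday", NA), b = c(NA, "Tuesday"),
                 stringsAsFactors = FALSE)
united <- do.call("paste", c(df[c("a", "b")], list(sep = "; ")))
united
# [1] "Monday; NA"  "NA; Tuesday"
```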

Nonetheless, you can take a regular-expression approach and clean up the resulting column with gsub from base R, as follows:

gsub("^\\s;\\s|;\\s{2}", "", Days$BestDays)
# [1] "Monday"                            "Tuesday; Wednesday"               
# [3] "Tuesday; Wednesday"                "Monday; Wednesday"                
# [5] "Monday; Tuesday; Thursday; Friday"

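This removes matches of either the ^\s;\s pattern or the ;\s{2} pattern. The former handles a string that starts with a blank entry: it removes the leading space together with the semicolon and the space that follow it. The latter handles blank entries in the middle and at the end of the string, where a semicolon is followed by two whitespace characters (the space from the separator plus the blank entry itself).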
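Since the question asks to stay within the tidyverse, the same cleanup could also be sketched with stringr by splitting on the semicolon, trimming each piece, and dropping the empty ones. This is only an illustrative alternative, not the approach from the answer above; clean_days is a hypothetical helper name, and the raw vector is sample input in the shape unite produces with sep = "; ". It also copes with rows that begin with more than one blank entry.

```r
library(stringr)

# Hypothetical helper: split on ";", trim whitespace, drop empty pieces,
# then re-join what is left with "; "
clean_days <- function(x) {
  parts <- str_split(x, ";")
  vapply(parts, function(p) {
    p <- str_trim(p)
    paste(p[!is.na(p) & p != ""], collapse = "; ")
  }, character(1))
}

raw <- c("Monday;  ;  ;  ;  ",
         " ; Tuesday; Wednesday;  ;  ",
         " ;  ; Wednesday;  ; Friday")
cleaned <- clean_days(raw)
cleaned
# [1] "Monday"             "Tuesday; Wednesday" "Wednesday; Friday"
```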

You could try the following, which stores the missing days as NA rather than as blank strings:

library(dplyr)  # for the %>% pipe

Names <- c("Doug", "Ken", "Erin", "Yuki", "John")
Monday <- c("Monday", NA, NA, "Monday", "Monday")
Tuesday <- c(NA, "Tuesday", "Tuesday", NA, "Tuesday")
Wednesday <- c(NA, "Wednesday", "Wednesday", "Wednesday", NA)
Thursday <- c(NA, NA, NA, NA, "Thursday")
Friday <- c(NA, NA, NA, NA, "Friday")

Days <- data.frame(Monday, Tuesday, Wednesday, Thursday, Friday)

# Drop the NAs from one row, then join the remaining days with "; "
concat_str <- function(str) str %>% na.omit %>% paste(collapse = "; ")
# Apply the helper row by row (MARGIN = 1) over the weekday columns
Days$BestDaysConcat <- apply(Days[, c("Monday", "Tuesday", "Wednesday", "Thursday", "Friday")], 1, concat_str)
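If a recent tidyr is available, unite itself may be enough. The sketch below assumes tidyr 1.0.0 or later (where unite accepts na.rm) and dplyr 1.0.0 or later (for across); it first turns the blank cells from the question into NA and then lets unite drop them. Note that it also leaves the original weekday columns holding NA instead of " ".

```r
library(dplyr)
library(tidyr)

Days <- data.frame(
  Monday    = c("Monday", " ", " ", "Monday", "Monday"),
  Tuesday   = c(" ", "Tuesday", "Tuesday", " ", "Tuesday"),
  Wednesday = c(" ", "Wednesday", "Wednesday", "Wednesday", " "),
  Thursday  = c(" ", " ", " ", " ", "Thursday"),
  Friday    = c(" ", " ", " ", " ", "Friday"),
  stringsAsFactors = FALSE
)

Days <- Days %>%
  # Trim whitespace and convert the now-empty strings to NA
  mutate(across(Monday:Friday, ~ na_if(trimws(.x), ""))) %>%
  # na.rm = TRUE (assumed: tidyr >= 1.0.0) skips NA values when pasting
  unite(BestDays, Monday:Friday, sep = "; ", remove = FALSE, na.rm = TRUE)

Days$BestDays
# [1] "Monday"                            "Tuesday; Wednesday"
# [3] "Tuesday; Wednesday"                "Monday; Wednesday"
# [5] "Monday; Tuesday; Thursday; Friday"
```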