Dealing with Spaces and NAs when Uniting Multiple Columns with Tidyr
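I want to use the simple data frame below to create a new column that lists all of each person's days, separated by semicolons.
For example, for Doug it should look like: Monday; Wednesday; Friday
I would like to use Tidyr's unite function for this, but when I do I get Monday;;Wednesday;;Friday because of the NAs, which may also be blank spaces. Sometimes there are semicolons at the start and end as well. So I am hoping there is a way to keep using unite, enhanced with a regular expression, so that each day of the week is separated by a single semicolon and there are no semicolons at the start or end.
I would also like to stick with Tidyr, Dplyr, Stringr, etc.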
library(dplyr)  # provides %>%
library(tidyr)  # provides unite()

Names <- c("Doug", "Ken", "Erin", "Yuki", "John")
# Blank entries are stored as a single space
Monday <- c("Monday", " ", " ", "Monday", "Monday")
Tuesday <- c(" ", "Tuesday", "Tuesday", " ", "Tuesday")
Wednesday <- c(" ", "Wednesday", "Wednesday", "Wednesday", " ")
Thursday <- c(" ", " ", " ", " ", "Thursday")
Friday <- c(" ", " ", " ", " ", "Friday")
Days <- data.frame(Monday, Tuesday, Wednesday, Thursday, Friday)
Days <- Days %>% unite(BestDays, Monday, Tuesday, Wednesday, Thursday, Friday, sep = "; ", remove = FALSE)
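Judging from getAnywhere("unite_.data.frame"), unite calls do.call("paste", c(data[from], list(sep = sep))) under the hood, and as far as I know paste offers no way to omit NA unless you implement it manually. Still, you can take a regular-expression approach and clean up the resulting column with gsub from base R: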
gsub("^\\s;\\s|;\\s{2}", "", Days$BestDays)
# [1] "Monday" "Tuesday; Wednesday"
# [3] "Tuesday; Wednesday" "Monday; Wednesday"
# [5] "Monday; Tuesday; Thursday; Friday"
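This removes either the ^\s;\s pattern or the ;\s{2} pattern. The former handles a string that starts with a blank entry: it removes that space together with the ;\s that follows it. The latter handles a blank entry in the middle or at the end of the string, where the separator ;\s is followed by one more space.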
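Note that the pattern above only strips one leading blank entry, so a string such as " ;  ; Wednesday" would come out as " ; Wednesday". As a hedged alternative, here is a minimal base R sketch that first collapses every run of separators into a single "; " and then trims whatever separators remain at either end. The helper name clean_united is hypothetical and not part of any package, and the sketch assumes blank entries consist only of whitespace.

```r
# Hypothetical helper (not from tidyr): tidy up a string produced by unite()
clean_united <- function(x) {
  # Collapse runs like ";  ;  ; " (separators with only whitespace between) into "; "
  x <- gsub("(\\s*;\\s*)+", "; ", x)
  # Strip any separators or whitespace left at the start or end
  gsub("^[;[:space:]]+|[;[:space:]]+$", "", x)
}

raw <- c("Monday;  ;  ;  ;  ",
         " ; Tuesday; Wednesday;  ;  ",
         " ;  ; Wednesday;  ;  ",
         "Monday; Tuesday;  ; Thursday; Friday")
cleaned <- clean_united(raw)
print(cleaned)
```

Applied to the question's data this would be clean_united(Days$BestDays). The two-step design keeps each regular expression simple: the first call only normalizes separators, the second only trims the ends.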
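You could try:
library(dplyr)  # provides %>%

Names <- c("Doug", "Ken", "Erin", "Yuki", "John")
# Missing days are stored as NA instead of blank spaces
Monday <- c("Monday", NA, NA, "Monday", "Monday")
Tuesday <- c(NA, "Tuesday", "Tuesday", NA, "Tuesday")
Wednesday <- c(NA, "Wednesday", "Wednesday", "Wednesday", NA)
Thursday <- c(NA, NA, NA, NA, "Thursday")
Friday <- c(NA, NA, NA, NA, "Friday")
Days <- data.frame(Monday, Tuesday, Wednesday, Thursday, Friday)
# Drop the NAs in a vector, then join what is left with "; "
concat_str <- function(str) str %>% na.omit %>% paste(collapse = "; ")
# Apply row by row (MARGIN = 1) over the day columns
Days$BestDaysConcat <- apply(Days[, c("Monday", "Tuesday", "Wednesday", "Thursday", "Friday")], 1, concat_str)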
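For readers who want to stay with unite, here is a minimal sketch that assumes tidyr 1.0.0 or later (where unite accepts na.rm = TRUE), dplyr 1.0.0 or later (for across) and stringr. It turns whitespace-only entries into NA first, so blanks and NAs are handled the same way. The mixed blank/NA data frame below is made up for illustration.

```r
library(dplyr)
library(tidyr)
library(stringr)

# Illustrative data mixing single-space blanks and NA (an assumption, not the asker's exact data)
Days <- data.frame(
  Names     = c("Doug", "Ken", "Erin", "Yuki", "John"),
  Monday    = c("Monday", " ", NA, "Monday", "Monday"),
  Tuesday   = c(" ", "Tuesday", "Tuesday", NA, "Tuesday"),
  Wednesday = c(NA, "Wednesday", "Wednesday", "Wednesday", " "),
  Thursday  = c(" ", NA, " ", " ", "Thursday"),
  Friday    = c(NA, " ", " ", NA, "Friday"),
  stringsAsFactors = FALSE
)

Days <- Days %>%
  # Trim whitespace, then convert the resulting empty strings to NA
  mutate(across(Monday:Friday, ~ na_if(str_trim(.x), ""))) %>%
  # na.rm = TRUE drops NA values before the remaining ones are pasted together
  unite(BestDays, Monday:Friday, sep = "; ", remove = FALSE, na.rm = TRUE)

print(Days$BestDays)
```

One side effect to be aware of: because remove = FALSE keeps the day columns, their blank entries now appear as NA in the result.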