How can I paste text from data frame rows, keeping only unique values, in R?

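I have a data frame in which each row represents a person and the columns hold variants of their name. Some values are NA or duplicated. The data looks like the data frame below.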

Name Name1 Name2 Name3 Name4
Tom Tom Thomas Tom Tommy
Jim NA James NA Jimmy
Dave Dave David NA Davey
Tim NA Timothy Tim Timmy
Rob Rob NA Rob Robby
Sam NA NA Sam NA

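I want to combine the unique names in each row and put them in a new column in which each name appears only once. I know I can use the paste function to generate a column that shows all of the text values, like this: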

Name Name1 Name2 Name3 Name4 unique
Tom Tom Thomas NA Tommy Tom, Tom, Thomas, NA, Tommy

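But I don't want the same text to appear more than once in the unique column. How can I combine the row data so that each name appears only once in the new $unique cell?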

Name Name1 Name2 Name3 Name4 unique
Tom Tom Thomas Tom Tommy Tom, Thomas, Tommy
Jim NA James NA Jimmy Jim, James, Jimmy
Dave Dave David NA Davey Dave, David, Davey
Tim NA Timothy Tim Timmy Tim, Timothy, Timmy
Rob Rob NA Rob Robby Rob, Robby
Sam NA NA Sam NA Sam

Apply unique row-wise, drop the NA values with na.omit, and collapse the result with toString.

transform(dat, unique=apply(dat, 1, \(x) toString(na.omit(unique(x)))))
#   Name Name1   Name2 Name3 Name4              unique
# 1  Tom   Tom  Thomas   Tom Tommy  Tom, Thomas, Tommy
# 2  Jim  <NA>   James  <NA> Jimmy   Jim, James, Jimmy
# 3 Dave  Dave   David  <NA> Davey  Dave, David, Davey
# 4  Tim  <NA> Timothy   Tim Timmy Tim, Timothy, Timmy
# 5  Rob   Rob    <NA>   Rob Robby          Rob, Robby
# 6  Sam  <NA>    <NA>   Sam  <NA>                 Sam
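
To see what each step contributes, here is the same chain applied by hand to a single row. This is only an illustrative sketch: the vector x is typed out from the "Rob" row of the example data, and the intermediate names u, v and res are made up for the walkthrough.

```r
# One row of the example data, written out as a character vector
x <- c("Rob", "Rob", NA, "Rob", "Robby")

u <- unique(x)       # removes repeated values, but still keeps a single NA
v <- na.omit(u)      # removes the remaining NA
res <- toString(v)   # joins what is left with ", "

print(res)
```

Inside apply(dat, 1, ...) each row arrives as a character vector like x, so the anonymous function performs these three steps once per row.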

If you like, you can also apply sort.
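
A minimal sketch of the sorted variant, assuming alphabetical order within each row is what is wanted. The two-row data frame dat2 is a hypothetical cut-down copy of the example data so that the snippet runs on its own. Because sort drops NA by default (na.last = NA), na.omit is not needed in this version; method = "radix" is an added choice that keeps the ordering independent of the locale.

```r
# Two rows of the example data, rebuilt here so the snippet is self-contained
dat2 <- data.frame(
  Name  = c("Tom", "Jim"),
  Name1 = c("Tom", NA),
  Name2 = c("Thomas", "James"),
  Name3 = c("Tom", NA),
  Name4 = c("Tommy", "Jimmy")
)

# For each row: de-duplicate, sort (which also discards NA), then collapse
sorted <- apply(dat2, 1, function(x) toString(sort(unique(x), method = "radix")))
dat2$unique <- unname(sorted)

print(dat2)
```

Note that sorting changes the order of the names, so the value from the Name column is no longer guaranteed to come first in the result.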


Data:

dat <- structure(list(Name = c("Tom", "Jim", "Dave", "Tim", "Rob", "Sam"
), Name1 = c("Tom", NA, "Dave", NA, "Rob", NA), Name2 = c("Thomas", 
"James", "David", "Timothy", NA, NA), Name3 = c("Tom", NA, NA, 
"Tim", "Rob", "Sam"), Name4 = c("Tommy", "Jimmy", "Davey", "Timmy", 
"Robby", NA)), class = "data.frame", row.names = c(NA, -6L))
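
Another option with tidyr and dplyr: pivot to long format, collapse the unique non-NA values per row, and join the result back.

library(tidyverse)  # provides %>%, rowid_to_column(), pivot_longer(), left_join()

dat %>%   # dat is the data frame defined under "Data:"
 rowid_to_column() %>%
 left_join(pivot_longer(., -rowid) %>%
 group_by(rowid) %>%
 summarise(value = toString(na.omit(unique(value))), .groups = 'drop'),
 by = "rowid")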
  rowid Name Name1   Name2 Name3 Name4               value
1     1  Tom   Tom  Thomas   Tom Tommy  Tom, Thomas, Tommy
2     2  Jim  <NA>   James  <NA> Jimmy   Jim, James, Jimmy
3     3 Dave  Dave   David  <NA> Davey  Dave, David, Davey
4     4  Tim  <NA> Timothy   Tim Timmy Tim, Timothy, Timmy
5     5  Rob   Rob    <NA>   Rob Robby          Rob, Robby
6     6  Sam  <NA>    <NA>   Sam  <NA>                 Sam

Using tidyverse

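library(dplyr)
dat %>%   # dat is the data frame defined under "Data:"
  rowwise() %>%   # evaluate the following mutate() one row at a time
  mutate(unique = toString(unique(na.omit(c_across(everything()))))) %>% 
  ungroup()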

-output

# A tibble: 6 × 6
  Name  Name1 Name2   Name3 Name4 unique             
  <chr> <chr> <chr>   <chr> <chr> <chr>              
1 Tom   Tom   Thomas  Tom   Tommy Tom, Thomas, Tommy 
2 Jim   <NA>  James   <NA>  Jimmy Jim, James, Jimmy  
3 Dave  Dave  David   <NA>  Davey Dave, David, Davey 
4 Tim   <NA>  Timothy Tim   Timmy Tim, Timothy, Timmy
5 Rob   Rob   <NA>    Rob   Robby Rob, Robby         
6 Sam   <NA>  <NA>    Sam   <NA>  Sam
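
The answers above call unique and na.omit in different orders. A small base R check, using a vector typed out from the "Tim" row (the names x, a and b are illustrative only), suggests the order does not matter for the final string:

```r
# The "Tim" row of the example data as a character vector
x <- c("Tim", NA, "Timothy", "Tim", "Timmy")

a <- toString(na.omit(unique(x)))  # de-duplicate first, then drop NA
b <- toString(unique(na.omit(x)))  # drop NA first, then de-duplicate

print(identical(a, b))
```

Either way, each name is kept at the position of its first occurrence, so both forms give the same text for this row.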