How can I paste text from dataframe rows, keeping only unique values, in R?
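I have a data frame in which each row represents a person and the columns hold variants of their name. Some values are NA or duplicated. The data looks like the data frame below.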
Name | Name1 | Name2 | Name3 | Name4 |
---|---|---|---|---|
Tom | Tom | Thomas | Tom | Tommy |
Jim | NA | James | NA | Jimmy |
Dave | Dave | David | NA | Davey |
Tim | NA | Timothy | Tim | Timmy |
Rob | Rob | NA | Rob | Robby |
Sam | NA | NA | Sam | NA |
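I want to combine the unique names in each row and put them in a new column where each name appears only once. I know I can use the paste function to produce a column in which all the text values are shown, like this: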
Name | Name1 | Name2 | Name3 | Name4 | unique |
---|---|---|---|---|---|
Tom | Tom | Thomas | NA | Tommy | Tom, Tom, Thomas, NA, Tommy |
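But I don't want the same text to appear more than once in the unique column. How can I combine the row data so that each name appears only once in the new $unique cell?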
Name | Name1 | Name2 | Name3 | Name4 | unique |
---|---|---|---|---|---|
Tom | Tom | Thomas | Tom | Tommy | Tom, Thomas, Tommy |
Jim | NA | James | NA | Jimmy | Jim, James, Jimmy |
Dave | Dave | David | NA | Davey | Dave, David, Davey |
Tim | NA | Timothy | Tim | Timmy | Tim, Timothy, Timmy |
Rob | Rob | NA | Rob | Robby | Rob, Robby |
Sam | NA | NA | Sam | NA | Sam |
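For reference, the `paste` starting point described in the question might look like the sketch below. The two-row `dat_small` and the column name `all_names` are made up for illustration; the point is only that a plain row-wise `paste` keeps duplicates and turns `NA` into the string "NA".

```r
# dat_small is a hypothetical, trimmed-down version of the data in the question
dat_small <- data.frame(
  Name  = c("Tom", "Jim"),
  Name1 = c("Tom", NA),
  Name2 = c("Thomas", "James"),
  stringsAsFactors = FALSE
)

# Paste every value in each row together; nothing is deduplicated or dropped
dat_small$all_names <- apply(dat_small, 1, paste, collapse = ", ")

print(dat_small$all_names)
```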
Apply `unique` row-wise with `apply`, drop the `NA`s with `na.omit`, and collapse the result with `toString`.
transform(dat, unique=apply(dat, 1, \(x) toString(na.omit(unique(x)))))
# Name Name1 Name2 Name3 Name4 unique
# 1 Tom Tom Thomas Tom Tommy Tom, Thomas, Tommy
# 2 Jim <NA> James <NA> Jimmy Jim, James, Jimmy
# 3 Dave Dave David <NA> Davey Dave, David, Davey
# 4 Tim <NA> Timothy Tim Timmy Tim, Timothy, Timmy
# 5 Rob Rob <NA> Rob Robby Rob, Robby
# 6 Sam <NA> <NA> Sam <NA> Sam
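To see what each function contributes, the same steps can be run on a single row. This is only a sketch: the vector `x` is a made-up stand-in for one row of the data, and the intermediate names `u`, `v` and `s` are introduced here for illustration.

```r
# x is a made-up stand-in for one row of the data frame
x <- c("Tom", "Tom", "Thomas", NA, "Tommy")

u <- unique(x)    # removes repeated values, keeping the first occurrence (NA is still there)
v <- na.omit(u)   # drops the remaining NA
s <- toString(v)  # joins the values into one string separated by ", "

print(s)
```

Because `unique` keeps the first occurrence of each value, the names come out in column order.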
If you like, you could also apply `sort`.
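A sorted variant could be sketched as follows, assuming alphabetical order within each row is what is wanted. The two-row `dat_small` is a hypothetical cut-down copy of the data. Note that `sort()` drops `NA` by default, so `na.omit` is not strictly needed in this version.

```r
# dat_small is a hypothetical, trimmed-down version of the data
dat_small <- data.frame(
  Name  = c("Tom", "Rob"),
  Name1 = c("Tom", "Rob"),
  Name2 = c("Thomas", NA),
  Name3 = c("Tom", "Rob"),
  Name4 = c("Tommy", "Robby"),
  stringsAsFactors = FALSE
)

# unique() removes duplicates, sort() orders the names and drops NA,
# toString() collapses them into one comma-separated string
res <- transform(
  dat_small,
  unique = apply(dat_small, 1, function(x) toString(sort(unique(x))))
)

print(res$unique)
```

The exact ordering depends on the collation rules of the current locale.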
Data:
dat <- structure(list(Name = c("Tom", "Jim", "Dave", "Tim", "Rob", "Sam"
), Name1 = c("Tom", NA, "Dave", NA, "Rob", NA), Name2 = c("Thomas",
"James", "David", "Timothy", NA, NA), Name3 = c("Tom", NA, NA,
"Tim", "Rob", "Sam"), Name4 = c("Tommy", "Jimmy", "Davey", "Timmy",
"Robby", NA)), class = "data.frame", row.names = c(NA, -6L))
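library(tidyverse)
# df is the data frame shown in the question
df %>%
  rowid_to_column() %>%
  # reshape to long form, collapse the unique non-NA names per row, then join back
  left_join(pivot_longer(., -rowid) %>%
              group_by(rowid) %>%
              summarise(value = toString(na.omit(unique(value))), .groups = 'drop'),
            by = "rowid")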
rowid Name Name1 Name2 Name3 Name4 value
1 1 Tom Tom Thomas Tom Tommy Tom, Thomas, Tommy
2 2 Jim <NA> James <NA> Jimmy Jim, James, Jimmy
3 3 Dave Dave David <NA> Davey Dave, David, Davey
4 4 Tim <NA> Timothy Tim Timmy Tim, Timothy, Timmy
5 5 Rob Rob <NA> Rob Robby Rob, Robby
6 6 Sam <NA> <NA> Sam <NA> Sam
Using `tidyverse`
library(dplyr)
# df1 is the data frame shown in the question
df1 %>%
  rowwise() %>%
  mutate(unique = toString(unique(na.omit(c_across(everything()))))) %>%
  ungroup()
-output
# A tibble: 6 × 6
Name Name1 Name2 Name3 Name4 unique
<chr> <chr> <chr> <chr> <chr> <chr>
1 Tom Tom Thomas Tom Tommy Tom, Thomas, Tommy
2 Jim <NA> James <NA> Jimmy Jim, James, Jimmy
3 Dave Dave David <NA> Davey Dave, David, Davey
4 Tim <NA> Timothy Tim Timmy Tim, Timothy, Timmy
5 Rob Rob <NA> Rob Robby Rob, Robby
6 Sam <NA> <NA> Sam <NA> Sam