How can I paste text from dataframe rows, keeping only unique values in R?
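I have a data frame where each row represents a person and the columns hold their names. Some of the values are NA or duplicated. The data looks like the data frame below.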
Name | Name1 | Name2 | Name3 | Name4 |
---|---|---|---|---|
Tom | Tom | Thomas | Tom | Tommy |
Jim | NA | James | NA | Jimmy |
Dave | Dave | David | NA | Davey |
Tim | NA | Timothy | Tim | Timmy |
Rob | Rob | NA | Rob | Robby |
Sam | NA | NA | Sam | NA |
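I want to combine the unique names in each row and put them in a new column in which each name appears only once. I know I can use `paste` to generate a column in which all of the text values appear, like this: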
Name | Name1 | Name2 | Name3 | Name4 | unique |
---|---|---|---|---|---|
Tom | Tom | Thomas | NA | Tommy | Tom, Tom, Thomas, NA, Tommy |
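But I don't want the same text to appear more than once in the unique column.
How can I combine the row data so that each name appears only once in the new `$unique` cell?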
Name | Name1 | Name2 | Name3 | Name4 | unique |
---|---|---|---|---|---|
Tom | Tom | Thomas | Tom | Tommy | Tom, Thomas, Tommy |
Jim | NA | James | NA | Jimmy | Jim, James, Jimmy |
Dave | Dave | David | NA | Davey | Dave, David, Davey |
Tim | NA | Timothy | Tim | Timmy | Tim, Timothy, Timmy |
Rob | Rob | NA | Rob | Robby | Rob, Robby |
Sam | NA | NA | Sam | NA | Sam |
`apply`ing `unique` row-wise, `na.omit`, and collapse with `toString`.
transform(dat, unique=apply(dat, 1, \(x) toString(na.omit(unique(x)))))
# Name Name1 Name2 Name3 Name4 unique
# 1 Tom Tom Thomas Tom Tommy Tom, Thomas, Tommy
# 2 Jim <NA> James <NA> Jimmy Jim, James, Jimmy
# 3 Dave Dave David <NA> Davey Dave, David, Davey
# 4 Tim <NA> Timothy Tim Timmy Tim, Timothy, Timmy
# 5 Rob Rob <NA> Rob Robby Rob, Robby
# 6 Sam <NA> <NA> Sam <NA> Sam
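To see what the anonymous function does to a single row, the three steps can be run one at a time on a plain character vector. This is only an illustrative sketch: the vector `x` is the second row of the example data written out by hand, and the names `step1` to `step3` are made up for the walkthrough.

```r
# Second row of the example data, copied by hand for illustration
x <- c("Jim", NA, "James", NA, "Jimmy")

step1 <- unique(x)        # drops repeated values; a single NA is kept
step2 <- na.omit(step1)   # removes that remaining NA
step3 <- toString(step2)  # collapses the values, separated by ", "

print(step1)  # "Jim" NA "James" "Jimmy"
print(step3)  # "Jim, James, Jimmy"
```

Swapping the order to `unique(na.omit(x))` gives the same string, which is why both spellings appear in the answers here.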
You could also apply `sort` if wanted.
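As a rough sketch of the `sort` variant, assuming alphabetical order within each row is what is wanted: `sort` drops `NA` by default, so `na.omit` is not needed in this version. The data frame is repeated so the snippet runs on its own, and the column name `unique_sorted` is just an illustrative choice.

```r
# Example data, repeated here so the snippet is self-contained
dat <- data.frame(
  Name  = c("Tom", "Jim", "Dave", "Tim", "Rob", "Sam"),
  Name1 = c("Tom", NA, "Dave", NA, "Rob", NA),
  Name2 = c("Thomas", "James", "David", "Timothy", NA, NA),
  Name3 = c("Tom", NA, NA, "Tim", "Rob", "Sam"),
  Name4 = c("Tommy", "Jimmy", "Davey", "Timmy", "Robby", NA),
  stringsAsFactors = FALSE
)

# sort() removes NA values by default (na.last = NA);
# method = "radix" sorts in C-locale order, independent of the session locale
res <- transform(
  dat,
  unique_sorted = apply(dat, 1, function(x) {
    toString(sort(unique(x), method = "radix"))
  })
)

print(res$unique_sorted)
```

With this data the first row becomes "Thomas, Tom, Tommy" rather than "Tom, Thomas, Tommy", so sorting only makes sense if the original column order carries no meaning.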
Data:
dat <- structure(list(Name = c("Tom", "Jim", "Dave", "Tim", "Rob", "Sam"
), Name1 = c("Tom", NA, "Dave", NA, "Rob", NA), Name2 = c("Thomas",
"James", "David", "Timothy", NA, NA), Name3 = c("Tom", NA, NA,
"Tim", "Rob", "Sam"), Name4 = c("Tommy", "Jimmy", "Davey", "Timmy",
"Robby", NA)), class = "data.frame", row.names = c(NA, -6L))
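library(tidyverse)  # provides dplyr, tidyr (pivot_longer) and tibble (rowid_to_column)
# dat is the example data frame defined above
dat %>%
  rowid_to_column() %>%  # add a row id to group and join on
  left_join(pivot_longer(., -rowid) %>%  # one row per (rowid, name column)
              group_by(rowid) %>%
              summarise(value = toString(na.omit(unique(value))), .groups = 'drop'))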
rowid Name Name1 Name2 Name3 Name4 value
1 1 Tom Tom Thomas Tom Tommy Tom, Thomas, Tommy
2 2 Jim <NA> James <NA> Jimmy Jim, James, Jimmy
3 3 Dave Dave David <NA> Davey Dave, David, Davey
4 4 Tim <NA> Timothy Tim Timmy Tim, Timothy, Timmy
5 5 Rob Rob <NA> Rob Robby Rob, Robby
6 6 Sam <NA> <NA> Sam <NA> Sam
Using `tidyverse`:
library(dplyr)
# dat is the example data frame defined above
dat %>%
  rowwise() %>%  # evaluate the mutate() one row at a time
  mutate(unique = toString(unique(na.omit(c_across(everything()))))) %>%
  ungroup()
Output:
# A tibble: 6 × 6
Name Name1 Name2 Name3 Name4 unique
<chr> <chr> <chr> <chr> <chr> <chr>
1 Tom Tom Thomas Tom Tommy Tom, Thomas, Tommy
2 Jim <NA> James <NA> Jimmy Jim, James, Jimmy
3 Dave Dave David <NA> Davey Dave, David, Davey
4 Tim <NA> Timothy Tim Timmy Tim, Timothy, Timmy
5 Rob Rob <NA> Rob Robby Rob, Robby
6 Sam <NA> <NA> Sam <NA> Sam
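One edge case the example data does not cover: if a row were entirely `NA`, `toString` would receive an empty vector and return an empty string rather than `NA`. The sketch below is only one assumed way of handling that; the helper name `collapse_unique` and the decision to return `NA` are hypothetical and not part of the answers above.

```r
# Hypothetical helper: collapse the unique non-NA values of a vector,
# returning NA (instead of "") when no values are left
collapse_unique <- function(x) {
  vals <- unique(x[!is.na(x)])
  if (length(vals) == 0) NA_character_ else toString(vals)
}

all_na <- c(NA_character_, NA_character_, NA_character_)

print(collapse_unique(c("Sam", NA, NA, "Sam", NA)))  # "Sam"
print(collapse_unique(all_na))                       # NA
print(toString(na.omit(unique(all_na))))             # "" from the original expression
```

In the base R version it could be used as `apply(dat, 1, collapse_unique)`.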