How to combine specific data across multiple rows in a dataframe in R
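I would like to change my data frame (concatenate? reshape? I'm not sure which word fits here) by combining the cells of one column across rows in which all the other columns are identical.
Basically, I have something like this: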
>df
>Person_id System_id Category Type Tag
>1A 134 1 Chr Question
>1A 134 1 Chr Answer
>1A 134 1 Chr Evaluation
>1A 134 1 Chr Overall
>1A 134 1 Chr Analysis
>Z4 002 1 Chr Question
>Z4 002 1 Chr Answer
and I want it to look like this:
>Person_id System_id Category Type Tag
>1A 134 1 Chr Question, Answer, Evaluation, Overall, Analysis
>Z4 002 1 Chr Question, Answer
The tags don't have to be comma-separated; a space is fine.
Any ideas on where to look for this kind of solution would be helpful.
Thanks.
We can group by the first four columns and paste the 'Tag' elements together:
library(dplyr)
df %>%
group_by_at(1:4) %>%
summarise(Tag = toString(Tag))
# A tibble: 2 x 5
# Groups: Person_id, System_id, Category [2]
# Person_id System_id Category Type Tag
# <chr> <int> <int> <chr> <chr>
#1 1A 134 1 Chr Question, Answer, Evaluation, Overall, Analysis
#2 Z4 2 1 Chr Question, Answer
Or using base R:
aggregate(Tag ~ ., df, toString)
Note: toString is a convenient wrapper for paste(., collapse=", ").
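Since the question says a space separator would be fine too, here is a minimal base R sketch of that variant. It rebuilds the sample data with data.frame() so that it runs on its own; the names tags and res_space are only illustrative and do not come from the answer above.

```r
# Sample data with the same structure as the thread's df
df <- data.frame(
  Person_id = c("1A", "1A", "1A", "1A", "1A", "Z4", "Z4"),
  System_id = c(134L, 134L, 134L, 134L, 134L, 2L, 2L),
  Category  = 1L,      # recycled to all 7 rows
  Type      = "Chr",   # recycled to all 7 rows
  Tag       = c("Question", "Answer", "Evaluation", "Overall", "Analysis",
                "Question", "Answer"),
  stringsAsFactors = FALSE
)

# toString(x) produces the same string as paste(x, collapse = ", ")
tags <- c("Question", "Answer")
identical(toString(tags), paste(tags, collapse = ", "))

# Space-separated variant: pass a custom collapsing function to aggregate()
res_space <- aggregate(Tag ~ ., df, function(x) paste(x, collapse = " "))
res_space
```

aggregate() sorts the result by its grouping columns in its own way, so the row order may differ from the dplyr output; look rows up by Person_id rather than by position.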
Data
df <- structure(list(Person_id = c("1A", "1A", "1A", "1A", "1A", "Z4",
"Z4"), System_id = c(134L, 134L, 134L, 134L, 134L, 2L, 2L), Category = c(1L,
1L, 1L, 1L, 1L, 1L, 1L), Type = c("Chr", "Chr", "Chr", "Chr",
"Chr", "Chr", "Chr"), Tag = c("Question", "Answer", "Evaluation",
"Overall", "Analysis", "Question", "Answer")),
class = "data.frame", row.names = c(NA,
-7L))
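One detail worth noting: the question displays System_id as 002, while this data stores it as an integer, so it prints as 2. If the leading zeros matter, a possible sketch is to format the column as text. The three-character width is an assumption read off the question's example, and system_id and padded are illustrative names.

```r
# Assumption: System_id should display as a zero-padded, three-character code
system_id <- c(134L, 134L, 2L)

# sprintf() with "%03d" pads integers with leading zeros to width 3
padded <- sprintf("%03d", system_id)
padded
```

The result is a character vector, so it would have to replace the integer column (for example df$System_id <- sprintf("%03d", df$System_id)) before grouping.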
You can use paste0 with collapse = ", " to achieve this:
library(dplyr)
df %>%
group_by(Person_id, System_id, Category, Type) %>%
summarise(Tag = paste0(Tag, collapse = ", "))
#Person_id System_id Category Type Tag
# <chr> <int> <int> <chr> <chr>
#1 1A 134 1 Chr Question, Answer, Evaluation, Overall, Analysis
#2 Z4 2 1 Chr Question, Answer
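As a variation on the dplyr approaches in this thread, the sketch below selects the grouping columns by position with across() and passes .groups = "drop" so the result is a plain ungrouped tibble. It assumes dplyr 1.0.0 or later, uses a space separator as the question allows, and the name res is only illustrative.

```r
library(dplyr)

# Sample data with the same structure as the thread's df
df <- data.frame(
  Person_id = c("1A", "1A", "1A", "1A", "1A", "Z4", "Z4"),
  System_id = c(134L, 134L, 134L, 134L, 134L, 2L, 2L),
  Category  = 1L,
  Type      = "Chr",
  Tag       = c("Question", "Answer", "Evaluation", "Overall", "Analysis",
                "Question", "Answer"),
  stringsAsFactors = FALSE
)

res <- df %>%
  group_by(across(1:4)) %>%                      # group by the first four columns
  summarise(Tag = paste(Tag, collapse = " "),    # join the tags with a space
            .groups = "drop")                    # return an ungrouped tibble
res
```

Dropping the groups avoids carrying a leftover grouping into later steps, which is otherwise easy to overlook after summarise().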