Group the column values uniquely in R
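I have two columns, title and text. I want to group the titles based on the texts they received. In addition, titles with the same name should be grouped uniquely: as the example shows, consecutive rows with the same title are merged, and a later run of the same title stays separate.
For example, I have: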
title | text
-------------
A | I like...
B | I wish...
C | review1
C | review2
C | review3
D | Detecting...
C | review1
C | review2
E | New...
What I want is:
title | text
-------------
A | I like...
B | I wish...
C | review1 review2 review3
D | Detecting...
C | review1 review2
E | New...
What I tried:
df %>%
filter(title %in% sample(unique(title))) %>%
group_by(title) %>%
select(title, text)
But this still does not give the result I expected.
I don't use dplyr, but base R can handle it:
do.line <- function(a.title) {
  # Paste together every text whose title equals a.title
  c(a.title, paste(df$text[df$title == a.title], collapse = " "))
}
t(sapply(unique(df$title), do.line))
Another base R approach:
tmp=rle(df$title)
df$grp=rep(1:length(tmp$lengths),tmp$lengths)
aggregate(text~title+grp,data=df,FUN=paste0,collapse=" ")
title grp text
1 A 1 I like...
2 B 2 I wish...
3 C 3 review1 review2 review3
4 D 4 Detecting...
5 C 5 review1 review2
6 E 6 New...
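To see why the rle-based answer keeps the two runs of C apart, it may help to look at the run ids on their own. This is only a minimal sketch, assuming title is a character vector; the names runs and run_id are mine and do not appear in the answer.

```r
# Titles from the question, as a plain character vector
title <- c("A", "B", "C", "C", "C", "D", "C", "C", "E")

# rle() records each run of identical consecutive values and its length
runs <- rle(title)

# Repeat each run's index by the run's length to label every row
run_id <- rep(seq_along(runs$lengths), runs$lengths)
run_id
# 1 2 3 3 3 4 5 5 6
```

Because the second run of C gets its own id (5), grouping on title plus this id keeps it as a separate row.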
You can write a function that concatenates the unique values of a vector and use it after group_by:
library(dplyr)
df <- data.frame(title = c('A','B','C','C','C','D','C','C','E'),
                 text = c('I like...', 'I wish...', 'review1','review2','review3',
                          'Detecting...','review1','review2', 'New...'))
unique_paste <- function(text_vec) {
  paste(unique(text_vec), collapse = " ")
}
df2 <- df %>%
  # id increases by one each time title differs from the previous row;
  # first(title) as the default avoids hard-coding the first title
  mutate(id = cumsum(title != lag(title, default = first(title)))) %>%
  group_by(id, title) %>%
  do(text = unique_paste(.$text)) %>%
  ungroup()
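Note that do() stores the result in a list column. If a plain character column is preferred, the same grouping can probably be written with summarise() instead. This is a sketch under assumptions: it needs dplyr 1.0 or later for the .groups argument, and df3 is a name I made up for illustration.

```r
library(dplyr)

df <- data.frame(
  title = c("A", "B", "C", "C", "C", "D", "C", "C", "E"),
  text  = c("I like...", "I wish...", "review1", "review2", "review3",
            "Detecting...", "review1", "review2", "New..."),
  stringsAsFactors = FALSE
)

df3 <- df %>%
  # A new id starts whenever the title differs from the previous row
  mutate(id = cumsum(title != lag(title, default = first(title)))) %>%
  group_by(id, title) %>%
  # Collapse the unique texts of each run into one string
  summarise(text = paste(unique(text), collapse = " "), .groups = "drop") %>%
  select(-id)

df3
```

The text column of df3 is an ordinary character vector, which tends to be easier to print or export than a list column.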
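Here is a dplyr approach. The key is to set up group_by correctly, so that it defines groups based on both the row position and the values in the title column.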
library(dplyr)
df %>%
group_by(gp = c(0, na.omit(cumsum(lead(title) != title)))) %>%
summarize(title = unique(title), text = paste0(text, collapse = " ")) %>%
select(-gp)
# A tibble: 6 × 2
title text
<chr> <chr>
1 A I like...
2 B I wish...
3 C review1 review2 review3
4 D Detecting...
5 C review1 review2
6 E New...
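Newer dplyr versions ship a helper for exactly this kind of run-based grouping. The following is a sketch, assuming dplyr 1.1.0 or later, where consecutive_id() is available; it will not run on older versions. The name res is only for illustration.

```r
library(dplyr)

df <- data.frame(
  title = c("A", "B", "C", "C", "C", "D", "C", "C", "E"),
  text  = c("I like...", "I wish...", "review1", "review2", "review3",
            "Detecting...", "review1", "review2", "New..."),
  stringsAsFactors = FALSE
)

res <- df %>%
  # consecutive_id() gives each run of identical consecutive titles its own id
  group_by(gp = consecutive_id(title), title) %>%
  summarise(text = paste(text, collapse = " "), .groups = "drop") %>%
  select(-gp)

res
```

Grouping on both gp and title keeps title in the result without needing unique(title) inside summarise().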
I think you can try the following base R option using aggregate:
aggregate(text ~ ., unique(df), toString)
Pivoting comes to mind:
library(tidyverse)
# Build test data
df <- data.frame(title=c("A","B","C","C","C","D","C","C","E"),
text=c("I like...","I wish...","review1","review2","review3","Detecting","review1","review2","New..."))
# Combine all values in a list by pivoting
new_df <- df %>% pivot_wider(names_from=title, values_from=text, values_fn=list)
# Bring to desired format by pivoting back
new_df <- new_df %>% pivot_longer(cols=c(names(new_df)), names_to="title", values_to="text")
# Inspecting result
new_df
str(new_df)
# Example query
new_df %>% filter(title=="C") %>% unlist()