在 dplyr 管道内传播数据;按组处理多个条目
Spreading data within dplyr pipe; dealing with multiple entries by group
我正在尝试使用 tidyr
和 dplyr
管道中的 spread
函数将长数据帧转换为宽数据帧。我的数据采用以下格式:-
df<-structure(list(Hour = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L,
1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L),
dt = c("05/06/2021", "07/06/2021", "08/06/2021", "10/06/2021",
"12/06/2021", "12/06/2021", "14/06/2021", "14/06/2021", "15/06/2021",
"09/06/2021", "10/06/2021", "10/06/2021", "11/06/2021", "13/06/2021",
"14/06/2021", "05/06/2021", "06/06/2021", "07/06/2021", "08/06/2021",
"08/06/2021", "09/06/2021", "09/06/2021", "10/06/2021", "11/06/2021",
"12/06/2021"), Programme_watched = c("Wrestling", "Drama",
"Drama", "Football", "Rugby", "Movie", "Movie", "News", "Music",
"Tennis", "Racing", "Racing", "Documentary ", "Movie", "Movie",
"Athletics", "News", "Children", "Music", "Movie", "Football",
"News", "Football", "Rugby", "Drama")), class = "data.frame", row.names = c(NA,
-25L))
df
Hour dt Programme_watched
1 0 05/06/2021 Wrestling
2 0 07/06/2021 Drama
3 0 08/06/2021 Drama
4 0 10/06/2021 Football
5 0 12/06/2021 Rugby
6 0 12/06/2021 Movie
7 0 14/06/2021 Movie
8 0 14/06/2021 News
9 0 15/06/2021 Music
10 1 09/06/2021 Tennis
11 1 10/06/2021 Racing
12 1 10/06/2021 Racing
13 1 11/06/2021 Documentary
14 1 13/06/2021 Movie
15 1 14/06/2021 Movie
16 2 05/06/2021 Athletics
17 2 06/06/2021 News
18 2 07/06/2021 Children
19 2 08/06/2021 Music
20 2 08/06/2021 Movie
21 2 09/06/2021 Football
22 2 09/06/2021 News
23 2 10/06/2021 Football
24 2 11/06/2021 Rugby
25 2 12/06/2021 Drama
我希望能够扩大范围,将日期变成以 Programme_watched
变量作为值的列标题。然而,这就是我得到的:-
df%>%
group_by(Hour,dt)%>%
summarise(Programmes=Programme_watched)%>%
spread(dt, Programmes)
`summarise()` has grouped output by 'Hour', 'dt'. You can override using the `.groups` argument.
Error: Each row of output must be identified by a unique combination of keys.
Keys are shared for 10 rows:
* 19, 20
* 21, 22
* 11, 12
* 5, 6
* 7, 8
是否因为某些分组变量有多个值,即 Hour
和 dt
?如果是这样,我希望将这两个值一起粘贴到一个单元格中,然后由 dt
变量传播。有人可以帮忙吗?
您正在寻找这样的解决方案吗?
spread
已被取代:
spread()
的开发已经完成,对于新代码,我们建议切换到 pivot_wider()
,它更易于使用,功能更强大,并且仍在积极开发中。 df %>% spread(key, value)
等同于 df %>% pivot_wider(names_from = key, values_from = value)
df%>%
group_by(Hour,dt)%>%
summarise(Programmes=Programme_watched) %>%
ungroup() %>%
pivot_wider(
names_from= dt,
values_from = Programmes
) %>%
unnest()
# A tibble: 6 x 12
Hour `05/06/2021` `07/06/2021` `08/06/2021` `10/06/2021` `12/06/2021` `14/06/2021` `15/06/2021`
<int> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
1 0 Wrestling Drama Drama Football Rugby Movie Music
2 0 Wrestling Drama Drama Football Movie News Music
3 1 NA NA NA Racing NA Movie NA
4 1 NA NA NA Racing NA Movie NA
5 2 Athletics Children Music Football Drama NA NA
6 2 Athletics Children Movie Football Drama NA NA
# ... with 4 more variables: 09/06/2021 <chr>, 11/06/2021 <chr>, 13/06/2021 <chr>, 06/06/2021 <chr>
或
library(tidyr)
library(dplyr)
df %>%
pivot_wider(
names_from= dt,
values_from = Programme_watched
) %>%
unnest() %>%
distinct()
Hour `05/06/2021` `07/06/2021` `08/06/2021` `10/06/2021` `12/06/2021` `14/06/2021` `15/06/2021` `09/06/2021` `11/06/2021` `13/06/2021` `06/06/2021`
<int> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
1 0 Wrestling Drama Drama Football Rugby Movie Music NA NA NA NA
2 0 Wrestling Drama Drama Football Movie News Music NA NA NA NA
3 1 NA NA NA Racing NA Movie NA Tennis "Documentary " Movie NA
4 2 Athletics Children Music Football Drama NA NA Football "Rugby" NA News
5 2 Athletics Children Movie Football Drama NA NA News "Rugby" NA News
一个选项也是在 pivot_wider
之前按组创建序列 (rowid
) 以处理重复元素
library(data.table)
library(tidyr)
library(dplyr)
df%>%
group_by(Hour,dt)%>%
summarise(Programmes=Programme_watched, .groups = 'drop') %>%
mutate(rn = rowid(Hour, dt)) %>%
pivot_wider(names_from = dt, values_from = Programmes) %>%
select(-rn)
-输出
# A tibble: 6 × 12
Hour `05/06/2021` `07/06/2021` `08/06/2021` `10/06/2021` `12/06/2021` `14/06/2021` `15/06/2021` `09/06/2021` `11/06/2021` `13/06/2021` `06/06/2021`
<int> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
1 0 Wrestling Drama Drama Football Rugby Movie Music <NA> <NA> <NA> <NA>
2 0 <NA> <NA> <NA> <NA> Movie News <NA> <NA> <NA> <NA> <NA>
3 1 <NA> <NA> <NA> Racing <NA> Movie <NA> Tennis "Documentary " Movie <NA>
4 1 <NA> <NA> <NA> Racing <NA> <NA> <NA> <NA> <NA> <NA> <NA>
5 2 Athletics Children Music Football Drama <NA> <NA> Football "Rugby" <NA> News
6 2 <NA> <NA> Movie <NA> <NA> <NA> <NA> News <NA> <NA> <NA>
我正在尝试使用 tidyr
和 dplyr
管道中的 spread
函数将长数据帧转换为宽数据帧。我的数据采用以下格式:-
df<-structure(list(Hour = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L,
1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L),
dt = c("05/06/2021", "07/06/2021", "08/06/2021", "10/06/2021",
"12/06/2021", "12/06/2021", "14/06/2021", "14/06/2021", "15/06/2021",
"09/06/2021", "10/06/2021", "10/06/2021", "11/06/2021", "13/06/2021",
"14/06/2021", "05/06/2021", "06/06/2021", "07/06/2021", "08/06/2021",
"08/06/2021", "09/06/2021", "09/06/2021", "10/06/2021", "11/06/2021",
"12/06/2021"), Programme_watched = c("Wrestling", "Drama",
"Drama", "Football", "Rugby", "Movie", "Movie", "News", "Music",
"Tennis", "Racing", "Racing", "Documentary ", "Movie", "Movie",
"Athletics", "News", "Children", "Music", "Movie", "Football",
"News", "Football", "Rugby", "Drama")), class = "data.frame", row.names = c(NA,
-25L))
df
Hour dt Programme_watched
1 0 05/06/2021 Wrestling
2 0 07/06/2021 Drama
3 0 08/06/2021 Drama
4 0 10/06/2021 Football
5 0 12/06/2021 Rugby
6 0 12/06/2021 Movie
7 0 14/06/2021 Movie
8 0 14/06/2021 News
9 0 15/06/2021 Music
10 1 09/06/2021 Tennis
11 1 10/06/2021 Racing
12 1 10/06/2021 Racing
13 1 11/06/2021 Documentary
14 1 13/06/2021 Movie
15 1 14/06/2021 Movie
16 2 05/06/2021 Athletics
17 2 06/06/2021 News
18 2 07/06/2021 Children
19 2 08/06/2021 Music
20 2 08/06/2021 Movie
21 2 09/06/2021 Football
22 2 09/06/2021 News
23 2 10/06/2021 Football
24 2 11/06/2021 Rugby
25 2 12/06/2021 Drama
我希望能够扩大范围,将日期变成以 Programme_watched
变量作为值的列标题。然而,这就是我得到的:-
df%>%
group_by(Hour,dt)%>%
summarise(Programmes=Programme_watched)%>%
spread(dt, Programmes)
`summarise()` has grouped output by 'Hour', 'dt'. You can override using the `.groups` argument.
Error: Each row of output must be identified by a unique combination of keys.
Keys are shared for 10 rows:
* 19, 20
* 21, 22
* 11, 12
* 5, 6
* 7, 8
是否因为某些分组变量有多个值,即 Hour
和 dt
?如果是这样,我希望将这两个值一起粘贴到一个单元格中,然后由 dt
变量传播。有人可以帮忙吗?
您正在寻找这样的解决方案吗?
spread
已被取代:
spread()
的开发已经完成,对于新代码,我们建议切换到 pivot_wider()
,它更易于使用,功能更强大,并且仍在积极开发中。 df %>% spread(key, value)
等同于 df %>% pivot_wider(names_from = key, values_from = value)
df%>%
group_by(Hour,dt)%>%
summarise(Programmes=Programme_watched) %>%
ungroup() %>%
pivot_wider(
names_from= dt,
values_from = Programmes
) %>%
unnest()
# A tibble: 6 x 12
Hour `05/06/2021` `07/06/2021` `08/06/2021` `10/06/2021` `12/06/2021` `14/06/2021` `15/06/2021`
<int> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
1 0 Wrestling Drama Drama Football Rugby Movie Music
2 0 Wrestling Drama Drama Football Movie News Music
3 1 NA NA NA Racing NA Movie NA
4 1 NA NA NA Racing NA Movie NA
5 2 Athletics Children Music Football Drama NA NA
6 2 Athletics Children Movie Football Drama NA NA
# ... with 4 more variables: 09/06/2021 <chr>, 11/06/2021 <chr>, 13/06/2021 <chr>, 06/06/2021 <chr>
或
library(tidyr)
library(dplyr)
df %>%
pivot_wider(
names_from= dt,
values_from = Programme_watched
) %>%
unnest() %>%
distinct()
Hour `05/06/2021` `07/06/2021` `08/06/2021` `10/06/2021` `12/06/2021` `14/06/2021` `15/06/2021` `09/06/2021` `11/06/2021` `13/06/2021` `06/06/2021`
<int> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
1 0 Wrestling Drama Drama Football Rugby Movie Music NA NA NA NA
2 0 Wrestling Drama Drama Football Movie News Music NA NA NA NA
3 1 NA NA NA Racing NA Movie NA Tennis "Documentary " Movie NA
4 2 Athletics Children Music Football Drama NA NA Football "Rugby" NA News
5 2 Athletics Children Movie Football Drama NA NA News "Rugby" NA News
一个选项也是在 pivot_wider
之前按组创建序列 (rowid
) 以处理重复元素
library(data.table)
library(tidyr)
library(dplyr)
df%>%
group_by(Hour,dt)%>%
summarise(Programmes=Programme_watched, .groups = 'drop') %>%
mutate(rn = rowid(Hour, dt)) %>%
pivot_wider(names_from = dt, values_from = Programmes) %>%
select(-rn)
-输出
# A tibble: 6 × 12
Hour `05/06/2021` `07/06/2021` `08/06/2021` `10/06/2021` `12/06/2021` `14/06/2021` `15/06/2021` `09/06/2021` `11/06/2021` `13/06/2021` `06/06/2021`
<int> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
1 0 Wrestling Drama Drama Football Rugby Movie Music <NA> <NA> <NA> <NA>
2 0 <NA> <NA> <NA> <NA> Movie News <NA> <NA> <NA> <NA> <NA>
3 1 <NA> <NA> <NA> Racing <NA> Movie <NA> Tennis "Documentary " Movie <NA>
4 1 <NA> <NA> <NA> Racing <NA> <NA> <NA> <NA> <NA> <NA> <NA>
5 2 Athletics Children Music Football Drama <NA> <NA> Football "Rugby" <NA> News
6 2 <NA> <NA> Movie <NA> <NA> <NA> <NA> News <NA> <NA> <NA>