如何对具有多个分组列的列进行透视?
How to pivot columns with multiple grouping columns?
我正在处理类似于以下内容的大型面板数据集:
我希望重塑数据并添加反映 Score
和 Distance
列的值的列,用于游戏的每五个值(按 Player
和 [=15 分组) =]).所以最终的数据集看起来像:
通常,我会使用匹配命令来执行此操作:
data %>%
group_by(Player, Year) %>%
mutate(ScoreGame0 = Score[match(Game, 0)]) %>%
mutate(ScoreGame5 = Score[match(Game, 5)])
等等。但是,鉴于我的最终数据集比这大得多(涉及更多观察和重塑变量),我想知道是否有更简单的方法来做到这一点。也许可以写一个函数? (我曾经尝试过,但惨遭失败)。
初始数据:
structure(list(Player = c("Lebron James", "Lebron James", "Lebron James",
"Lebron James", "Lebron James", "Lebron James", "Lebron James",
"Lebron James", "Lebron James", "Lebron James", "Lebron James",
"Lebron James", "Steph Curry", "Steph Curry", "Steph Curry",
"Steph Curry", "Steph Curry", "Steph Curry", "Steph Curry", "Steph Curry",
"Steph Curry", "Steph Curry", "Steph Curry", "Steph Curry"),
Year = c(2021L, 2021L, 2021L, 2021L, 2021L, 2021L, 2020L,
2020L, 2020L, 2020L, 2020L, 2020L, 2021L, 2021L, 2021L, 2021L,
2021L, 2021L, 2020L, 2020L, 2020L, 2020L, 2020L, 2020L),
Game = c(0L, 1L, 2L, 3L, 4L, 5L, 0L, 1L, 2L, 3L, 4L, 5L,
0L, 1L, 2L, 3L, 4L, 5L, 0L, 1L, 2L, 3L, 4L, 5L), Score = c(32L,
23L, 33L, 27L, 29L, 27L, 44L, 22L, 33L, 54L, 65L, 12L, 45L,
48L, 44L, 51L, 65L, 76L, 76L, 67L, 56L, 44L, 21L, 32L), Distance = c(12L,
21L, 20L, 33L, 17L, 13L, 79L, 18L, 19L, 63L, 45L, 34L, 18L,
23L, 22L, 77L, 34L, 42L, 88L, 65L, 37L, 92L, 35L, 54L)), class = "data.frame", row.names = c(NA,
-24L))
这里,我们可以使用pivot_wider
和join
library(dplyr)
library(tidyr)
left_join(data2, pivot_wider(data2, names_from = 'Game',
values_from = c(Score, Distance),
names_glue = "{.value}Game{Game}"))
-输出
Player Game Score Distance ScoreGame0 ScoreGame1 ScoreGame2 ScoreGame3 ScoreGame4 ScoreGame5 ScoreGame6 ScoreGame7 ScoreGame8
1 Lebron James 0 32 12 32 23 33 27 29 27 4 88 23
2 Lebron James 1 23 21 32 23 33 27 29 27 4 88 23
3 Lebron James 2 33 20 32 23 33 27 29 27 4 88 23
4 Lebron James 3 27 33 32 23 33 27 29 27 4 88 23
5 Lebron James 4 29 17 32 23 33 27 29 27 4 88 23
6 Lebron James 5 27 13 32 23 33 27 29 27 4 88 23
7 Lebron James 6 4 34 32 23 33 27 29 27 4 88 23
8 Lebron James 7 88 33 32 23 33 27 29 27 4 88 23
9 Lebron James 8 23 29 32 23 33 27 29 27 4 88 23
10 Lebron James 9 34 48 32 23 33 27 29 27 4 88 23
11 Lebron James 10 54 60 32 23 33 27 29 27 4 88 23
12 Steph Curry 0 45 18 45 48 44 51 65 76 42 44 76
13 Steph Curry 1 48 23 45 48 44 51 65 76 42 44 76
14 Steph Curry 2 44 22 45 48 44 51 65 76 42 44 76
15 Steph Curry 3 51 77 45 48 44 51 65 76 42 44 76
16 Steph Curry 4 65 34 45 48 44 51 65 76 42 44 76
17 Steph Curry 5 76 42 45 48 44 51 65 76 42 44 76
18 Steph Curry 6 42 27 45 48 44 51 65 76 42 44 76
19 Steph Curry 7 44 41 45 48 44 51 65 76 42 44 76
20 Steph Curry 8 76 61 45 48 44 51 65 76 42 44 76
21 Steph Curry 9 40 58 45 48 44 51 65 76 42 44 76
22 Steph Curry 10 20 47 45 48 44 51 65 76 42 44 76
ScoreGame9 ScoreGame10 DistanceGame0 DistanceGame1 DistanceGame2 DistanceGame3 DistanceGame4 DistanceGame5 DistanceGame6 DistanceGame7
1 34 54 12 21 20 33 17 13 34 33
2 34 54 12 21 20 33 17 13 34 33
3 34 54 12 21 20 33 17 13 34 33
4 34 54 12 21 20 33 17 13 34 33
5 34 54 12 21 20 33 17 13 34 33
6 34 54 12 21 20 33 17 13 34 33
7 34 54 12 21 20 33 17 13 34 33
8 34 54 12 21 20 33 17 13 34 33
9 34 54 12 21 20 33 17 13 34 33
10 34 54 12 21 20 33 17 13 34 33
11 34 54 12 21 20 33 17 13 34 33
12 40 20 18 23 22 77 34 42 27 41
13 40 20 18 23 22 77 34 42 27 41
14 40 20 18 23 22 77 34 42 27 41
15 40 20 18 23 22 77 34 42 27 41
16 40 20 18 23 22 77 34 42 27 41
17 40 20 18 23 22 77 34 42 27 41
18 40 20 18 23 22 77 34 42 27 41
19 40 20 18 23 22 77 34 42 27 41
20 40 20 18 23 22 77 34 42 27 41
21 40 20 18 23 22 77 34 42 27 41
22 40 20 18 23 22 77 34 42 27 41
DistanceGame8 DistanceGame9 DistanceGame10
1 29 48 60
2 29 48 60
3 29 48 60
4 29 48 60
5 29 48 60
6 29 48 60
7 29 48 60
8 29 48 60
9 29 48 60
10 29 48 60
11 29 48 60
12 61 58 47
13 61 58 47
14 61 58 47
15 61 58 47
16 61 58 47
17 61 58 47
18 61 58 47
19 61 58 47
20 61 58 47
21 61 58 47
22 61 58 47
使用 tidyverse
library(tidyverse)
df %>%
left_join(filter(.,Game %in% c(0,5,10))%>%
pivot_wider(names_from = Game, values_from = c(Score, Distance)))
Player Game Score Distance Score_0 Score_5 Score_10 Distance_0 Distance_5 Distance_10
1 Lebron James 0 32 12 32 27 54 12 13 60
2 Lebron James 1 23 21 32 27 54 12 13 60
3 Lebron James 2 33 20 32 27 54 12 13 60
4 Lebron James 3 27 33 32 27 54 12 13 60
5 Lebron James 4 29 17 32 27 54 12 13 60
6 Lebron James 5 27 13 32 27 54 12 13 60
7 Lebron James 6 4 34 32 27 54 12 13 60
8 Lebron James 7 88 33 32 27 54 12 13 60
9 Lebron James 8 23 29 32 27 54 12 13 60
10 Lebron James 9 34 48 32 27 54 12 13 60
11 Lebron James 10 54 60 32 27 54 12 13 60
12 Steph Curry 0 45 18 45 76 20 18 42 47
13 Steph Curry 1 48 23 45 76 20 18 42 47
14 Steph Curry 2 44 22 45 76 20 18 42 47
15 Steph Curry 3 51 77 45 76 20 18 42 47
16 Steph Curry 4 65 34 45 76 20 18 42 47
17 Steph Curry 5 76 42 45 76 20 18 42 47
18 Steph Curry 6 42 27 45 76 20 18 42 47
19 Steph Curry 7 44 41 45 76 20 18 42 47
20 Steph Curry 8 76 61 45 76 20 18 42 47
21 Steph Curry 9 40 58 45 76 20 18 42 47
22 Steph Curry 10 20 47 45 76 20 18 42 47
我正在处理类似于以下内容的大型面板数据集:
我希望重塑数据并添加反映 Score
和 Distance
列的值的列,用于游戏的每五个值(按 Player
和 [=15 分组) =]).所以最终的数据集看起来像:
通常,我会使用匹配命令来执行此操作:
data %>%
group_by(Player, Year) %>%
mutate(ScoreGame0 = Score[match(Game, 0)]) %>%
mutate(ScoreGame5 = Score[match(Game, 5)])
等等。但是,鉴于我的最终数据集比这大得多(涉及更多观察和重塑变量),我想知道是否有更简单的方法来做到这一点。也许可以写一个函数? (我曾经尝试过,但惨遭失败)。
初始数据:
structure(list(Player = c("Lebron James", "Lebron James", "Lebron James",
"Lebron James", "Lebron James", "Lebron James", "Lebron James",
"Lebron James", "Lebron James", "Lebron James", "Lebron James",
"Lebron James", "Steph Curry", "Steph Curry", "Steph Curry",
"Steph Curry", "Steph Curry", "Steph Curry", "Steph Curry", "Steph Curry",
"Steph Curry", "Steph Curry", "Steph Curry", "Steph Curry"),
Year = c(2021L, 2021L, 2021L, 2021L, 2021L, 2021L, 2020L,
2020L, 2020L, 2020L, 2020L, 2020L, 2021L, 2021L, 2021L, 2021L,
2021L, 2021L, 2020L, 2020L, 2020L, 2020L, 2020L, 2020L),
Game = c(0L, 1L, 2L, 3L, 4L, 5L, 0L, 1L, 2L, 3L, 4L, 5L,
0L, 1L, 2L, 3L, 4L, 5L, 0L, 1L, 2L, 3L, 4L, 5L), Score = c(32L,
23L, 33L, 27L, 29L, 27L, 44L, 22L, 33L, 54L, 65L, 12L, 45L,
48L, 44L, 51L, 65L, 76L, 76L, 67L, 56L, 44L, 21L, 32L), Distance = c(12L,
21L, 20L, 33L, 17L, 13L, 79L, 18L, 19L, 63L, 45L, 34L, 18L,
23L, 22L, 77L, 34L, 42L, 88L, 65L, 37L, 92L, 35L, 54L)), class = "data.frame", row.names = c(NA,
-24L))
这里,我们可以使用pivot_wider
和join
library(dplyr)
library(tidyr)
left_join(data2, pivot_wider(data2, names_from = 'Game',
values_from = c(Score, Distance),
names_glue = "{.value}Game{Game}"))
-输出
Player Game Score Distance ScoreGame0 ScoreGame1 ScoreGame2 ScoreGame3 ScoreGame4 ScoreGame5 ScoreGame6 ScoreGame7 ScoreGame8
1 Lebron James 0 32 12 32 23 33 27 29 27 4 88 23
2 Lebron James 1 23 21 32 23 33 27 29 27 4 88 23
3 Lebron James 2 33 20 32 23 33 27 29 27 4 88 23
4 Lebron James 3 27 33 32 23 33 27 29 27 4 88 23
5 Lebron James 4 29 17 32 23 33 27 29 27 4 88 23
6 Lebron James 5 27 13 32 23 33 27 29 27 4 88 23
7 Lebron James 6 4 34 32 23 33 27 29 27 4 88 23
8 Lebron James 7 88 33 32 23 33 27 29 27 4 88 23
9 Lebron James 8 23 29 32 23 33 27 29 27 4 88 23
10 Lebron James 9 34 48 32 23 33 27 29 27 4 88 23
11 Lebron James 10 54 60 32 23 33 27 29 27 4 88 23
12 Steph Curry 0 45 18 45 48 44 51 65 76 42 44 76
13 Steph Curry 1 48 23 45 48 44 51 65 76 42 44 76
14 Steph Curry 2 44 22 45 48 44 51 65 76 42 44 76
15 Steph Curry 3 51 77 45 48 44 51 65 76 42 44 76
16 Steph Curry 4 65 34 45 48 44 51 65 76 42 44 76
17 Steph Curry 5 76 42 45 48 44 51 65 76 42 44 76
18 Steph Curry 6 42 27 45 48 44 51 65 76 42 44 76
19 Steph Curry 7 44 41 45 48 44 51 65 76 42 44 76
20 Steph Curry 8 76 61 45 48 44 51 65 76 42 44 76
21 Steph Curry 9 40 58 45 48 44 51 65 76 42 44 76
22 Steph Curry 10 20 47 45 48 44 51 65 76 42 44 76
ScoreGame9 ScoreGame10 DistanceGame0 DistanceGame1 DistanceGame2 DistanceGame3 DistanceGame4 DistanceGame5 DistanceGame6 DistanceGame7
1 34 54 12 21 20 33 17 13 34 33
2 34 54 12 21 20 33 17 13 34 33
3 34 54 12 21 20 33 17 13 34 33
4 34 54 12 21 20 33 17 13 34 33
5 34 54 12 21 20 33 17 13 34 33
6 34 54 12 21 20 33 17 13 34 33
7 34 54 12 21 20 33 17 13 34 33
8 34 54 12 21 20 33 17 13 34 33
9 34 54 12 21 20 33 17 13 34 33
10 34 54 12 21 20 33 17 13 34 33
11 34 54 12 21 20 33 17 13 34 33
12 40 20 18 23 22 77 34 42 27 41
13 40 20 18 23 22 77 34 42 27 41
14 40 20 18 23 22 77 34 42 27 41
15 40 20 18 23 22 77 34 42 27 41
16 40 20 18 23 22 77 34 42 27 41
17 40 20 18 23 22 77 34 42 27 41
18 40 20 18 23 22 77 34 42 27 41
19 40 20 18 23 22 77 34 42 27 41
20 40 20 18 23 22 77 34 42 27 41
21 40 20 18 23 22 77 34 42 27 41
22 40 20 18 23 22 77 34 42 27 41
DistanceGame8 DistanceGame9 DistanceGame10
1 29 48 60
2 29 48 60
3 29 48 60
4 29 48 60
5 29 48 60
6 29 48 60
7 29 48 60
8 29 48 60
9 29 48 60
10 29 48 60
11 29 48 60
12 61 58 47
13 61 58 47
14 61 58 47
15 61 58 47
16 61 58 47
17 61 58 47
18 61 58 47
19 61 58 47
20 61 58 47
21 61 58 47
22 61 58 47
使用 tidyverse
library(tidyverse)
df %>%
left_join(filter(.,Game %in% c(0,5,10))%>%
pivot_wider(names_from = Game, values_from = c(Score, Distance)))
Player Game Score Distance Score_0 Score_5 Score_10 Distance_0 Distance_5 Distance_10
1 Lebron James 0 32 12 32 27 54 12 13 60
2 Lebron James 1 23 21 32 27 54 12 13 60
3 Lebron James 2 33 20 32 27 54 12 13 60
4 Lebron James 3 27 33 32 27 54 12 13 60
5 Lebron James 4 29 17 32 27 54 12 13 60
6 Lebron James 5 27 13 32 27 54 12 13 60
7 Lebron James 6 4 34 32 27 54 12 13 60
8 Lebron James 7 88 33 32 27 54 12 13 60
9 Lebron James 8 23 29 32 27 54 12 13 60
10 Lebron James 9 34 48 32 27 54 12 13 60
11 Lebron James 10 54 60 32 27 54 12 13 60
12 Steph Curry 0 45 18 45 76 20 18 42 47
13 Steph Curry 1 48 23 45 76 20 18 42 47
14 Steph Curry 2 44 22 45 76 20 18 42 47
15 Steph Curry 3 51 77 45 76 20 18 42 47
16 Steph Curry 4 65 34 45 76 20 18 42 47
17 Steph Curry 5 76 42 45 76 20 18 42 47
18 Steph Curry 6 42 27 45 76 20 18 42 47
19 Steph Curry 7 44 41 45 76 20 18 42 47
20 Steph Curry 8 76 61 45 76 20 18 42 47
21 Steph Curry 9 40 58 45 76 20 18 42 47
22 Steph Curry 10 20 47 45 76 20 18 42 47