Pivot data frame to longer twice in R then back to original shape

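Suppose I want to find out which of four basketball players is the best. I set up a small tournament in which two players at a time play each other 1-on-1, and I record a set of statistics for each match.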

#rm(list=ls())

library(dplyr)  # %>%, filter, arrange, group_by, mutate
library(tidyr)  # pivot_longer, pivot_wider

set.seed(1234)

# some made up scores from my tournament
df <- data.frame(
  player1 = c("a", "a", "b", "c", "d", "d"),
  player2 = c("b", "c", "d", "b", "a", "c"),
  date = c("2021-01-01", "2021-01-02", "2021-01-04", "2021-01-05", "2021-01-06", "2021-01-08"),
  p1_dunks = sample(c(4:11), 6, replace = TRUE),
  p2_dunks = sample(c(3:12), 6, replace = TRUE),
  p1_blocks = sample(c(8:10), 6, replace = TRUE),
  p2_blocks = sample(c(10:12), 6, replace = TRUE),
  p1_threepointers = sample(c(2:7), 6, replace = TRUE),
  p2_threepointers = sample(c(1:5), 6, replace = TRUE)
)

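To work out how a player is doing at any point in the tournament, I can pivot the data to long format twice and replace each stat's count with its running cumulative sum.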

# cast to long and get cumulative stats per player
melted_df <- df %>%
  pivot_longer(cols = starts_with(c("p1", "p2")), names_to = "stat", values_to = "number") %>%
  pivot_longer(cols = starts_with("player"), names_to = "player", values_to = "name") %>%
  filter(
    (player == "player1" & grepl("^p1", stat)) |
      (player == "player2" & grepl("^p2", stat))
  ) %>%
  arrange(date) %>%
  group_by(player, stat) %>%
  mutate(number = cumsum(number))
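Because the filter keeps only the p1 stats for player1 and the p2 stats for player2, each (player, stat) group above appears to hold exactly one of the original stat columns. Under that reading, the same running totals can be sketched directly on the wide frame, with no pivoting at all. The data frame `first_three` is an assumption for illustration: it holds only the first three matches, with raw counts back-calculated from the cumulative output shown further down, and is not the seeded `df`.

```r
library(dplyr)

# Hypothetical stand-in for df: first three matches, raw counts
# back-calculated from the cumulative totals printed in the question.
first_three <- data.frame(
  player1 = c("a", "a", "b"),
  player2 = c("b", "c", "d"),
  date = c("2021-01-01", "2021-01-02", "2021-01-04"),
  p1_dunks = c(7L, 11L, 5L),
  p2_dunks = c(11L, 7L, 8L),
  p1_blocks = c(9L, 9L, 9L),
  p2_blocks = c(11L, 11L, 10L),
  p1_threepointers = c(6L, 3L, 6L),
  p2_threepointers = c(4L, 4L, 3L)
)

# Running total of every p1_/p2_ stat column, in date order.
# The regex does not match player1/player2, so those stay untouched.
cumulative_wide <- first_three %>%
  arrange(date) %>%
  mutate(across(matches("^p[12]_"), cumsum))

# p1_dunks becomes 7, 18, 23, the values shown in the question's output
cumulative_wide$p1_dunks
```

Like the grouped version, this accumulates per column (player slot), not per player name.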

I can then query this easily:

melted_df %>%
    filter(date < "2021-01-05") %>%
    filter(!duplicated(name)) %>%
    filter(grepl("dunks$", stat))

For my use case, though, I need to cast this long-format data back to its original form (player 1, player 2, then the stats for each of player 1 and player 2). I can try:

# try to cast back to original format
back_to_wider_df <- melted_df %>%
  pivot_wider(names_from = "player", values_from = "name") %>%
  pivot_wider(names_from = "stat", values_from = "number")

But this instead gives a data frame where each match is 'offset' over two rows, each half full of NA values:

> head(back_to_wider_df)
# A tibble: 6 × 9
  date       player1 player2 p1_dunks p1_blocks p1_threepointers p2_dunks p2_blocks p2_threepointers
  <chr>      <chr>   <chr>      <int>     <int>            <int>    <int>     <int>            <int>
1 2021-01-01 a       NA             7         9                6       NA        NA               NA
2 2021-01-01 NA      b             NA        NA               NA       11        11                4
3 2021-01-02 a       NA            18        18                9       NA        NA               NA
4 2021-01-02 NA      c             NA        NA               NA       18        22                8
5 2021-01-04 b       NA            23        27               15       NA        NA               NA
6 2021-01-04 NA      d             NA        NA               NA       26        32               11
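The offset seems to come from the first pivot_wider: with names_from = "player" and values_from = "name", every other column, including stat and number, is treated as an identifier. A player1 row and a player2 row never share the same stat, so they cannot be merged into one row. Below is a minimal sketch for a single match. The tibble `one_match` is hand-built for illustration (dunks only, values taken from the first match above).

```r
library(dplyr)
library(tidyr)

# Hypothetical long-format rows for one match (dunks only)
one_match <- tibble(
  date = "2021-01-01",
  stat = c("p1_dunks", "p2_dunks"),
  number = c(7L, 11L),
  player = c("player1", "player2"),
  name = c("a", "b")
)

# date, stat and number all act as id columns here, so nothing collapses
step1 <- one_match %>%
  pivot_wider(names_from = "player", values_from = "name")

# The second pivot keys rows on (date, player1, player2);
# (a, NA) and (NA, b) are different keys, so two half-filled rows remain
step2 <- step1 %>%
  pivot_wider(names_from = "stat", values_from = "number")

nrow(step2)  # still 2 rows for what was a single match
```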

Is there a simple way to get this back into the original shape, so that the first three rows read:

> df
        date player1 player2 p1_dunks p1_blocks p1_threepointers p2_dunks p2_blocks p2_threepointers
1 2021-01-01       a       b        7         9                6       11        11                4
2 2021-01-02       a       c       18        18                9       18        22                8
3 2021-01-04       b       d       23        27               15       26        32               11

Thanks,

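One approach is to shift the player-2 columns up one row with the lead function and then drop the rows that still contain NA: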

library(dplyr)

# apply to the half-filled result from the question, not the original df
back_to_wider_df %>% 
    mutate(across(c(player2, p2_dunks, p2_blocks, p2_threepointers), lead)) %>% 
    na.omit()
        date player1 player2 p1_dunks p1_blocks p1_threepointers p2_dunks p2_blocks p2_threepointers
1 2021-01-01       a       b        7         9                6       11        11                4
3 2021-01-02       a       c       18        18                9       18        22                8
5 2021-01-04       b       d       23        27               15       26        32               11
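lead() depends on the two halves of each match sitting on adjacent rows in a fixed order. A variant that does not rely on row order is to group by date and keep the single non-NA value in each column. This is only a sketch and assumes one match per date, as in the sample data. The tibble `half_filled` is a hand-typed copy of the first four rows of back_to_wider_df as printed in the question.

```r
library(dplyr)

# Hand-typed copy of the first four rows of back_to_wider_df (from the question)
half_filled <- tibble(
  date = c("2021-01-01", "2021-01-01", "2021-01-02", "2021-01-02"),
  player1 = c("a", NA, "a", NA),
  player2 = c(NA, "b", NA, "c"),
  p1_dunks = c(7L, NA, 18L, NA),
  p1_blocks = c(9L, NA, 18L, NA),
  p1_threepointers = c(6L, NA, 9L, NA),
  p2_dunks = c(NA, 11L, NA, 18L),
  p2_blocks = c(NA, 11L, NA, 22L),
  p2_threepointers = c(NA, 4L, NA, 8L)
)

# Within each date, take the first non-NA entry of every other column
collapsed <- half_filled %>%
  group_by(date) %>%
  summarise(across(everything(), ~ .x[!is.na(.x)][1]), .groups = "drop")

collapsed
```

If two matches could share a date, a match identifier would be needed as the grouping key instead. Adding one to the original data before the first pivot_longer (for example with row_number()) is one option.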