Sequencing R, Time Series Data

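I would like to add a new column to my current data frame that assigns a sequence number based on the series of events in a football match.

This is my current data frame: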

 head(test_P)
   index         team.name possession_team.name minute second period possession     type.name
1      5      Cardiff City         Cardiff City      0      0      1          2          Pass
2      6      Cardiff City         Cardiff City      0      2      1          2 Ball Receipt*
3      7      Cardiff City         Cardiff City      0      2      1          2         Carry
4      8      Cardiff City         Cardiff City      0      3      1          2          Pass
5      9      Cardiff City         Cardiff City      0      6      1          2 Ball Receipt*
6     10 Preston North End         Cardiff City      0      6      1          2          Duel
7     11 Preston North End         Cardiff City      0      6      1          2          Pass
8     12 Preston North End         Cardiff City      0      8      1          2    Miscontrol
9     13      Cardiff City         Cardiff City      0      8      1          2          Pass
10    14      Cardiff City         Cardiff City      0      9      1          2 Ball Receipt*
11    15      Cardiff City         Cardiff City      0      9      1          2         Cross
12    16 Preston North End         Cardiff City      0     10      1          2     Clearance
13    17      Cardiff City         Cardiff City      0     11      1          2          Pass
14    18      Cardiff City         Cardiff City      0     13      1          2 Ball Receipt*
15    19 Preston North End    Preston North End      0     13      1          3 Ball Recovery
16    20 Preston North End    Preston North End      0     13      1          3         Carry
17    21 Preston North End    Preston North End      0     21      1          3          Pass
18    22 Preston North End    Preston North End      0     22      1          3 Ball Receipt*

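However, I would like to add an additional column named sequence after possession, which numbers the sequences of play within each possession.

Each new possession should start with a sequence value of 1.

If the opposing team breaks the sequence with one or more events while the possession value stays the same, then the next time the team in possession touches the ball a new sequence number should start, e.g. 2, or 3, 4 and so on if there are several interruptions.

The opposition's events should be coded with the same sequence number as the sequence they break.

For example, the data below: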

   index         team.name possession_team.name minute second period possession     type.name sequence
1      5      Cardiff City         Cardiff City      0      0      1          2          Pass        1
2      6      Cardiff City         Cardiff City      0      2      1          2 Ball Receipt*        1
3      7      Cardiff City         Cardiff City      0      2      1          2         Carry        1
4      8      Cardiff City         Cardiff City      0      3      1          2          Pass        1
5      9      Cardiff City         Cardiff City      0      6      1          2 Ball Receipt*        1
6     10 Preston North End         Cardiff City      0      6      1          2          Duel        1
7     11 Preston North End         Cardiff City      0      6      1          2          Pass        1
8     12 Preston North End         Cardiff City      0      8      1          2    Miscontrol        1
9     13      Cardiff City         Cardiff City      0      8      1          2          Pass        2
10    14      Cardiff City         Cardiff City      0      9      1          2 Ball Receipt*        2
11    15      Cardiff City         Cardiff City      0      9      1          2         Cross        2
12    16 Preston North End         Cardiff City      0     10      1          2     Clearance        2
13    17      Cardiff City         Cardiff City      0     11      1          2          Pass        3
14    18      Cardiff City         Cardiff City      0     13      1          2 Ball Receipt*        3
15    19 Preston North End    Preston North End      0     13      1          3 Ball Recovery        1
16    20 Preston North End    Preston North End      0     13      1          3         Carry        1
17    21 Preston North End    Preston North End      0     21      1          3          Pass        1
18    22 Preston North End    Preston North End      0     22      1          3 Ball Receipt*        1

I have tried combining the lead and lag functions with ifelse statements, but I cannot seem to get it to work:

     test <- test  %>% mutate(P = ifelse(dplyr::lag(team.id)!=team.id & dplyr::lag(possession) == possession, dplyr::lag(seq_id) + 1,
                                                      ifelse(dplyr::lead(team.id)!=team.id & dplyr::lead(possession)!=possession , seq_id, 1))) 

Any help would be much appreciated, and apologies for the untidiness of this question.

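The following feels hacky, but it might work.

The logic is as follows:

  • Generate a flip variable that is 1/2 every time team.name "flips" (differs from the previous row), and 0 otherwise.
  • Generate cum_sum_flip, the cumulative sum over flip. Add 1 so that it starts at 1 rather than 0.
  • Generate sequence by taking floor() of cum_sum_flip, so that the sequence increases on every second flip, i.e. when the ball comes back to the team that lost it.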
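To make the three steps concrete, here is a minimal sketch in base R on a made-up vector of team labels. The vector `team` and the labels "A" and "B" are illustrative assumptions and not part of the original data.

```r
# Hypothetical toy sequence of teams within a single possession
team <- c("A", "A", "B", "B", "A", "A", "B", "A")

# flip: 1/2 wherever the team differs from the previous row, 0 otherwise
# (the first row has no predecessor, so it counts as "no flip")
flip <- c(0, team[-1] != team[-length(team)]) / 2

# running total of flips, shifted so that it starts at 1
cum_sum_flip <- cumsum(flip) + 1

# two half-steps (ball lost, ball regained) add up to one whole step
sequence <- floor(cum_sum_flip)

print(data.frame(team, flip, cum_sum_flip, sequence))
```

In this toy case the "B" rows keep the sequence number they interrupted, and the number only goes up once "A" has the ball again.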

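Notes:

  • I left the intermediate variables in to make the logic easier to follow; you can consolidate them a bit.
  • Depending on your data structure, you may have to group by match or something similar, so that the count restarts when a completely new match begins.
  • This solution is not very robust and makes some assumptions about the data structure. Please check the edge cases.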
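library(dplyr)
library(tidyr)  # replace_na() comes from tidyr, not dplyr

test_P %>% 
  # flip is 1/2 on every row where team.name differs from the previous row;
  # the first row has no predecessor (NA), which is treated as "no flip"
  mutate(flip = (lag(team.name) != team.name) %>% replace_na(FALSE) * 1/2,
         .after = possession
  ) %>% group_by(possession) %>% 
  # within each possession, accumulate the flips and round down
  mutate(cum_sum_flip = cumsum(flip)+1, 
         sequence = floor(cum_sum_flip),
         .after = possession
  ) 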

Result:

# A tibble: 18 x 11
# Groups:   possession [2]
   index team.name         possession_team.name minute second period possession cum_sum_flip sequence  flip type.name    
   <dbl> <chr>             <chr>                 <dbl>  <dbl>  <dbl>      <dbl>        <dbl>    <dbl> <dbl> <chr>        
 1     5 Cardiff City      Cardiff City              0      0      1          2          1          1   0   Pass         
 2     6 Cardiff City      Cardiff City              0      2      1          2          1          1   0   Ball Receipt*
 3     7 Cardiff City      Cardiff City              0      2      1          2          1          1   0   Carry        
 4     8 Cardiff City      Cardiff City              0      3      1          2          1          1   0   Pass         
 5     9 Cardiff City      Cardiff City              0      6      1          2          1          1   0   Ball Receipt*
 6    10 Preston North End Cardiff City              0      6      1          2          1.5        1   0.5 Duel         
 7    11 Preston North End Cardiff City              0      6      1          2          1.5        1   0   Pass         
 8    12 Preston North End Cardiff City              0      8      1          2          1.5        1   0   Miscontrol   
 9    13 Cardiff City      Cardiff City              0      8      1          2          2          2   0.5 Pass         
10    14 Cardiff City      Cardiff City              0      9      1          2          2          2   0   Ball Receipt*
11    15 Cardiff City      Cardiff City              0      9      1          2          2          2   0   Cross        
12    16 Preston North End Cardiff City              0     10      1          2          2.5        2   0.5 Clearance    
13    17 Cardiff City      Cardiff City              0     11      1          2          3          3   0.5 Pass         
14    18 Cardiff City      Cardiff City              0     13      1          2          3          3   0   Ball Receipt*
15    19 Preston North End Preston North End         0     13      1          3          1.5        1   0.5 Ball Recovery
16    20 Preston North End Preston North End         0     13      1          3          1.5        1   0   Carry        
17    21 Preston North End Preston North End         0     21      1          3          1.5        1   0   Pass         
18    22 Preston North End Preston North End         0     22      1          3          1.5        1   0   Ball Receipt*
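One edge case worth checking, following the notes above: flip is computed before grouping, so a possession that begins with a change of team already starts at 1.5 (as in possession 3 here). If the opposition then interrupts, the counter reaches 2 on the opposition's event rather than on the next touch by the team in possession. The sketch below is one possible way around this, computing flip inside each group. The toy data and the `match_id` column are assumptions for illustration only.

```r
library(dplyr)

# Hypothetical events: possession 2 starts with a change of team (A -> B),
# then A interrupts once and B gets the ball back
events <- tibble(
  match_id   = 1,                       # assumed match identifier
  possession = c(1, 1, 2, 2, 2),
  team.name  = c("A", "A", "B", "A", "B")
)

result <- events %>%
  group_by(match_id, possession) %>%
  mutate(
    # compare with the previous row of the same possession only;
    # the first row of each group is compared with itself, so flip is 0 there
    flip = (lag(team.name, default = first(team.name)) != team.name) / 2,
    sequence = floor(cumsum(flip) + 1)
  ) %>%
  ungroup()

print(result)
```

With the ungrouped flip, the three rows of possession 2 in this toy data would be numbered 1, 2, 2; computing flip within the group gives 1, 1, 2, which matches the rule that opposition events keep the number of the sequence they break.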

Data

test_P <- tribble(
~index, ~team.name, ~possession_team.name, ~minute, ~second, ~period, ~possession, ~type.name, 
5 ,      "Cardiff City",  "Cardiff City",       0,        0,       1,           2,  "Pass",
6 ,      "Cardiff City",  "Cardiff City",       0,        2,       1,           2,  "Ball Receipt*",
7 ,      "Cardiff City",  "Cardiff City",       0,        2,       1,           2,  "Carry",
8 ,      "Cardiff City",  "Cardiff City",       0,        3,       1,           2,  "Pass",
9 ,      "Cardiff City",  "Cardiff City",       0,        6,       1,           2,  "Ball Receipt*",
10,  "Preston North End", "Cardiff City",       0,        6,       1,           2,  "Duel",
11,  "Preston North End", "Cardiff City",       0,        6,       1,           2,  "Pass",
12,  "Preston North End", "Cardiff City",       0,        8,       1,           2,  "Miscontrol",
13,       "Cardiff City", "Cardiff City",       0,        8,       1,           2,  "Pass",
14,       "Cardiff City", "Cardiff City",       0,        9,       1,           2,  "Ball Receipt*",
15,       "Cardiff City", "Cardiff City",       0,        9,       1,           2,  "Cross",
16,  "Preston North End", "Cardiff City",       0,       10,       1,           2,  "Clearance",
17,       "Cardiff City", "Cardiff City",       0,       11,       1,           2,  "Pass",
18,       "Cardiff City", "Cardiff City",       0,       13,       1,           2,  "Ball Receipt*",
19,  "Preston North End", "Preston North End",  0,       13,       1,           3,  "Ball Recovery",
20,  "Preston North End", "Preston North End",  0,       13,       1,           3,  "Carry",
21,  "Preston North End", "Preston North End",  0,       21,       1,           3,  "Pass",
22,  "Preston North End", "Preston North End",  0,       22,       1,           3,  "Ball Receipt*")
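As a possible alternative to the half-step trick, the sequence can also be counted directly from possession_team.name: start a new sequence whenever the team in possession touches the ball right after an opposition event. This is only a sketch under stated assumptions. The helper columns `in_possession` and `regained` are names introduced here, `events` is a compact stand-in for the data above with only the columns the logic needs, and a possession that opens with an opposition event would have that event numbered 1 and the possessing team's first touch numbered 2.

```r
library(dplyr)

# Compact stand-in for test_P with only the columns the logic needs
events <- tibble(
  possession = c(rep(2, 14), rep(3, 4)),
  possession_team.name = c(rep("Cardiff City", 14), rep("Preston North End", 4)),
  team.name = c(
    rep("Cardiff City", 5), rep("Preston North End", 3),
    rep("Cardiff City", 3), "Preston North End",
    rep("Cardiff City", 2), rep("Preston North End", 4)
  )
)

result <- events %>%
  group_by(possession) %>%
  mutate(
    # is this event by the team that holds the possession?
    in_possession = team.name == possession_team.name,
    # TRUE on the row where the possessing team touches the ball again
    # directly after an opposition event (the first row never counts)
    regained = in_possession & !lag(in_possession, default = TRUE),
    sequence = cumsum(regained) + 1
  ) %>%
  ungroup()

print(result$sequence)
```

On this data the result agrees with the sequence column requested in the question: eight rows of 1, four of 2, two of 3, then 1 for the four rows of possession 3.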