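Sequencing R, Time Series Data
I would like to add a new column to my current data frame that assigns a sequence number based on a series of events in a football match.
This is my current data frame: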
head(test_P)
index team.name possession_team.name minute second period possession type.name
1 5 Cardiff City Cardiff City 0 0 1 2 Pass
2 6 Cardiff City Cardiff City 0 2 1 2 Ball Receipt*
3 7 Cardiff City Cardiff City 0 2 1 2 Carry
4 8 Cardiff City Cardiff City 0 3 1 2 Pass
5 9 Cardiff City Cardiff City 0 6 1 2 Ball Receipt*
6 10 Preston North End Cardiff City 0 6 1 2 Duel
7 11 Preston North End Cardiff City 0 6 1 2 Pass
8 12 Preston North End Cardiff City 0 8 1 2 Miscontrol
9 13 Cardiff City Cardiff City 0 8 1 2 Pass
10 14 Cardiff City Cardiff City 0 9 1 2 Ball Receipt*
11 15 Cardiff City Cardiff City 0 9 1 2 Cross
12 16 Preston North End Cardiff City 0 10 1 2 Clearance
13 17 Cardiff City Cardiff City 0 11 1 2 Pass
14 18 Cardiff City Cardiff City 0 13 1 2 Ball Receipt*
15 19 Preston North End Preston North End 0 13 1 3 Ball Recovery
16 20 Preston North End Preston North End 0 13 1 3 Carry
17 21 Preston North End Preston North End 0 21 1 3 Pass
18 22 Preston North End Preston North End 0 22 1 3 Ball Receipt*
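However, I would like to add an extra column called sequence after possession, which numbers the sequences within each possession.
Each new possession should start with a sequence value of 1.
But if the opposition breaks the sequence with one or more events and the possession value stays the same, then the next time the team in possession touches the ball it should get a new sequence number, e.g. 2, or 3, 4, etc. if there are several interruptions.
Opposition events should be coded with the same sequence number as the sequence they broke.
For example, the data below: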
index team.name possession_team.name minute second period possession type.name sequence
1 5 Cardiff City Cardiff City 0 0 1 2 Pass 1
2 6 Cardiff City Cardiff City 0 2 1 2 Ball Receipt 1
3 7 Cardiff City Cardiff City 0 2 1 2 Carry 1
4 8 Cardiff City Cardiff City 0 3 1 2 Pass 1
5 9 Cardiff City Cardiff City 0 6 1 2 Ball Receipt* 1
6 10 Preston North End Cardiff City 0 6 1 2 Duel 1
7 11 Preston North End Cardiff City 0 6 1 2 Pass 1
8 12 Preston North End Cardiff City 0 8 1 2 Miscontrol 1
9 13 Cardiff City Cardiff City 0 8 1 2 Pass 2
10 14 Cardiff City Cardiff City 0 9 1 2 Ball Receipt 2
11 15 Cardiff City Cardiff City 0 9 1 2 Cross 2
12 16 Preston North End Cardiff City 0 10 1 2 Clearance 2
13 17 Cardiff City Cardiff City 0 11 1 2 Pass 3
14 18 Cardiff City Cardiff City 0 13 1 2 Ball Receipt 3
15 19 Preston North End Preston North End 0 13 1 3 Ball Recovery 1
16 20 Preston North End Preston North End 0 13 1 3 Carry 1
17 21 Preston North End Preston North End 0 21 1 3 Pass 1
18 22 Preston North End Preston North End 0 22 1 3 Ball Receipt 1
I have tried combining the lead and lag functions with ifelse statements, but can't seem to get it to work:
test <- test %>% mutate(P = ifelse(dplyr::lag(team.id)!=team.id & dplyr::lag(possession) == possession, dplyr::lag(seq_id) + 1,
ifelse(dplyr::lead(team.id)!=team.id & dplyr::lead(possession)!=possession , seq_id, 1)))
Any help would be greatly appreciated, and apologies for the untidiness of this question.
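The following feels hacky, but might work.
The logic is as follows:
- Generate a flip variable that is 1/2 every time team.name "flips", and 0 otherwise.
- Generate cum_sum_flip, the cumulative sum over flip. Add 1 so that it starts at 1 instead of 0.
- Generate sequence by taking floor() of cum_sum_flip, so that the sequence increases on every second flip, i.e. once the ball has gone to the opposition and come back.
Remarks:
- I left the intermediate variables in for clarity; you can consolidate them a bit.
- Depending on your data structure, you may have to group by match or something similar, to make sure the count restarts when a whole new match begins.
- This solution is not very robust and makes some assumptions about the data structure. Please check edge cases.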
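library(dplyr)
library(tidyr)  # replace_na() comes from tidyr, not dplyr

test_P %>%
  # flip is 0.5 whenever team.name differs from the previous row, else 0;
  # the first row has no predecessor (NA), which replace_na() turns into 0
  mutate(flip = (lag(team.name) != team.name) %>% replace_na(0) * 1/2,
         .after = possession
  ) %>%
  group_by(possession) %>%
  # two flips (to the opposition and back) add up to one full step
  mutate(cum_sum_flip = cumsum(flip) + 1,
         sequence = floor(cum_sum_flip),
         .after = possession
  )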
Result:
# A tibble: 18 x 11
# Groups: possession [2]
index team.name possession_team.name minute second period possession cum_sum_flip sequence flip type.name
<dbl> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <chr>
1 5 Cardiff City Cardiff City 0 0 1 2 1 1 0 Pass
2 6 Cardiff City Cardiff City 0 2 1 2 1 1 0 Ball Receipt*
3 7 Cardiff City Cardiff City 0 2 1 2 1 1 0 Carry
4 8 Cardiff City Cardiff City 0 3 1 2 1 1 0 Pass
5 9 Cardiff City Cardiff City 0 6 1 2 1 1 0 Ball Receipt*
6 10 Preston North End Cardiff City 0 6 1 2 1.5 1 0.5 Duel
7 11 Preston North End Cardiff City 0 6 1 2 1.5 1 0 Pass
8 12 Preston North End Cardiff City 0 8 1 2 1.5 1 0 Miscontrol
9 13 Cardiff City Cardiff City 0 8 1 2 2 2 0.5 Pass
10 14 Cardiff City Cardiff City 0 9 1 2 2 2 0 Ball Receipt*
11 15 Cardiff City Cardiff City 0 9 1 2 2 2 0 Cross
12 16 Preston North End Cardiff City 0 10 1 2 2.5 2 0.5 Clearance
13 17 Cardiff City Cardiff City 0 11 1 2 3 3 0.5 Pass
14 18 Cardiff City Cardiff City 0 13 1 2 3 3 0 Ball Receipt*
15 19 Preston North End Preston North End 0 13 1 3 1.5 1 0.5 Ball Recovery
16 20 Preston North End Preston North End 0 13 1 3 1.5 1 0 Carry
17 21 Preston North End Preston North End 0 21 1 3 1.5 1 0 Pass
18 22 Preston North End Preston North End 0 22 1 3 1.5 1 0 Ball Receipt*
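To illustrate the remark about grouping by match, here is a minimal sketch of the same flip idea on a made-up data set with two matches. The match_id column and the team names A to D are assumptions for illustration only; they are not part of the original data. The lag is taken inside each group here, so the first row of every match/possession has no predecessor and gets a flip of 0.

```r
library(dplyr)

# Two tiny hypothetical matches; match_id is an assumed column name.
events <- data.frame(
  match_id = c(1, 1, 1, 2, 2, 2),
  possession = c(2, 2, 2, 2, 2, 2),
  team.name = c("A", "B", "A", "C", "D", "C"),
  possession_team.name = c("A", "A", "A", "C", "C", "C")
)

events_seq <- events %>%
  # Group by match as well, so the count restarts for each match
  group_by(match_id, possession) %>%
  mutate(
    # 0.5 for every change of team within the group, 0 otherwise
    flip = coalesce(lag(team.name) != team.name, FALSE) / 2,
    sequence = floor(cumsum(flip) + 1)
  ) %>%
  ungroup()

events_seq
```

Because both hypothetical matches happen to contain a possession numbered 2, grouping by possession alone would treat all six rows as one group and keep counting into the second match.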
Data
test_P <- tribble(
~index, ~team.name, ~possession_team.name, ~minute, ~second, ~period, ~possession, ~type.name,
5 , "Cardiff City", "Cardiff City", 0, 0, 1, 2, "Pass",
6 , "Cardiff City", "Cardiff City", 0, 2, 1, 2, "Ball Receipt*",
7 , "Cardiff City", "Cardiff City", 0, 2, 1, 2, "Carry",
8 , "Cardiff City", "Cardiff City", 0, 3, 1, 2, "Pass",
9 , "Cardiff City", "Cardiff City", 0, 6, 1, 2, "Ball Receipt*",
10, "Preston North End", "Cardiff City", 0, 6, 1, 2, "Duel",
11, "Preston North End", "Cardiff City", 0, 6, 1, 2, "Pass",
12, "Preston North End", "Cardiff City", 0, 8, 1, 2, "Miscontrol",
13, "Cardiff City", "Cardiff City", 0, 8, 1, 2, "Pass",
14, "Cardiff City", "Cardiff City", 0, 9, 1, 2, "Ball Receipt*",
15, "Cardiff City", "Cardiff City", 0, 9, 1, 2, "Cross",
16, "Preston North End", "Cardiff City", 0, 10, 1, 2, "Clearance",
17, "Cardiff City", "Cardiff City", 0, 11, 1, 2, "Pass",
18, "Cardiff City", "Cardiff City", 0, 13, 1, 2, "Ball Receipt*",
19, "Preston North End", "Preston North End", 0, 13, 1, 3, "Ball Recovery",
20, "Preston North End", "Preston North End", 0, 13, 1, 3, "Carry",
21, "Preston North End", "Preston North End", 0, 21, 1, 3, "Pass",
22, "Preston North End", "Preston North End", 0, 22, 1, 3, "Ball Receipt*")
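As a possible alternative to the flip approach, the rule from the question can also be written down more directly: within a possession, start a new sequence whenever the team in possession touches the ball right after an opposition event. This is only a sketch, assuming that possession_team.name reliably identifies the team in possession; the helper columns in_possession and regained are names introduced here for illustration. The data is a trimmed copy of the example above with only the columns the rule needs.

```r
library(dplyr)

# Trimmed copy of the example data (only the columns the rule needs).
cardiff <- "Cardiff City"
preston <- "Preston North End"
events <- data.frame(
  index = 5:22,
  team.name = c(rep(cardiff, 5), rep(preston, 3), rep(cardiff, 3),
                preston, rep(cardiff, 2), rep(preston, 4)),
  possession_team.name = c(rep(cardiff, 14), rep(preston, 4)),
  possession = c(rep(2, 14), rep(3, 4))
)

result <- events %>%
  group_by(possession) %>%
  mutate(
    # TRUE when the event belongs to the team in possession
    in_possession = team.name == possession_team.name,
    # TRUE when the possessing team acts right after an opposition event
    regained = in_possession & !lag(in_possession, default = TRUE),
    # every regain starts a new sequence; opposition events keep the old one
    sequence = 1 + cumsum(regained)
  ) %>%
  ungroup()

result$sequence
```

On the example data this gives the sequence column asked for in the question. One case to check on real data: if a possession opens with an opposition event, the possessing team's first touch is counted as a regain and starts at 2, which may or may not be what you want.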