How to generate a window around a variable with multiple events per Symbol?
I have a problem. My panel dataset looks like the data below, just without the variable "Window". I am now trying to create the variable "Window" as follows:
Symbol Date Close Time Event Window
AAPL 09/03/2020 66,542503 16:25:00 NA NA
AAPL 09/03/2020 71,334999 16:26:00 NA -4
AAPL 09/03/2020 68,857498 16:27:00 NA -3
AAPL 09/03/2020 62,057499 16:28:00 NA -2
AAPL 09/03/2020 69,4925 16:29:00 NA -1
AAPL 09/03/2020 60,552502 16:30:00 1 0
AAPL 09/03/2020 63,215 16:31:00 NA 1
AAPL 10/03/2020 61,6675 09:30:00 NA 2
AAPL 10/03/2020 61,195 09:31:00 NA 3
AAPL 10/03/2020 57,310001 09:32:00 NA 4
AAPL 10/03/2020 56,092499 09:33:00 NA NA
AAPL 15/03/2020 65,535603 15:45:00 NA NA
AAPL 15/03/2020 66,357545 15:46:00 NA -4
AAPL 15/03/2020 62,852345 15:47:00 NA -3
AAPL 15/03/2020 64,057325 15:48:00 NA -2
AAPL 16/03/2020 66,494545 09:30:00 NA -1
AAPL 16/03/2020 63,557967 09:31:00 1 0
AAPL 16/03/2020 64,415454 09:32:00 NA 1
AAPL 16/03/2020 62,2357 09:33:00 NA 2
AAPL 16/03/2020 64,4576 09:34:00 NA 3
AAPL 16/03/2020 59,457579 09:35:00 NA 4
AAPL 16/03/2020 58,092470 09:36:00 NA NA
VISA 05/03/2020 186,960007 16:26:00 NA NA
VISA 05/03/2020 184,360001 16:27:00 NA -4
VISA 05/03/2020 171,130005 16:28:00 NA -3
VISA 05/03/2020 182,600006 16:29:00 NA -2
VISA 05/03/2020 172,949997 16:30:00 NA -1
VISA 06/03/2020 160,080002 09:32:00 1 0
VISA 06/03/2020 175,830002 09:33:00 NA 1
VISA 06/03/2020 152,009995 09:34:00 NA 2
VISA 06/03/2020 157,889999 09:35:00 NA 3
VISA 06/03/2020 148,479996 09:36:00 NA 4
VISA 06/03/2020 152,25 09:37:00 NA NA
VISA 06/03/2020 146,830002 09:38:00 NA NA
VISA 20/03/2020 192,203826 16:12:00 NA NA
VISA 20/03/2020 193,293752 16:13:00 NA -4
VISA 20/03/2020 192,204726 16:14:00 NA -3
VISA 20/03/2020 192,2396 16:15:00 NA -2
VISA 20/03/2020 194,185620 16:16:00 NA -1
VISA 20/03/2020 196,614289 16:17:00 1 0
VISA 20/03/2020 197,826200 16:18:00 NA 1
VISA 21/03/2020 197,49176 09:29:00 NA 2
VISA 21/03/2020 197,239230 09:30:00 NA 3
VISA 21/03/2020 198,2300 09:31:00 NA 4
VISA 21/03/2020 198,230028 09:32:00 NA NA
VISA 21/03/2020 197,247020 09:33:00 NA NA
I have already tried the following code, which I found:
library(dplyr)
EventStudy <- EventStudy %>%
  group_by(Symbol) %>%
  mutate(Window = row_number() - match(1, Event),
         Window = ifelse(abs(Window) > 4, NA, Window)) %>%
  ungroup()
Unfortunately, this gives me only one window per symbol, but my data has multiple events per symbol. For example, the symbol "AAPL" has two events.
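The reason is that match() returns only the position of the first match, so every row of a symbol is measured against that symbol's first event. which(), by contrast, returns every event position. A minimal illustration on a made-up Event vector (not the question's data):

```r
# Made-up Event column for one symbol with two events (rows 3 and 7)
event <- c(NA, NA, 1, NA, NA, NA, 1, NA)

# match() stops at the first hit: every row is measured against row 3 only
first_event <- match(1, event)
first_event
# [1] 3

# which() returns all event positions, which is what multiple windows need
all_events <- which(event == 1)
all_events
# [1] 3 7
```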
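I also tried the code without group_by, but that did not work as expected either. I also don't have a suitable grouping variable that would leave only one event per group.
Is there a way to modify the code for multiple events per symbol? Could you help me create the variable "Window"?
Thanks a lot!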
This is one of the rare cases where I would not use the tidyverse style. I would write a small for loop over the lags (only 9 iterations):
## test data
event <- c(NA, NA, NA, NA, NA, 1 , NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 1 , NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 1 , NA, NA, NA, NA, NA)
event_pos <- which(event == 1)
window <- rep(NA, length(event))
for (dif in -4:4) {
window[event_pos+dif] <- dif
}
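Note that this code does not handle special cases such as overlapping windows or lags that fall outside the range of the data, but it can easily be adapted to handle them.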
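A minimal sketch of such an adaptation for a single symbol, assuming a ±4 window. The function name window_around_events() is hypothetical. It keeps the same loop but drops target positions outside 1..n. Without that filter, an event in the first four rows produces zero or negative indices: R silently ignores index 0 and treats negative indices as "all except", so the assignment either errors or writes to the wrong rows. An event in the last four rows would lengthen the vector instead. Overlaps are still resolved by loop order, so the larger offset is the one that survives.

```r
# Hypothetical helper: same 9-iteration loop, with out-of-range targets removed
window_around_events <- function(event, width = 4) {
  n <- length(event)
  event_pos <- which(event == 1)
  window <- rep(NA_integer_, n)
  for (dif in -width:width) {
    target <- event_pos + dif
    # ignore lags/leads that would fall before row 1 or after row n
    target <- target[target >= 1 & target <= n]
    window[target] <- dif
  }
  window
}

# An event in row 2: offsets -4 to -2 would point before the first row
window_around_events(c(NA, 1, NA, NA, NA, NA, NA))
# returns -1 0 1 2 3 4 NA
```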
Result:
event window
NA NA
NA -4
NA -3
NA -2
NA -1
1 0
NA 1
NA 2
NA 3
NA 4
NA NA
NA NA
NA -4
NA -3
NA -2
NA -1
1 0
NA 1
NA 2
NA 3
NA 4
NA NA
NA NA
NA -4
NA -3
NA -2
NA -1
1 0
NA 1
NA 2
NA 3
NA 4
NA NA
The code is easy to adapt so that it works per symbol group and handles the edge cases:
library(tidyverse)
create_window <- function(event) {
event_pos <- which(event == 1)
if (length(event_pos) == 0) {
return(rep(NA, length(event)))
}
window <- rep(NA, length(event) + 8) ## add 8 for edge cases
for (dif in -4:4) {
window[event_pos+dif+4] <- dif
}
## remove superfluous 8
window <- window[-c(1:4, (length(window):(length(window)-3)))]
window
}
testdata %>%
group_by(symbol) %>%
mutate(window = create_window(event)) %>%
ungroup()
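A small check of the grouped version on a toy panel that uses the question's column names (Symbol, Event). The data is invented for illustration, and create_window() is copied verbatim from above. The VISA event sits in row 2 of its group, so it exercises the padding that keeps negative offsets inside the vector.

```r
library(dplyr)

# Copy of create_window() from above: pad 4 slots on each side, fill, then trim
create_window <- function(event) {
  event_pos <- which(event == 1)
  if (length(event_pos) == 0) {
    return(rep(NA, length(event)))
  }
  window <- rep(NA, length(event) + 8) ## add 8 for edge cases
  for (dif in -4:4) {
    window[event_pos+dif+4] <- dif
  }
  ## remove superfluous 8
  window <- window[-c(1:4, (length(window):(length(window)-3)))]
  window
}

# Invented panel: AAPL has events in rows 3 and 13, VISA in row 2 of its group
EventStudy <- tibble(
  Symbol = rep(c("AAPL", "VISA"), c(16, 6)),
  Event  = c(NA, NA, 1, rep(NA, 9), 1, rep(NA, 3),  # AAPL
             NA, 1, rep(NA, 4))                     # VISA
)

result <- EventStudy %>%
  group_by(Symbol) %>%
  mutate(Window = create_window(Event)) %>%
  ungroup()

result$Window
```

Each symbol gets its own windows, and the VISA window is cut off at the start of its group instead of spilling into the AAPL rows.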
Some timings for 200 million rows, 100k symbols and 1.5 million events on my machine (16 GB RAM, i5-6600K): it takes about 7.2 seconds.
testdata <-
tibble(event = rep(NA_real_, 200000000),
symbol = rep(1:100000, c(rmultinom(1, 200000000, rep(1/100000, 100000)))))
testdata$event[sample.int(length(testdata$event)-9, 1500000)+4] <- 1
microbenchmark::microbenchmark({
testdata %>%
group_by(symbol) %>%
mutate(window = create_window(event)) %>%
ungroup()
},
times = 10)
###
## mean median uq max neval
## 7.650121 7.201488 7.390293 10.21066 10
You can achieve this with a helper function that returns the index of the closest Event = 1 value.
library(dplyr)
closest_index <- function(x, y) {
y <- which(y == 1)
y[sapply(x, function(i) which(abs(y - i) == min(abs(y - i)))[1])]
}
EventStudy %>%
group_by(Symbol) %>%
mutate(close_index = closest_index(row_number(), Event),
Window = row_number() - close_index,
Window = ifelse(abs(Window) > 4, NA, Window)) %>%
ungroup %>%
select(-close_index)
This returns:
# Symbol Date Close Time Event Window
#1 AAPL 09/03/2020 66,542503 16:25:00 NA NA
#2 AAPL 09/03/2020 71,334999 16:26:00 NA -4
#3 AAPL 09/03/2020 68,857498 16:27:00 NA -3
#4 AAPL 09/03/2020 62,057499 16:28:00 NA -2
#5 AAPL 09/03/2020 69,4925 16:29:00 NA -1
#6 AAPL 09/03/2020 60,552502 16:30:00 1 0
#7 AAPL 09/03/2020 63,215 16:31:00 NA 1
#8 AAPL 10/03/2020 61,6675 09:30:00 NA 2
#9 AAPL 10/03/2020 61,195 09:31:00 NA 3
#10 AAPL 10/03/2020 57,310001 09:32:00 NA 4
#11 AAPL 10/03/2020 56,092499 09:33:00 NA NA
#12 AAPL 15/03/2020 65,535603 15:45:00 NA NA
#13 AAPL 15/03/2020 66,357545 15:46:00 NA -4
#14 AAPL 15/03/2020 62,852345 15:47:00 NA -3
#15 AAPL 15/03/2020 64,057325 15:48:00 NA -2
#16 AAPL 16/03/2020 66,494545 09:30:00 NA -1
#17 AAPL 16/03/2020 63,557967 09:31:00 1 0
#18 AAPL 16/03/2020 64,415454 09:32:00 NA 1
#19 AAPL 16/03/2020 62,2357 09:33:00 NA 2
#20 AAPL 16/03/2020 64,4576 09:34:00 NA 3
#21 AAPL 16/03/2020 59,457579 09:35:00 NA 4
#22 AAPL 16/03/2020 58,092470 09:36:00 NA NA
#23 VISA 05/03/2020 186,960007 16:26:00 NA NA
#24 VISA 05/03/2020 184,360001 16:27:00 NA -4
#25 VISA 05/03/2020 171,130005 16:28:00 NA -3
#26 VISA 05/03/2020 182,600006 16:29:00 NA -2
#27 VISA 05/03/2020 172,949997 16:30:00 NA -1
#28 VISA 06/03/2020 160,080002 09:32:00 1 0
#29 VISA 06/03/2020 175,830002 09:33:00 NA 1
#30 VISA 06/03/2020 152,009995 09:34:00 NA 2
#31 VISA 06/03/2020 157,889999 09:35:00 NA 3
#32 VISA 06/03/2020 148,479996 09:36:00 NA 4
#33 VISA 06/03/2020 152,25 09:37:00 NA NA
#34 VISA 06/03/2020 146,830002 09:38:00 NA NA
#35 VISA 20/03/2020 192,203826 16:12:00 NA NA
#36 VISA 20/03/2020 193,293752 16:13:00 NA -4
#37 VISA 20/03/2020 192,204726 16:14:00 NA -3
#38 VISA 20/03/2020 192,2396 16:15:00 NA -2
#39 VISA 20/03/2020 194,185620 16:16:00 NA -1
#40 VISA 20/03/2020 196,614289 16:17:00 1 0
#41 VISA 20/03/2020 197,826200 16:18:00 NA 1
#42 VISA 21/03/2020 197,49176 09:29:00 NA 2
#43 VISA 21/03/2020 197,239230 09:30:00 NA 3
#44 VISA 21/03/2020 198,2300 09:31:00 NA 4
#45 VISA 21/03/2020 198,230028 09:32:00 NA NA
#46 VISA 21/03/2020 197,247020 09:33:00 NA NA
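Because closest_index() measures every row from its nearest event, overlapping windows are split between the two events rather than overwritten; an exact tie goes to the earlier event because of the [1]. A minimal check on a made-up vector in which two events are only six rows apart:

```r
# closest_index() as above: position of the nearest Event == 1 for each row in x
closest_index <- function(x, y) {
  y <- which(y == 1)
  y[sapply(x, function(i) which(abs(y - i) == min(abs(y - i)))[1])]
}

# Made-up group: events in rows 3 and 9, so the +/-4 windows overlap
event <- c(NA, NA, 1, NA, NA, NA, NA, NA, 1, NA, NA)
pos <- seq_along(event)

nearest <- closest_index(pos, event)
window <- pos - nearest
window <- ifelse(abs(window) > 4, NA, window)

nearest  # row 6 is equidistant and goes to the earlier event (row 3)
window   # rows 1-6 count from row 3, rows 7-11 count from row 9
```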