Computationally efficient way of resetting a sequence if a condition is met (R)
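Question:
I want to reset a (1,2) sequence whenever a condition is met (the subject changes).
I have a for and if loop that does this, but unsurprisingly it is very slow.
Any suggestions for a more efficient approach (e.g., involving the apply family)?
Current: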
subj odd_even
a
a
a
b
b
b
b
c
c
c
Desired:
subj odd_even
a 1
a 2
a 1
b 1
b 2
b 1
b 2
c 1
c 2
c 1
df = data.frame( subj = c("a","a","a","b","b","b","b", "c","c","c"), odd_even = "" )
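The loop itself is not shown in the question. Purely as a baseline for comparison, here is a guess at its general shape: walk the rows, restart at 1 when subj changes, otherwise flip between 1 and 2. The variable names are assumptions, not the asker's code.

```r
# Hypothetical baseline: a row-by-row loop of the kind the question describes.
df <- data.frame(subj = c("a","a","a","b","b","b","b","c","c","c"))

odd_even <- integer(nrow(df))
for (i in seq_len(nrow(df))) {
  if (i == 1L || df$subj[i] != df$subj[i - 1L]) {
    odd_even[i] <- 1L                     # subject changed: restart at 1
  } else {
    odd_even[i] <- 3L - odd_even[i - 1L]  # same subject: flip 1 <-> 2
  }
}
df$odd_even <- odd_even
df$odd_even
# 1 2 1 1 2 1 2 1 2 1
```

Every iteration does its own comparison and assignment in interpreted R, which is why vectorized alternatives tend to be faster on long data frames.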
Here is another clunky way:
df$odd_even <- 2L - ave(seq_along(df$subj), df$subj, FUN = seq_along) %% 2L
ave builds a counter within each group, and the odd/even test is applied to that counter.
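To make the two steps visible, here is a small sketch on the question's example data that keeps the within-group counter in its own variable before the modulo is applied. The intermediate names counter and odd_even are only for illustration.

```r
subj <- c("a","a","a","b","b","b","b","c","c","c")

# Step 1: a counter that restarts at 1 within each subj group.
counter <- ave(seq_along(subj), subj, FUN = seq_along)
counter
# 1 2 3 1 2 3 4 1 2 3

# Step 2: odd counters map to 1, even counters map to 2.
odd_even <- 2L - counter %% 2L
odd_even
# 1 2 1 1 2 1 2 1 2 1
```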
What is the desired behavior if a subj value reappears later in the data frame?
If that won't happen, here is a dplyr approach:
library(dplyr)
df %>% group_by(subj) %>%
mutate(odd_even = 2 - (row_number() %% 2))
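The question about a reappearing subj matters because grouping by value and resetting on every change give different answers once a subject comes back. Here is a base R sketch of the difference, using a made-up vector. Which behavior is wanted is for the asker to decide.

```r
# Hypothetical input where "a" reappears after "b".
subj <- c("a", "b", "b", "a", "a")

# Grouping by value: the second block of "a" continues the earlier count.
by_value <- 2L - ave(seq_along(subj), subj, FUN = seq_along) %% 2L
by_value
# 1 1 2 2 1

# Resetting on every change: rle() finds runs of consecutive equal values.
by_run <- 2L - sequence(rle(subj)$lengths) %% 2L
by_run
# 1 1 2 1 2
```

The results differ only in the last two rows, where "a" returns. For data sorted by subj the two variants agree.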
I like the sequence function:
df$odd_even <- 2L - sequence(table(df$subj)) %% 2L
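One caveat, shown as a sketch: table() reports counts in sorted order of the values, so the sequence(table(...)) line matches the rows only when the data are already sorted by subj. The second vector below is a made-up unsorted input that shows the mismatch.

```r
subj <- c("a","a","a","b","b","b","b","c","c","c")

counts <- table(subj)                    # counts 3, 4, 3 for a, b, c
counter <- as.integer(sequence(counts))  # 1 2 3 1 2 3 4 1 2 3
odd_even <- 2L - counter %% 2L
odd_even
# 1 2 1 1 2 1 2 1 2 1

# Unsorted input: table() counts "a" first although "b" comes first in the data.
unsorted <- c("b", "b", "a")
misaligned <- 2L - as.integer(sequence(table(unsorted))) %% 2L
misaligned
# 1 1 2   (a reset on each change of subject would give 1 2 1)
```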
data.table is another option:
library(data.table)
setDT(df)
df[, odd_evenDT := 2L - seq_along(.I) %% 2L, by = subj]
Benchmark:
set.seed(42)
df <- data.frame(subj = sort(sample(as.character(1:1e4), 1e5, TRUE)))
DT <- data.table(df)
library(microbenchmark)
microbenchmark(roland1 = 2L - sequence(table(df$subj)) %% 2L,
               roland2 = DT[, 2L - seq_along(.I) %% 2L, by = subj],
               roland3 = 2L - sequence(rle(as.integer(df$subj))$lengths) %% 2L,
               jeremy = df %>% group_by(subj) %>%
                 mutate(odd_even = 2 - (row_number() %% 2)),
               frank = 2L - ave(as.integer(df$subj), df$subj, FUN = seq_along) %% 2L,
               flick = ave(seq_along(df$subj), df$subj,
                           FUN = function(x) rep(c(1, 2), length.out = length(x))),
               times = 10, unit = "relative")
# Unit: relative
# expr min lq mean median uq max neval
# roland1 5.820459 5.754497 5.0368686 5.404110 4.0853039 4.847161 10
# roland2 1.110919 1.057952 0.9840653 1.037428 0.7939004 1.176258 10
# roland3 1.000000 1.000000 1.0000000 1.000000 1.0000000 1.000000 10
# jeremy 5.024087 4.941366 4.3491117 4.635534 3.5144515 4.277011 10
# frank 2.036816 1.944603 1.7809168 1.831937 1.6459597 1.607283 10
# flick 3.655127 3.621457 3.2453089 3.473188 2.7717947 3.198285 10
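Timings only mean something if the candidates return the same values. As a rough sanity check, the base R variants can be compared on a smaller sorted sample. The sizes and variable names here are arbitrary choices for illustration, and the dplyr and data.table variants are left out to keep the sketch free of dependencies.

```r
set.seed(42)
subj <- sort(sample(as.character(1:100), 1000, TRUE))

via_table <- 2L - as.integer(sequence(table(subj))) %% 2L
via_rle   <- 2L - sequence(rle(subj)$lengths) %% 2L
via_ave   <- 2L - ave(seq_along(subj), subj, FUN = seq_along) %% 2L
via_rep   <- ave(seq_along(subj), subj,
                 FUN = function(x) rep(c(1L, 2L), length.out = length(x)))

# All four should agree because subj is sorted.
stopifnot(identical(via_table, via_rle),
          identical(via_rle, via_ave),
          identical(via_ave, via_rep))
```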