Replace values in ordered columns named by year, from columns indicating range

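I want to recode several year columns using the range given by 'StartYear' and 'CloseYear': a year column should be 1 when the year falls within that range (inclusive) and 0 otherwise.

What is an elegant way to get from this: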

library(tibble); library(dplyr)

(df <- tibble(id = c(1,2,3, 4),
              `1997` = c(1,0,0, 1), 
              `1998` = c(0,1,0, 0), 
              `1999` = c(0,0,1, 0),
              `2000` = c(0, 0, 1, 1),
              StartYear = c(1998, 1997, 1998, 1998),
              CloseYear = c(1999, 1997, 2000, 1999)))
#> # A tibble: 4 x 7
#>      id `1997` `1998` `1999` `2000` StartYear CloseYear
#>   <dbl>  <dbl>  <dbl>  <dbl>  <dbl>     <dbl>     <dbl>
#> 1     1      1      0      0      0      1998      1999
#> 2     2      0      1      0      0      1997      1997
#> 3     3      0      0      1      1      1998      2000
#> 4     4      1      0      0      1      1998      1999

to this:

(tibble(id = c(1,2,3, 4),
              `1997` = c(0, 1, 0, 0), 
              `1998` = c(1, 0, 1, 1), 
              `1999` = c(1, 0, 1, 1),
              `2000` = c(0, 0, 1, 0),
              StartYear = c(1998, 1997, 1998, 1998),
              CloseYear = c(1999, 1997, 2000, 1999)))
#> # A tibble: 4 x 7
#>      id `1997` `1998` `1999` `2000` StartYear CloseYear
#>   <dbl>  <dbl>  <dbl>  <dbl>  <dbl>     <dbl>     <dbl>
#> 1     1      0      1      1      0      1998      1999
#> 2     2      1      0      0      0      1997      1997
#> 3     3      0      1      1      1      1998      2000
#> 4     4      0      1      1      0      1998      1999

Is there a suitable way to do this with dplyr / dplyr::mutate?

One possible tidyverse approach: gather, mutate, spread...

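library(tidyverse)
df %>% 
  # long format: one row per id and year; convert = TRUE turns the year names into integers
  gather(year, value, -id, -StartYear, -CloseYear, convert = TRUE) %>%
  # 1 if the year lies within [StartYear, CloseYear], otherwise 0
  mutate(value = as.integer(StartYear <= year & year <= CloseYear)) %>% 
  # back to wide format, one column per year
  spread(year, value)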
#> # A tibble: 4 x 7
#>      id StartYear CloseYear `1997` `1998` `1999` `2000`
#>   <dbl>     <dbl>     <dbl>  <int>  <int>  <int>  <int>
#> 1     1      1998      1999      0      1      1      0
#> 2     2      1997      1997      1      0      0      0
#> 3     3      1998      2000      0      1      1      1
#> 4     4      1998      1999      0      1      1      0
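
The same reshape, recode, reshape idea can also be written with pivot_longer() and pivot_wider(), which tidyr (1.0.0 and later) offers as successors to gather() and spread(). This is only a sketch under a few assumptions: the year columns are exactly the ones whose names are four digits, tidyr is at least 1.1.0 (for names_transform), and the name `result` is just illustrative.

```r
library(tibble)
library(dplyr)
library(tidyr)

df <- tibble(id = c(1, 2, 3, 4),
             `1997` = c(1, 0, 0, 1),
             `1998` = c(0, 1, 0, 0),
             `1999` = c(0, 0, 1, 0),
             `2000` = c(0, 0, 1, 1),
             StartYear = c(1998, 1997, 1998, 1998),
             CloseYear = c(1999, 1997, 2000, 1999))

result <- df %>%
  # long format: one row per id and year (assumes year columns are named with four digits)
  pivot_longer(cols = matches("^[0-9]{4}$"),
               names_to = "year", values_to = "value",
               names_transform = list(year = as.integer)) %>%
  # 1 if the year lies within [StartYear, CloseYear], otherwise 0
  mutate(value = as.integer(StartYear <= year & year <= CloseYear)) %>%
  # back to wide format, one column per year
  pivot_wider(names_from = year, values_from = value)

result
```

As with gather()/spread(), the year columns end up after StartYear and CloseYear; relocate() could move them back if the original order matters.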

If you are also open to data.table:

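library(data.table)
# setDT() converts df to a data.table by reference (df itself is modified).
# For each id, expand the StartYear..CloseYear range into one row per year,
# then cast wide, counting rows per year (1 if the year is in range, else 0).
dcast(
    setDT(df)[, .(StartYear, CloseYear, flag = seq(StartYear, CloseYear)), by = .(id)],
    id + StartYear + CloseYear ~ flag, fun.aggregate = length)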

#    id StartYear CloseYear 1997 1998 1999 2000
# 1:  1      1998      1999    0    1    1    0
# 2:  2      1997      1997    1    0    0    0
# 3:  3      1998      2000    0    1    1    1
# 4:  4      1998      1999    0    1    1    0
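
Since the question asks specifically about dplyr::mutate, here is a possible variant that stays in wide format, using across() and cur_column() (both available from dplyr 1.0.0). It is a sketch rather than a definitive answer: it assumes the year columns are exactly those with four-digit names, and `result` is just an illustrative name.

```r
library(tibble)
library(dplyr)

df <- tibble(id = c(1, 2, 3, 4),
             `1997` = c(1, 0, 0, 1),
             `1998` = c(0, 1, 0, 0),
             `1999` = c(0, 0, 1, 0),
             `2000` = c(0, 0, 1, 1),
             StartYear = c(1998, 1997, 1998, 1998),
             CloseYear = c(1999, 1997, 2000, 1999))

result <- df %>%
  mutate(across(
    matches("^[0-9]{4}$"),   # assumption: year columns have four-digit names
    # cur_column() is the name of the column being processed, e.g. "1997";
    # the old values are ignored and replaced by the range test
    ~ as.integer(StartYear <= as.numeric(cur_column()) &
                   as.numeric(cur_column()) <= CloseYear)
  ))

result
```

Because nothing is reshaped, the column order of the original tibble is kept, and a year that falls in no row's range still keeps its column (filled with 0).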