Replace values in ordered columns named by year, from columns indicating range
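I want to recode several year columns using the range given by 'StartYear' and 'CloseYear': a year column should be 1 when the year falls within that range (inclusive) and 0 otherwise.
What is an elegant way to get from here: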
library(tibble); library(dplyr)
(df <- tibble(id = c(1, 2, 3, 4),
              `1997` = c(1, 0, 0, 1),
              `1998` = c(0, 1, 0, 0),
              `1999` = c(0, 0, 1, 0),
              `2000` = c(0, 0, 1, 1),
              StartYear = c(1998, 1997, 1998, 1998),
              CloseYear = c(1999, 1997, 2000, 1999)))
#> # A tibble: 4 x 7
#>      id `1997` `1998` `1999` `2000` StartYear CloseYear
#>   <dbl>  <dbl>  <dbl>  <dbl>  <dbl>     <dbl>     <dbl>
#> 1     1      1      0      0      0      1998      1999
#> 2     2      0      1      0      0      1997      1997
#> 3     3      0      0      1      1      1998      2000
#> 4     4      1      0      0      1      1998      1999
to here:
(tibble(id = c(1, 2, 3, 4),
        `1997` = c(0, 1, 0, 0),
        `1998` = c(1, 0, 1, 1),
        `1999` = c(1, 0, 1, 1),
        `2000` = c(0, 0, 1, 0),
        StartYear = c(1998, 1997, 1998, 1998),
        CloseYear = c(1999, 1997, 2000, 1999)))
#> # A tibble: 4 x 7
#>      id `1997` `1998` `1999` `2000` StartYear CloseYear
#>   <dbl>  <dbl>  <dbl>  <dbl>  <dbl>     <dbl>     <dbl>
#> 1     1      0      1      1      0      1998      1999
#> 2     2      1      0      0      0      1997      1997
#> 3     3      0      1      1      1      1998      2000
#> 4     4      0      1      1      0      1998      1999
Is there a suitable way to do this with dplyr / dplyr::mutate?
One possible tidyverse approach: gather, mutate, spread...
library(tidyverse)
df %>%
  gather(year, value, -id, -StartYear, -CloseYear, convert = TRUE) %>%
  mutate(value = as.integer(StartYear <= year & year <= CloseYear)) %>%
  spread(year, value)
#> # A tibble: 4 x 7
#>      id StartYear CloseYear `1997` `1998` `1999` `2000`
#>   <dbl>     <dbl>     <dbl>  <int>  <int>  <int>  <int>
#> 1     1      1998      1999      0      1      1      0
#> 2     2      1997      1997      1      0      0      0
#> 3     3      1998      2000      0      1      1      1
#> 4     4      1998      1999      0      1      1      0
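In current tidyr, gather() and spread() are superseded by pivot_longer() and pivot_wider(). The same reshape-recode-reshape idea might be sketched as below; this is only an illustration, assuming tidyr 1.1.0 or later (for `names_transform`) and assuming the year columns are exactly those with four-digit names. The name `res_pivot` is made up for the example.

```r
library(tibble)
library(dplyr)
library(tidyr)

# Same example data as in the question
df <- tibble(id = c(1, 2, 3, 4),
             `1997` = c(1, 0, 0, 1),
             `1998` = c(0, 1, 0, 0),
             `1999` = c(0, 0, 1, 0),
             `2000` = c(0, 0, 1, 1),
             StartYear = c(1998, 1997, 1998, 1998),
             CloseYear = c(1999, 1997, 2000, 1999))

res_pivot <- df %>%
  # Long format: one row per id/year pair, with the column name parsed as an integer year
  pivot_longer(matches("^[0-9]{4}$"),
               names_to = "year", values_to = "value",
               names_transform = list(year = as.integer)) %>%
  # Overwrite the old value: 1 if the year lies in [StartYear, CloseYear], else 0
  mutate(value = as.integer(StartYear <= year & year <= CloseYear)) %>%
  # Back to wide format: one column per year
  pivot_wider(names_from = year, values_from = value)

res_pivot
```

As with gather/spread, the year columns end up after StartYear and CloseYear rather than in their original position.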
If you are also open to data.table:
library(data.table)
dcast(
  setDT(df)[, .(StartYear, CloseYear, flag = seq(StartYear, CloseYear)), by = .(id)],
  id + StartYear + CloseYear ~ flag, fun.agg = length)
#    id StartYear CloseYear 1997 1998 1999 2000
# 1:  1      1998      1999    0    1    1    0
# 2:  2      1997      1997    1    0    0    0
# 3:  3      1998      2000    0    1    1    1
# 4:  4      1998      1999    0    1    1    0
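For the dplyr::mutate route the question asks about, one option that avoids reshaping is across() together with cur_column(). This is a minimal sketch rather than a method from either answer above; it assumes dplyr 1.0.0 or later and that the year columns are exactly those with four-digit names. The name `res_across` is made up for the example.

```r
library(tibble)
library(dplyr)

# Same example data as in the question (rebuilt here as a fresh tibble)
df <- tibble(id = c(1, 2, 3, 4),
             `1997` = c(1, 0, 0, 1),
             `1998` = c(0, 1, 0, 0),
             `1999` = c(0, 0, 1, 0),
             `2000` = c(0, 0, 1, 1),
             StartYear = c(1998, 1997, 1998, 1998),
             CloseYear = c(1999, 1997, 2000, 1999))

res_across <- df %>%
  mutate(across(
    matches("^[0-9]{4}$"),
    # cur_column() gives the name of the column being processed, e.g. "1997";
    # the old values are ignored and replaced by the in-range indicator
    ~ as.integer(StartYear <= as.numeric(cur_column()) &
                   as.numeric(cur_column()) <= CloseYear)
  ))

res_across
```

Unlike the reshaping approaches, this keeps the columns in their original order and still produces a column for a year that no row's range covers.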