R: "long" data frame to longer data frame with running indicator
I have the following data frame:
df
id year certificate
1 2000 1
2 2003 1
3 2002 1
4 2004 1
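I want to turn it into a long data frame with one row per id and year, where the indicator is set from the given year onward: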
df_long
id year certificate
1 2000 1
1 2001 1
1 2002 1
1 2003 1
1 2004 1
2 2000 NA
2 2001 NA
2 2002 NA
2 2003 1
2 2004 1
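Two steps. First, create all id/year combinations:
tmp <- merge(
  df,
  expand.grid(year = 2000:2004, id = 1:4),
  all = TRUE  # keep id/year combinations that have no row in df
)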
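Then set the missing values to 0 and take a cumulative sum within each id:
# years without a certificate row count as 0
tmp$certificate[is.na(tmp$certificate)] <- 0
# running total per id; relies on rows being sorted by year within id, as merge() returns them
tmp$certificate2 <- ave(tmp$certificate, tmp$id, FUN = cumsum)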
...
id year certificate certificate2
11 3 2000 0 0
12 3 2001 0 0
13 3 2002 1 1
14 3 2003 0 1
15 3 2004 0 1
16 4 2000 0 0
17 4 2001 0 0
18 4 2002 0 0
19 4 2003 0 0
20 4 2004 1 1
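The result above uses 0 for the years before the certificate, while the question's desired output shows NA there. A minimal sketch of one way to get NA instead, assuming the same df as in the question (rebuilt here so the snippet runs on its own) and adding an explicit sort as a precaution:

```r
# Rebuild the example data from the question
df <- data.frame(
  id = 1:4,
  year = c(2000L, 2003L, 2002L, 2004L),
  certificate = 1
)

# All id/year combinations, keeping rows without a match in df
tmp <- merge(df, expand.grid(year = 2000:2004, id = 1:4), all = TRUE)

# Sort explicitly so the cumulative sum runs in year order within each id
tmp <- tmp[order(tmp$id, tmp$year), ]

# Running indicator: 0 before the certificate year, 1 from that year on
tmp$certificate[is.na(tmp$certificate)] <- 0
tmp$certificate2 <- ave(tmp$certificate, tmp$id, FUN = cumsum)

# Turn the leading zeros back into NA to match the desired layout
tmp$certificate2[tmp$certificate2 == 0] <- NA
```

This only works as an indicator because each id has a single certificate row; with several rows per id the cumulative sum would exceed 1 and would need to be capped.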
A tidyverse solution. First expand the data.frame to cover all years and ids, then mark the certificate column as 1 in every row after the first year in which certificate == 1.
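library(tibble)
library(dplyr)
library(tidyr)  # provides crossing()
library(purrr)
df <- tribble(~id, ~year, ~certificate,
              1, 2000, 1,
              2, 2003, 1,
              3, 2002, 1,
              4, 2004, 1)
df |>
  # keep every id/year combination, with NA where df has no row
  right_join(crossing(id = 1:4, year = 2000:2004), by = c("id", "year")) |>
  group_by(id) |>
  arrange(year) |>
  # carry the first non-NA value forward within each id
  mutate(certificate = accumulate(certificate,
                                  ~if(is.na(.x)) .y else .x)) |>
  arrange(id, year)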
##> # A tibble: 20 × 3
##> # Groups: id [4]
##> id year certificate
##> <dbl> <dbl> <dbl>
##> 1 1 2000 1
##> 2 1 2001 1
##> 3 1 2002 1
##> 4 1 2003 1
##> 5 1 2004 1
##> 6 2 2000 NA
##> 7 2 2001 NA
##> 8 2 2002 NA
##> 9 2 2003 1
##> 10 2 2004 1
##> 11 3 2000 NA
##> 12 3 2001 NA
##> 13 3 2002 1
##> 14 3 2003 1
##> 15 3 2004 1
##> 16 4 2000 NA
##> 17 4 2001 NA
##> 18 4 2002 NA
##> 19 4 2003 NA
##> 20 4 2004 1
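The same expand-then-carry-forward idea can probably be written with tidyr alone. The following is a sketch rather than part of the original answer: it swaps the join and accumulate() for tidyr's complete(), full_seq() and fill(), and assumes the year range should span the minimum to the maximum year found in df.

```r
library(tibble)
library(dplyr)
library(tidyr)

df <- tribble(~id, ~year, ~certificate,
              1, 2000, 1,
              2, 2003, 1,
              3, 2002, 1,
              4, 2004, 1)

res <- df |>
  # add the missing id/year rows; certificate is NA for the new rows
  complete(id, year = full_seq(year, 1)) |>
  arrange(id, year) |>
  group_by(id) |>
  # carry the last non-NA certificate value downward within each id
  fill(certificate, .direction = "down") |>
  ungroup()
```

Years before the first certificate stay NA because fill() only propagates downward here, which matches the layout asked for in the question.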