Merging cross-sectional data to panel data without NA rows

Question

I have one data table per year from 2005 to 2020, like this:

library(data.table)

DT_2005 = data.table(
  ID = c("1","2","3","4","5","6"),
  year = c("2005","2005","2005","2005","2005","2005"),
  score = c("98","89","101","78","97","86")
)

# Data tables for every year...

DT_2020 = data.table(
  ID = c("1","2","4","6","7","8"),
  year = c("2020","2020","2020","2020","2020","2020"),
  score = c("89","79","110","98","74","88")
)

# DT_2020 output
ID, year, score
1, 2020, 89
2, 2020, 79
4, 2020, 110
6, 2020, 98
7, 2020, 74
8, 2020, 88

That is, some IDs do not appear in some years.

I want to combine the tables into a "long" format like this:

ID, year, score
1, 2005, 98
1, 2006, 95
1, 2007, 97
...
1, 2019, 90
1, 2020, 89
2, 2005, 89
2, 2006, 81
...
2, 2019, 83
2, 2020, 79

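Is there a way to do this in data.table so that each row is one ID in one year, with years in ascending order within each ID, and with no NA rows for the years in which an ID does not appear?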

Answer 1

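You can collect all the yearly data tables from the global environment, stack them into a single table, and sort the result. Because the rows are appended rather than joined, an ID that is missing in a given year simply has no row for that year, so no NA rows are created.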

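library(data.table)
# Collect the yearly tables by name and stack them; only existing rows are kept
dt <- rbindlist(mget(paste0('DT_', 2005:2020)))
# Sort by ID, then by year within each ID
dt <- dt[order(ID, year)]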
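One caveat with the columns as defined in the question: ID, year and score are all character, so sorting is lexicographic and an ID such as "10" would sort before "2". The following is a minimal sketch, assuming numeric ordering is what is wanted. The two small tables and the name `panel` are illustrative stand-ins, not the asker's real data.

```r
library(data.table)

# Toy stand-ins for two of the yearly tables (values are illustrative only)
DT_2005 <- data.table(ID = c("1", "2", "10"), year = "2005", score = c("98", "89", "77"))
DT_2006 <- data.table(ID = c("1", "10"),      year = "2006", score = c("95", "80"))

# Stack the tables; rows are only appended, so ID "2" gets no row for 2006
panel <- rbindlist(list(DT_2005, DT_2006), use.names = TRUE)

# Convert the character columns so that sorting is numeric rather than lexicographic
num_cols <- c("ID", "year", "score")
panel[, (num_cols) := lapply(.SD, as.integer), .SDcols = num_cols]

# Sort in place by ID, then by year within each ID
setorder(panel, ID, year)
print(panel)
```

`setorder` reorders the table by reference, which avoids the copy made by `dt[order(...)]`; either form should give the same row order here.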

The equivalent dplyr and base R alternatives are:

# dplyr
library(dplyr)
res <- bind_rows(mget(paste0('DT_', 2005:2020))) %>% arrange(ID, year)

# Base R
res <- do.call(rbind, mget(paste0('DT_', 2005:2020)))
res <- res[order(res$ID, res$year), ]
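For contrast, here is a small sketch of why stacking rather than merging avoids NA rows. It assumes two hypothetical yearly tables without a year column: a full outer `merge` by ID yields a wide table with NA wherever an ID is absent, while `rbindlist` with `idcol` builds the year column from the list names and keeps only observed rows.

```r
library(data.table)

# Hypothetical yearly tables without a year column (values are illustrative only)
DT_2005 <- data.table(ID = c("1", "2"), score = c("98", "89"))
DT_2020 <- data.table(ID = c("1", "7"), score = c("89", "74"))

# Full outer join by ID: one row per ID, NA where an ID is missing in a year
wide <- merge(DT_2005, DT_2020, by = "ID", all = TRUE,
              suffixes = c("_2005", "_2020"))
print(wide)

# Stacking: the list names become the year column, and no NA rows are created
long <- rbindlist(list(`2005` = DT_2005, `2020` = DT_2020), idcol = "year")
setorder(long, ID, year)
print(long)
```

In `wide`, ID "2" has NA for 2020 and ID "7" has NA for 2005. In `long`, those ID-year combinations are simply absent.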
