Insert rows based on min and max year and fill with NAs

Question

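I have a data frame with several observations per year for multiple cantons (identified by canton_id). The earliest observation for any canton is from 1994, and the latest observation for most cantons is from 2020. My data has gaps, because most cantons do not have observations for every year from 1994 to 2020. I now want to expand my data frame and insert rows for the missing years. The other columns should simply be filled with NA. The data frame looks like this: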

see table

My approach so far is:

relative_FTE %>% 
  group_by(canton_id) %>%
  mutate(Earliest.year = min(year)) %>%
  select(-value, -year) %>% 
  distinct() %>%
  expand(year = Earliest.year:1994, Earliest.year) %>%
  select(-Earliest.year) %>%
  left_join(relative_FTE, by = c("canton_id", "year"))

The code runs, but I get these warning messages:

1: In Earliest.year:1994 : numerical expression has 14 elements: only the first used
2: In Earliest.year:1994 : numerical expression has 16 elements: only the first used

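As a result, the table only contains, for each canton, the years between 1994 and the year of its first observation; everything else is dropped. Can someone help me find a solution so that I have rows for 1994 to 2020 for every canton? Any help is much appreciated.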

Kind regards

Answer 1

I assume you want one row for every combination of year (1994 to 2020) and canton_id. You can build a full_df containing all of these pairs and then merge it with your data.frame.

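# Build every canton_id/year combination for 1994 to 2020
full_df <- expand.grid(canton_id = unique(relative_FTE$canton_id), year = 1994:2020)
# Full outer join: combinations missing from relative_FTE get NA in the other columns
merge(relative_FTE, full_df, all = TRUE, by = c("year", "canton_id"))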
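
Since the question already uses dplyr and tidyr, the same idea can probably also be written with tidyr's complete(). The sketch below is only an illustration: the toy relative_FTE and its values are made up, and full_FTE is a hypothetical name. It passes the fixed range 1994:2020 directly, so it does not depend on a per-canton Earliest.year. Note also that the range Earliest.year:1994 in the question runs between each canton's first year and 1994, so it never reaches 2020.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data standing in for relative_FTE (values are made up)
relative_FTE <- tibble(
  canton_id = c("A", "A", "B"),
  year      = c(1994L, 1996L, 2020L),
  value     = c(0.5, 0.7, 0.9)
)

# complete() adds a row for every canton_id/year combination that is missing;
# columns not named in the call (here: value) are filled with NA
full_FTE <- relative_FTE %>%
  complete(canton_id, year = 1994:2020)

nrow(full_FTE)  # 2 cantons x 27 years = 54 rows
```

With the real data, every canton_id would end up with at least one row per year from 1994 to 2020, and existing rows are kept unchanged.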

r

insert

dataframe