Fill NA with condition based on the same column

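I am using R, and I need to fill the NAs in one column based on the values in that same column and on a separate date column. For example, the initial data frame looks like this: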

df <- data.frame( ID = c("A","A","A","A","A",
                         "B","B","B","B","B",
                         "C","C","C","C","C"), 
                  Year = c(2013,2015,2019,2020,2021,
                         2001,2005,2009,2010,2016,
                         2010,2011,2014,2015,2018),
                  value = c(NA,NA,1,NA,2,NA,1,2,NA,3,1,NA,NA,2,NA))    
df
   ID Year value
1   A 2013    NA
2   A 2015    NA
3   A 2019     1
4   A 2020    NA
5   A 2021     2
6   B 2001    NA
7   B 2005     1
8   B 2009     2
9   B 2010    NA
10  B 2016     3
11  C 2010     1
12  C 2011    NA
13  C 2014    NA
14  C 2015     2
15  C 2018    NA

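These are the rules for replacing the NAs:

For each ID, if the data start with value = NA, take the next non-NA value. If an earlier year has a non-NA value, take that value.

To do this, I tried the following two steps:

The desired output looks like this: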

   ID Year value
1   A 2013     1
2   A 2015     1
3   A 2019     1
4   A 2020     1
5   A 2021     2
6   B 2001     1
7   B 2005     1
8   B 2009     2
9   B 2010     2
10  B 2016     3
11  C 2010     1
12  C 2011     1
13  C 2014     1
14  C 2015     2
15  C 2018     2

However, I can't think of a way to do this.

This matches the updated desired output.

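library(dplyr)

# Within each ID, carry the last non-NA value forward ("down"),
# then fill any leading NAs from the next non-NA value ("up").
df %>%
  group_by(ID) %>%
  tidyr::fill(value, .direction = "downup") %>%
  ungroup()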

Result

# A tibble: 15 × 3
   ID     Year value
   <chr> <dbl> <dbl>
 1 A      2013     1
 2 A      2015     1
 3 A      2019     1
 4 A      2020     1
 5 A      2021     2
 6 B      2001     1
 7 B      2005     1
 8 B      2009     2
 9 B      2010     2
10 B      2016     3
11 C      2010     1
12 C      2011     1
13 C      2014     1
14 C      2015     2
15 C      2018     2
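
For anyone curious what `.direction = "downup"` amounts to, here is a minimal base R sketch of the same idea: fill forward within each ID, then fill whatever is still NA backward. The helper names `fill_down` and `fill_downup` and the `filled` column are made up for this illustration and are not part of tidyr. It assumes the rows are already sorted by Year within each ID, as in the example data.

```r
df <- data.frame(ID = rep(c("A", "B", "C"), each = 5),
                 Year = c(2013, 2015, 2019, 2020, 2021,
                          2001, 2005, 2009, 2010, 2016,
                          2010, 2011, 2014, 2015, 2018),
                 value = c(NA, NA, 1, NA, 2, NA, 1, 2, NA, 3, 1, NA, NA, 2, NA))

# Hypothetical helper: carry the last non-NA value forward.
fill_down <- function(x) {
  idx <- cumsum(!is.na(x))          # how many non-NA values seen so far (0 = none yet)
  c(NA, x[!is.na(x)])[idx + 1]      # position 1 is NA, used while nothing has been seen
}

# Hypothetical helper: fill forward first, then fill the remaining
# leading NAs backward by running the same logic on the reversed vector.
fill_downup <- function(x) {
  down <- fill_down(x)
  rev(fill_down(rev(down)))
}

# Apply the helper separately within each ID.
df$filled <- ave(df$value, df$ID, FUN = fill_downup)
df
```

The `filled` column should agree with the `value` column of the tidyr result above.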

Here is a step-by-step check of Jon Springs's solution, which shows that his answer is correct:

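df %>% 
  group_by(ID) %>% 
  # helper: each ID's values sorted ascending, with NAs last
  transmute(helper = value) %>% 
  arrange(helper, .by_group = TRUE) %>% 
  # bind the original columns back on (the duplicated ID columns get suffixes)
  bind_cols(df) %>% 
  group_by(ID...1) %>% 
  # seed the first row of each ID with its smallest non-NA value, then fill down
  mutate(jonSprings_value = ifelse(row_number()==1, helper, value)) %>% 
  tidyr::fill(jonSprings_value, .direction = "down") %>% 
  ungroup()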
   ID...1 helper ID...3  Year value jonSprings_value
   <chr>   <dbl> <chr>  <dbl> <dbl>            <dbl>
 1 A           1 A       2013    NA                1
 2 A           2 A       2015    NA                1
 3 A          NA A       2019     1                1
 4 A          NA A       2020    NA                1
 5 A          NA A       2021     2                2
 6 B           1 B       2001    NA                1
 7 B           2 B       2005     1                1
 8 B           3 B       2009     2                2
 9 B          NA B       2010    NA                2
10 B          NA B       2016     3                3
11 C           1 C       2010     1                1
12 C           2 C       2011    NA                1
13 C          NA C       2014    NA                1
14 C          NA C       2015     2                2
15 C          NA C       2018    NA                2
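
One caveat: `tidyr::fill()` works on row order, not on the Year column itself, so the "previous year" rule only holds if the rows are in chronological order within each ID. The example data already are. If yours might not be, a sketch along these lines sorts first; the shuffling step and the names `shuffled` and `res` exist only to illustrate unsorted input.

```r
library(dplyr)

df <- data.frame(ID = rep(c("A", "B", "C"), each = 5),
                 Year = c(2013, 2015, 2019, 2020, 2021,
                          2001, 2005, 2009, 2010, 2016,
                          2010, 2011, 2014, 2015, 2018),
                 value = c(NA, NA, 1, NA, 2, NA, 1, 2, NA, 3, 1, NA, NA, 2, NA))

# Shuffle the rows to mimic unsorted input (illustration only).
set.seed(1)
shuffled <- df[sample(nrow(df)), ]

res <- shuffled %>%
  arrange(ID, Year) %>%                              # restore chronological order per ID
  group_by(ID) %>%
  tidyr::fill(value, .direction = "downup") %>%      # same fill as in the answer
  ungroup()

res
```

Sorting before grouping keeps the fill tied to the years rather than to whatever order the rows happened to arrive in.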