Fill in multiple NAs with lagged values in R

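I am trying to fill the NA values in this data frame with the most recent non-NA value in the cost column. I want to group by city, so all of Omaha's NAs should be 44.50 and Lincoln's NAs should be 62.50. Here is the code I have been using. It replaces the first NA in each group (April) with the correct value, but it does not fill the NAs after that.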

df <- df %>% 
  group_by(city) %>%
  mutate(cost = ifelse(is.na(cost), lag(cost, na.rm=TRUE), cost))

Data before running the code:

year   month      city     cost
2021   January    Omaha     45.50  
2021   February   Omaha     46.75
2021   March      Omaha     44.50
2021   April      Omaha     NA
2021   May        Omaha     NA
2021   June       Omaha     NA
2021   January    Lincoln   55.25
2021   February   Lincoln   53.80
2021   March      Lincoln   62.50
2021   April      Lincoln   NA
2021   May        Lincoln   NA
2021   June       Lincoln   NA

Use fill() from tidyr (loaded with the tidyverse), which by default fills downward within each group:

library(tidyverse)

df %>% 
  group_by(city) %>%
  fill(cost)

# A tibble: 12 x 4
# Groups:   city [2]
    year month    city     cost
   <int> <chr>    <chr>   <dbl>
 1  2021 January  Omaha    45.5
 2  2021 February Omaha    46.8
 3  2021 March    Omaha    44.5
 4  2021 April    Omaha    44.5
 5  2021 May      Omaha    44.5
 6  2021 June     Omaha    44.5
 7  2021 January  Lincoln  55.2
 8  2021 February Lincoln  53.8
 9  2021 March    Lincoln  62.5
10  2021 April    Lincoln  62.5
11  2021 May      Lincoln  62.5
12  2021 June     Lincoln  62.5
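
As a small check on why the group_by() matters, here is a minimal sketch on a made-up two-city data frame (toy and its values are hypothetical, not the asker's data) in which the second city starts with an NA. Grouped, fill() carries values down only within each city, so that leading NA stays NA. Ungrouped, it inherits the previous city's value.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data: city "B" starts with a missing cost
toy <- data.frame(
  city = c("A", "A", "B", "B", "B"),
  cost = c(10, NA, NA, 20, NA)
)

# Fill downward within each city (the default .direction is "down")
grouped <- toy %>%
  group_by(city) %>%
  fill(cost) %>%
  ungroup()

# Fill downward across the whole column, ignoring city boundaries
ungrouped <- toy %>%
  fill(cost)

grouped$cost    # 10 10 NA 20 20
ungrouped$cost  # 10 10 10 20 20
```

If a leading NA should also be filled, fill() accepts a .direction argument (for example "downup"), though that goes beyond what the question asks for.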

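In your code, you would want to use last() rather than lag() (although fill() is the better option): lag() only looks one row back, so it can fill only the first NA in each run. We also need to wrap cost in na.omit() so that last() returns the last non-NA value. Note that this fills every NA in a group with the group's final non-NA value, which matches the desired result here because the NAs are all at the end of each group.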

library(tidyverse)

df %>%
  group_by(city) %>%
  mutate(cost = ifelse(is.na(cost), last(na.omit(cost)), cost))

Output

    year month    city     cost
   <int> <chr>    <chr>   <dbl>
 1  2021 January  Omaha    45.5
 2  2021 February Omaha    46.8
 3  2021 March    Omaha    44.5
 4  2021 April    Omaha    44.5
 5  2021 May      Omaha    44.5
 6  2021 June     Omaha    44.5
 7  2021 January  Lincoln  55.2
 8  2021 February Lincoln  53.8
 9  2021 March    Lincoln  62.5
10  2021 April    Lincoln  62.5
11  2021 May      Lincoln  62.5
12  2021 June     Lincoln  62.5
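
To make the difference between the three approaches concrete, the following sketch compares them on a made-up column (toy and the via_* column names are hypothetical) that has an interior gap as well as a trailing run of NAs. It is only an illustration under those assumptions: lag() fills a single step, last(na.omit()) uses the final non-NA value everywhere, and fill() carries the most recent value forward.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy column: one interior NA and two trailing NAs
toy <- data.frame(cost = c(1, NA, 2, NA, NA))

res <- toy %>%
  mutate(
    # lag() looks exactly one row back, so the second NA of a run stays NA
    via_lag  = ifelse(is.na(cost), lag(cost), cost),
    # last(na.omit()) replaces every NA with the final non-NA value (2)
    via_last = ifelse(is.na(cost), last(na.omit(cost)), cost),
    # copy of cost to be filled downward below
    via_fill = cost
  ) %>%
  fill(via_fill)

res$via_lag   # 1 1 2 2 NA
res$via_last  # 1 2 2 2 2
res$via_fill  # 1 1 2 2 2
```

The second element is where via_last and via_fill disagree: the interior NA gets 2 from last(na.omit()) but 1 from fill(). For the data in the question both give the same result because each city's NAs come after all of its observed values.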

Data

df <- structure(list(year = c(2021L, 2021L, 2021L, 2021L, 2021L, 2021L, 
2021L, 2021L, 2021L, 2021L, 2021L, 2021L), month = c("January", 
"February", "March", "April", "May", "June", "January", "February", 
"March", "April", "May", "June"), city = c("Omaha", "Omaha", 
"Omaha", "Omaha", "Omaha", "Omaha", "Lincoln", "Lincoln", "Lincoln", 
"Lincoln", "Lincoln", "Lincoln"), cost = c(45.5, 46.75, 44.5, 
NA, NA, NA, 55.25, 53.8, 62.5, NA, NA, NA)), class = "data.frame", row.names = c(NA, 
-12L))