用滞后值 R 填充多个 NA
Fill in Multiple NAs with Lagged Values R
我正在尝试用成本列中的最新非 NA 值填充此数据框中的 NA 值。我想按城市分组——所以奥马哈的所有 NA 应该是 44.50,林肯的 NA 应该是 62.50。这是我一直在使用的代码 - 它用正确的值替换了每个组的第一个 NA(四月),但没有填充过去。
df <- df %>%
group_by(city) %>%
mutate(cost = ifelse(is.na(cost), lag(cost, na.rm=TRUE), cost))
运行代码之前的数据:
year month city cost
2021 January Omaha 45.50
2021 February Omaha 46.75
2021 March Omaha 44.50
2021 April Omaha NA
2021 May Omaha NA
2021 June Omaha NA
2021 January Lincoln 55.25
2021 February Lincoln 53.80
2021 March Lincoln 62.50
2021 April Lincoln NA
2021 May Lincoln NA
2021 June Lincoln NA
使用:
library(tidyverse)
df %>%
group_by(city) %>%
fill(cost)
# A tibble: 12 x 4
# Groups: city [2]
year month city cost
<int> <chr> <chr> <dbl>
1 2021 January Omaha 45.5
2 2021 February Omaha 46.8
3 2021 March Omaha 44.5
4 2021 April Omaha 44.5
5 2021 May Omaha 44.5
6 2021 June Omaha 44.5
7 2021 January Lincoln 55.2
8 2021 February Lincoln 53.8
9 2021 March Lincoln 62.5
10 2021 April Lincoln 62.5
11 2021 May Lincoln 62.5
12 2021 June Lincoln 62.5
对于您的代码,您可能希望使用 last
而不是 lag
(尽管 fill
是更好的选择)。我们还需要将 cost
包装在 na.omit
.
中
library(tidyverse)
df %>%
group_by(city) %>%
mutate(cost = ifelse(is.na(cost), last(na.omit(cost)), cost))
输出
year month city cost
<int> <chr> <chr> <dbl>
1 2021 January Omaha 45.5
2 2021 February Omaha 46.8
3 2021 March Omaha 44.5
4 2021 April Omaha 44.5
5 2021 May Omaha 44.5
6 2021 June Omaha 44.5
7 2021 January Lincoln 55.2
8 2021 February Lincoln 53.8
9 2021 March Lincoln 62.5
10 2021 April Lincoln 62.5
11 2021 May Lincoln 62.5
12 2021 June Lincoln 62.5
数据
df <- structure(list(year = c(2021L, 2021L, 2021L, 2021L, 2021L, 2021L,
2021L, 2021L, 2021L, 2021L, 2021L, 2021L), month = c("January",
"February", "March", "April", "May", "June", "January", "February",
"March", "April", "May", "June"), city = c("Omaha", "Omaha",
"Omaha", "Omaha", "Omaha", "Omaha", "Lincoln", "Lincoln", "Lincoln",
"Lincoln", "Lincoln", "Lincoln"), cost = c(45.5, 46.75, 44.5,
NA, NA, NA, 55.25, 53.8, 62.5, NA, NA, NA)), class = "data.frame", row.names = c(NA,
-12L))
我正在尝试用成本列中的最新非 NA 值填充此数据框中的 NA 值。我想按城市分组——所以奥马哈的所有 NA 应该是 44.50,林肯的 NA 应该是 62.50。这是我一直在使用的代码 - 它用正确的值替换了每个组的第一个 NA(四月),但没有填充过去。
df <- df %>%
group_by(city) %>%
mutate(cost = ifelse(is.na(cost), lag(cost, na.rm=TRUE), cost))
运行代码之前的数据:
year month city cost
2021 January Omaha 45.50
2021 February Omaha 46.75
2021 March Omaha 44.50
2021 April Omaha NA
2021 May Omaha NA
2021 June Omaha NA
2021 January Lincoln 55.25
2021 February Lincoln 53.80
2021 March Lincoln 62.50
2021 April Lincoln NA
2021 May Lincoln NA
2021 June Lincoln NA
使用:
library(tidyverse)
df %>%
group_by(city) %>%
fill(cost)
# A tibble: 12 x 4
# Groups: city [2]
year month city cost
<int> <chr> <chr> <dbl>
1 2021 January Omaha 45.5
2 2021 February Omaha 46.8
3 2021 March Omaha 44.5
4 2021 April Omaha 44.5
5 2021 May Omaha 44.5
6 2021 June Omaha 44.5
7 2021 January Lincoln 55.2
8 2021 February Lincoln 53.8
9 2021 March Lincoln 62.5
10 2021 April Lincoln 62.5
11 2021 May Lincoln 62.5
12 2021 June Lincoln 62.5
对于您的代码,您可能希望使用 last
而不是 lag
(尽管 fill
是更好的选择)。我们还需要将 cost
包装在 na.omit
.
library(tidyverse)
df %>%
group_by(city) %>%
mutate(cost = ifelse(is.na(cost), last(na.omit(cost)), cost))
输出
year month city cost
<int> <chr> <chr> <dbl>
1 2021 January Omaha 45.5
2 2021 February Omaha 46.8
3 2021 March Omaha 44.5
4 2021 April Omaha 44.5
5 2021 May Omaha 44.5
6 2021 June Omaha 44.5
7 2021 January Lincoln 55.2
8 2021 February Lincoln 53.8
9 2021 March Lincoln 62.5
10 2021 April Lincoln 62.5
11 2021 May Lincoln 62.5
12 2021 June Lincoln 62.5
数据
df <- structure(list(year = c(2021L, 2021L, 2021L, 2021L, 2021L, 2021L,
2021L, 2021L, 2021L, 2021L, 2021L, 2021L), month = c("January",
"February", "March", "April", "May", "June", "January", "February",
"March", "April", "May", "June"), city = c("Omaha", "Omaha",
"Omaha", "Omaha", "Omaha", "Omaha", "Lincoln", "Lincoln", "Lincoln",
"Lincoln", "Lincoln", "Lincoln"), cost = c(45.5, 46.75, 44.5,
NA, NA, NA, 55.25, 53.8, 62.5, NA, NA, NA)), class = "data.frame", row.names = c(NA,
-12L))