How to fill NA with last non-missing value from previous columns?
My df contains a column (V5) in which every value is missing:
> df
# A tibble: 7 × 5
V1 V2 V3 V4 V5
<dbl> <dbl> <dbl> <dbl> <lgl>
1 1.19 2.45 0.83 0.87 NA
2 1.13 0.79 0.68 5.43 NA
3 1.18 1.09 1.04 NA NA
4 1.11 1.1 4.24 NA NA
5 1.16 1.13 NA NA NA
6 1.18 NA NA NA NA
7 1.44 NA 9.17 NA NA
I want to fill column V5 with the nearest non-missing value from the preceding columns:
> df1
# A tibble: 7 × 5
V1 V2 V3 V4 V5
<dbl> <dbl> <dbl> <dbl> <dbl>
1 1.19 2.45 0.83 0.87 0.87
2 1.13 0.79 0.68 5.43 5.43
3 1.18 1.09 1.04 NA 1.04
4 1.11 1.1 4.24 NA 4.24
5 1.16 1.13 NA NA 1.13
6 1.18 NA NA NA 1.18
7 1.44 NA 9.17 NA 9.17
There are related posts, but none of them helps with this problem, so any leads would be greatly appreciated.
Here is the dput output:
structure(list(V1 = c(1.19, 1.13, 1.18, 1.11, 1.16, 1.18, 1.44
), V2 = c(2.45, 0.79, 1.09, 1.1, 1.13, NA, NA), V3 = c(0.83,
0.68, 1.04, 4.24, NA, NA, 9.17), V4 = c(0.87, 5.43, NA, NA, NA,
NA, NA), V5 = c(NA_character_, NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_)), row.names = c(NA,
-7L), class = c("tbl_df", "tbl", "data.frame"))
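You can use coalesce(), which returns the first non-missing value among its arguments, so list the columns from right to left:
library(dplyr)
df %>%
  mutate(V5 = coalesce(V4, V3, V2, V1))
This returns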
# A tibble: 7 x 5
V1 V2 V3 V4 V5
<dbl> <dbl> <dbl> <dbl> <dbl>
1 1.19 2.45 0.83 0.87 0.87
2 1.13 0.79 0.68 5.43 5.43
3 1.18 1.09 1.04 NA 1.04
4 1.11 1.1 4.24 NA 4.24
5 1.16 1.13 NA NA 1.13
6 1.18 NA NA NA 1.18
7 1.44 NA 9.17 NA 9.17
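As a rough illustration of why the argument order matters, here is a minimal sketch on two plain vectors. The vectors `x` and `y` are made up for this example and are not part of the question's data. `coalesce()` works position by position and keeps the first non-missing value it finds, so whichever argument is listed first takes priority.

```r
library(dplyr)

# Hypothetical vectors, made up for illustration only
x <- c(1, NA, NA)
y <- c(10, 20, NA)

# The first argument wins wherever it is not NA
first_x <- coalesce(x, y)   # 1, 20, NA
first_y <- coalesce(y, x)   # 10, 20, NA
```

This is why the answer passes V4 first and V1 last: the rightmost column is checked first, and earlier columns are only used as fallbacks.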
Or, more generally, without naming each column (see https://github.com/tidyverse/funs/issues/54#issuecomment-892377998):
df %>%
  mutate(V5 = do.call(coalesce, rev(across(-V5))))
or (see https://github.com/tidyverse/funs/issues/54#issuecomment-1096449488):
df %>%
  mutate(V5 = coalesce(!!!rev(select(., -V5))))
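The general variants above could be wrapped in a small function. The following is only a sketch: `fill_from_others()` is a hypothetical helper written for this example, not a dplyr function, and the `toy` data frame is made up. It assumes every column other than the target should be considered, with the rightmost one taking priority.

```r
library(dplyr)

# Hypothetical helper (not part of dplyr): fill `target` with the rightmost
# non-missing value found in all other columns of `data`.
fill_from_others <- function(data, target) {
  # All columns except the target, reversed so the rightmost comes first
  sources <- rev(as.list(data[setdiff(names(data), target)]))
  # coalesce() keeps the first non-missing value per row across its arguments
  data[[target]] <- do.call(coalesce, unname(sources))
  data
}

# Made-up example data
toy <- data.frame(a = c(1, 2, 3), b = c(NA, 5, 6), c = c(NA, NA, 9), out = NA)
filled <- fill_from_others(toy, "out")
filled$out   # 1, 5, 9
```

Calling `fill_from_others(df, "V5")` on the question's data should behave like the `do.call()` version, under the same assumption about column order.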
You can also try this, although the other solution is more elegant and is definitely the recommended one:
library(dplyr)
df %>%
  rowwise() %>%
  mutate(V5 = last(c_across(V1:V4)[!is.na(c_across(V1:V4))]))
# A tibble: 7 x 5
# Rowwise:
V1 V2 V3 V4 V5
<dbl> <dbl> <dbl> <dbl> <dbl>
1 1.19 2.45 0.83 0.87 0.87
2 1.13 0.79 0.68 5.43 5.43
3 1.18 1.09 1.04 NA 1.04
4 1.11 1.1 4.24 NA 4.24
5 1.16 1.13 NA NA 1.13
6 1.18 NA NA NA 1.18
7 1.44 NA 9.17 NA 9.17
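Using base R (the lookup is restricted to the first four columns so that V5 itself never takes part in the search):
df$V5 <- as.data.frame(df[1:4])[cbind(seq_len(nrow(df)),
  max.col(!is.na(df[1:4]), "last"))]
df$V5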
[1] 0.87 5.43 1.04 4.24 1.13 1.18 9.17
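To make the base R approach easier to follow, here is a step-by-step sketch on the question's first four columns, rebuilt as a plain data frame. The intermediate names `vals`, `last_col`, and `idx` are introduced here for illustration only. `max.col(..., "last")` gives, for each row, the position of the last `TRUE` in the non-missing mask, and a two-column matrix of (row, column) pairs then picks one cell per row.

```r
# The question's first four columns as a plain data frame
df <- data.frame(
  V1 = c(1.19, 1.13, 1.18, 1.11, 1.16, 1.18, 1.44),
  V2 = c(2.45, 0.79, 1.09, 1.1, 1.13, NA, NA),
  V3 = c(0.83, 0.68, 1.04, 4.24, NA, NA, 9.17),
  V4 = c(0.87, 5.43, NA, NA, NA, NA, NA)
)

vals <- as.matrix(df)                        # numeric matrix, 7 x 4
last_col <- max.col(!is.na(vals), "last")    # column of the last non-NA per row
idx <- cbind(seq_len(nrow(vals)), last_col)  # (row, column) pairs
df$V5 <- vals[idx]                           # one value per row
```

A row with no non-missing value at all would resolve to the last column and therefore simply yield NA.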