R: how to replace NAs with the row median?
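Replacing NAs with the column median is a very simple task. But how do I replace NA values with the row median instead? I tried matrixStats::rowMedians, but it did not work.
Sample data: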
tibble(
  name = LETTERS[1:5],
  name2 = LETTERS[9:13],
  id = 1:5,
  val1 = rnorm(5, 0.05, 0.5),
  val2 = rnorm(5, 0, 1),
  val3 = c(1, 2, NA, 7, 0.55),
  val4 = c(NA, 2.33, 12, -0.444, 0)
)
# A tibble: 5 x 7
  name  name2    id    val1   val2  val3   val4
  <chr> <chr> <int>   <dbl>  <dbl> <dbl>  <dbl>
1 A     I         1  0.160  -1.62   1    NA
2 B     J         2  0.194   0.345  2     2.33
3 C     K         3  0.681   1.18  NA    12
4 D     L         4  0.0168 -0.385  7    -0.444
5 E     M         5 -0.509  -1.10   0.55  0
I tried this code, and it gave me an error:
sample <- sample %>%
  mutate_all(~ifelse(is.na(.), matrixStats::rowMedians(., na.rm = T), .))
Problem with `mutate()` input `val3`.
x Argument 'dim.' must be an integer vector of length two.
i Input `val3` is `(structure(function (..., .x = ..1, .y = ..2, . = ..1) ...`.
Run `rlang::last_error()` to see where the error occurred.
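I understand that matrixStats::rowMedians wants me to convert the data to a matrix. But once the data is a matrix, I can no longer use mutate. When I try to call rowMedians on the data directly, I get this error message: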
sample <- matrixStats::rowMedians(sample, cols = c("val1", "val2", "val3", "val4"))
Error in matrixStats::rowMedians(sample, cols = c("val1", "val2", "val3", :
Argument 'x' must be of type logical, integer or numeric, not 'character'.
as.matrix converts my data from numeric to character. On my original dataset, however, I get a different error:
Error in matrixStats::rowMedians(original_df, cols = c(val1, val2, val3, :
object 'val1' was not found
If you want to stay in the tidyverse, one approach is to reshape the data:
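library(dplyr)
library(tidyr)
df %>%
  # long format: one row per (id, val column) pair
  pivot_longer(cols = starts_with('val'),
               names_to = 'col') %>%
  # each id is one original row, so the group median is the row median
  group_by(id) %>%
  mutate(value = replace(value, is.na(value), median(value, na.rm = TRUE))) %>%
  # back to the original wide layout
  pivot_wider(names_from = col, values_from = value) %>%
  ungroup()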
In base R, we can use apply:
cols <- grep('val', names(df))
df[cols] <- t(apply(df[cols], 1, function(x)
  replace(x, is.na(x), median(x, na.rm = TRUE))))
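Because the question's sample uses rnorm, its numbers change on every run, which makes the result hard to check by eye. Below is a minimal sketch of the same apply idea on fixed, made-up numbers. The helper name fill_na_with_row_median and the data values are assumptions for illustration only, and a plain data.frame stands in for the tibble.

```r
# Hypothetical helper (not part of the answer above): fill NAs in the selected
# columns with the median of the non-missing values in the same row.
fill_na_with_row_median <- function(data, cols) {
  filled <- t(apply(data[cols], 1, function(x) {
    replace(x, is.na(x), median(x, na.rm = TRUE))
  }))
  data[cols] <- as.data.frame(filled)
  data
}

# Made-up numbers in the same layout as the question's sample.
df <- data.frame(
  name  = LETTERS[1:5],
  name2 = LETTERS[9:13],
  id    = 1:5,
  val1  = c(0.2, 0.4, 0.6, 0.1, -0.5),
  val2  = c(-1.5, 0.3, 1.2, -0.4, -1.1),
  val3  = c(1, 2, NA, 7, 0.55),
  val4  = c(NA, 2.33, 12, -0.444, 0)
)

cols <- grep("val", names(df))
filled_df <- fill_na_with_row_median(df, cols)

filled_df$val4[1]  # 0.2, the median of 0.2, -1.5 and 1
filled_df$val3[3]  # 1.2, the median of 0.6, 1.2 and 12
```

Only the val columns are passed to apply, so the character columns never end up in the intermediate matrix. A row whose val columns are all NA would stay NA, since there is nothing to take a median of.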
One option using dplyr and purrr could be:
library(dplyr)
library(purrr)
df %>%
  mutate(across(val1:val4,
                ~ if_else(is.na(.), pmap_dbl(across(val1:val4), ~ median(c(...), na.rm = TRUE)), .)))
  name  name2    id   val1   val2  val3   val4
  <chr> <chr> <int>  <dbl>  <dbl> <dbl>  <dbl>
1 A     I         1 -0.660  1.68   1     1
2 B     J         2  0.145  1.04   2     2.33
3 C     K         3 -1.26   2.54   2.54 12
4 D     L         4 -0.788 -0.562  7    -0.444
5 E     M         5  0.821  1.74   0.55  0
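For completeness, here is a sketch of how the matrixStats route from the question might be made to work. Two of the errors have a straightforward cause: inside mutate_all, `.` is a single column vector with no dimensions, hence the complaint about 'dim.'; and as.matrix on the whole data frame yields a character matrix because name and name2 are character columns. Selecting only the numeric columns before converting avoids both. The data values and the names sample_df, val_cols, and row_med are made up for illustration, and the fallback to apply when matrixStats is not installed is an addition of this sketch.

```r
# Made-up numbers in the same layout as the question's sample.
sample_df <- data.frame(
  name  = LETTERS[1:5],
  name2 = LETTERS[9:13],
  id    = 1:5,
  val1  = c(0.2, 0.4, 0.6, 0.1, -0.5),
  val2  = c(-1.5, 0.3, 1.2, -0.4, -1.1),
  val3  = c(1, 2, NA, 7, 0.55),
  val4  = c(NA, 2.33, 12, -0.444, 0)
)

val_cols <- c("val1", "val2", "val3", "val4")

# Only numeric columns are selected, so the matrix is numeric, not character.
m <- as.matrix(sample_df[val_cols])

# One median per row; use matrixStats when available, base R otherwise.
row_med <- if (requireNamespace("matrixStats", quietly = TRUE)) {
  matrixStats::rowMedians(m, na.rm = TRUE)
} else {
  apply(m, 1, median, na.rm = TRUE)
}
row_med <- unname(row_med)

# Locate each NA as a (row, col) pair and fill it with that row's median.
na_pos <- which(is.na(m), arr.ind = TRUE)
m[na_pos] <- row_med[na_pos[, "row"]]

# Write the filled values back; the other columns are untouched.
sample_df[val_cols] <- as.data.frame(m)

sample_df$val4[1]  # 0.2, the median of 0.2, -1.5 and 1
sample_df$val3[3]  # 1.2, the median of 0.6, 1.2 and 12
```

The medians are computed once for the whole matrix and then looked up by row index, rather than being recomputed for each column.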