Calculate column medians with NA's
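I am trying to calculate the median of individual columns in R and then subtract that median from every value in the column. The problem I am facing is that my columns contain NAs, which I do not want to remove; they should simply be returned as NA, without the median subtracted. For example: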
ID <- c("A","B","C","D","E")
Point_A <- c(1, NA, 3, NA, 5)
Point_B <- c(NA, NA, 1, 3, 2)
df <- data.frame(ID,Point_A ,Point_B)
Is it possible to calculate the median of a column that contains NAs? My desired output would be:
+----+---------+---------+
| ID | Point_A | Point_B |
+----+---------+---------+
| A | -2 | NA |
| B | NA | NA |
| C | 0 | -1 |
| D | NA | 1 |
| E | 2 | 0 |
+----+---------+---------+
If we are talking about real NA values (per the OP's comments), one could do:
df[-1] <- lapply(df[-1], function(x) x - median(x, na.rm = TRUE))
df
# ID Point_A Point_B
# 1 A -2 NA
# 2 B NA NA
# 3 C 0 -1
# 4 D NA 1
# 5 E 2 0
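The lapply() approach above relies on the ID column being in position 1. A minimal sketch of the same idea that picks the numeric columns by type instead; the helper name center_by_median is hypothetical and not from the thread:

```r
# Hypothetical helper (the name is an assumption, not from the original answer):
# subtract each numeric column's median from that column, leaving NAs as NA.
center_by_median <- function(data) {
  # Flag numeric columns; character/factor columns such as ID are skipped
  is_num <- vapply(data, is.numeric, logical(1))
  # na.rm = TRUE ignores NAs when computing the median;
  # NA - median is still NA, so missing values pass through untouched
  data[is_num] <- lapply(data[is_num], function(x) x - median(x, na.rm = TRUE))
  data
}

df <- data.frame(
  ID = c("A", "B", "C", "D", "E"),
  Point_A = c(1, NA, 3, NA, 5),
  Point_B = c(NA, NA, 1, 3, 2)
)
result <- center_by_median(df)
result
```

Selecting by type rather than by position means the call does not have to change if ID moves or further non-numeric columns are added.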
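Or using the matrixStats package:
library(matrixStats)
# sweep() subtracts each column's median from that column; a plain
# `df[-1] - colMedians(...)` would recycle the medians down the rows instead
df[-1] <- sweep(df[-1], 2, colMedians(as.matrix(df[-1]), na.rm = TRUE))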
where the original df is
df <- structure(list(ID = structure(1:5, .Label = c("A", "B", "C",
"D", "E"), class = "factor"), Point_A = c(1, NA, 3, NA, 5), Point_B = c(NA,
NA, 1, 3, 2)), .Names = c("ID", "Point_A", "Point_B"), row.names = c(NA,
-5L), class = "data.frame")
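One thing worth checking when subtracting a vector of column medians from a data frame, as in the matrixStats approach: R recycles the vector down each column, not across columns. A small base-R sketch of the difference, where the variable names meds, naive and swept are illustrative only:

```r
df <- data.frame(
  ID = c("A", "B", "C", "D", "E"),
  Point_A = c(1, NA, 3, NA, 5),
  Point_B = c(NA, NA, 1, 3, 2)
)

# Column medians ignoring NAs: Point_A = 3, Point_B = 2
meds <- vapply(df[-1], median, numeric(1), na.rm = TRUE)

# Plain subtraction recycles meds element by element down the columns
# (3, 2, 3, 2, ...), so a row can be paired with the wrong column's median
naive <- df[-1] - meds

# sweep() with MARGIN = 2 pairs each median with its own column
swept <- sweep(df[-1], 2, meds)

naive$Point_B
swept$Point_B
```

With this data the two results differ in row D of Point_B, which is why an explicit column-wise operation such as sweep() or lapply() is the safer choice.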
Sure, it is possible.
median(df[,]$Point_A, na.rm = TRUE)
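where df is the data frame and df[,] means all rows and all columns. Note, however, that the column is then selected by $Point_A. The same can be written in this notation: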
median(df[,"Point_A"], na.rm = TRUE)
Again, df[,"Point_A"] means all rows of the Point_A column.
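To make the indexing forms concrete, here is a short sketch, assuming a plain base-R data.frame (tibbles behave differently for the [ , "col"] form). The variable names v1, v2, v3 and sub are illustrative only:

```r
df <- data.frame(
  ID = c("A", "B", "C", "D", "E"),
  Point_A = c(1, NA, 3, NA, 5),
  Point_B = c(NA, NA, 1, 3, 2)
)

v1 <- df[, "Point_A"]    # all rows, one column: returned as a numeric vector
v2 <- df$Point_A         # same vector via $
v3 <- df[["Point_A"]]    # same vector via [[ ]]
sub <- df["Point_A"]     # single brackets without a comma: a one-column data frame

# median() expects a vector, so use one of the vector forms
med_a <- median(v1, na.rm = TRUE)
med_a
```

The three vector forms are interchangeable here; the single-bracket form without a comma keeps the data frame structure, so it is not a drop-in argument for median().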
Another option is
library(dplyr)
# mutate_each() and funs() are deprecated in current dplyr; across() (dplyr >= 1.0.0) replaces them
df %>%
  mutate(across(-ID, ~ .x - median(.x, na.rm = TRUE)))