R: Calculate column medians by grouping on ID
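Following up on my previous post, I now want to group by ID (for the third column only), compute the median of Point_B within each group, and subtract that group median from each Point_B value in the group. NA values should still be returned as NA.
Note: I want the grouping by ID to apply only to the Point_B column and not to Point_A, because for Point_A I want to compute the median of the whole column and subtract it from each value in Point_A.
For example: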
ID <- c("A","A","A","B","B","B","C","C","C")
Point_A <- c(1,2,NA,1,2,3,1,2,NA)
Point_B <- c(1,2,3,NA,NA,1,1,1,3)
df <- data.frame(ID,Point_A ,Point_B)
+----+---------+---------+
| ID | Point_A | Point_B |
+----+---------+---------+
| A | 1 | 1 |
| A | 2 | 2 |
| A | NA | 3 |
| B | 1 | NA |
| B | 2 | NA |
| B | 3 | 1 |
| C | 1 | 1 |
| C | 2 | 1 |
| C | NA | 3 |
+----+---------+---------+
The solution given in my previous post computes the medians without grouping by ID. Here it is:
library(dplyr)
df %>%
mutate_each(funs(median=.-median(., na.rm=TRUE)), -ID)
Desired output:
+----+---------+---------+
| ID | Point_A | Point_B |
+----+---------+---------+
| A | -1 | -1 |
| A | 0 | 0 |
| A | NA | 1 |
| B | -1 | NA |
| B | 0 | NA |
| B | 1 | 0 |
| C | -1 | 0 |
| C | 0 | 0 |
| C | NA | 2 |
+----+---------+---------+
How can I get the values in column 3 (Point_B) computed per ID group?
You'll want a group_by, I guess (following @docendodiscimus's suggestion):
demed <- function(x) x-median(x,na.rm=TRUE)
df %>%
mutate_each(funs(demed),Point_A) %>%
group_by(ID) %>%
mutate_each(funs(demed),Point_B)
which gives
ID Point_A Point_B
1 A -1 -1
2 A 0 0
3 A NA 1
4 B -1 NA
5 B 0 NA
6 B 1 0
7 C -1 0
8 C 0 0
9 C NA 2
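In more recent dplyr releases, mutate_each() and funs() are deprecated, so the same idea might be written with plain mutate() calls instead. The following is only a sketch under that assumption, reusing the data and the demed() helper from this thread; the name `result` is introduced here purely for illustration.

```r
library(dplyr)

# Example data from the question
ID <- c("A","A","A","B","B","B","C","C","C")
Point_A <- c(1,2,NA,1,2,3,1,2,NA)
Point_B <- c(1,2,3,NA,NA,1,1,1,3)
df <- data.frame(ID, Point_A, Point_B)

# Subtract the median (ignoring NAs) from every element of x
demed <- function(x) x - median(x, na.rm = TRUE)

result <- df %>%
  mutate(Point_A = demed(Point_A)) %>%  # ungrouped: median of the whole column
  group_by(ID) %>%
  mutate(Point_B = demed(Point_B)) %>%  # grouped: median within each ID
  ungroup()                             # drop the grouping again

result
```

The order matters: Point_A is handled before group_by(ID) so that its median is taken over the whole column, while Point_B is handled afterwards so that its median is taken per ID.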
I prefer the analogous data.table code. Its syntax requires writing the variable names several times, but it needs far fewer parentheses:
require(data.table)
DT <- data.table(df)
DT[,Point_A:=demed(Point_A)
][,Point_B:=demed(Point_B)
,by=ID]
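If neither package is available, the same computation can probably be done in base R as well. This is a sketch using ave(), which applies a function within the groups defined by its grouping argument; it assumes the same data and demed() helper as above, and `df2` is just an illustrative name.

```r
# Example data from the question
ID <- c("A","A","A","B","B","B","C","C","C")
Point_A <- c(1,2,NA,1,2,3,1,2,NA)
Point_B <- c(1,2,3,NA,NA,1,1,1,3)
df <- data.frame(ID, Point_A, Point_B)

# Subtract the median (ignoring NAs) from every element of x
demed <- function(x) x - median(x, na.rm = TRUE)

df2 <- df
df2$Point_A <- demed(df$Point_A)                    # whole-column median
df2$Point_B <- ave(df$Point_B, df$ID, FUN = demed)  # median within each ID

df2
```

Unlike the data.table version, which updates DT by reference, this works on a copy and leaves the original df unchanged.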