R: Roll up column values containing NAs by sum while grouping by IDs
I have a data frame obtained from
ID <- c("A","A","A","A","B","B","B","B")
Type <- c(45,45,46,46,45,45,46,46)
Point_A <- c(10,NA,30,40,NA,80,NA,100)
Point_B <- c(NA,32,43,NA,65,11,NA,53)
df <- data.frame(ID,Type,Point_A,Point_B)
ID Type Point_A Point_B
1 A 45 10 NA
2 A 45 NA 32
3 A 46 30 43
4 A 46 40 NA
5 B 45 NA 65
6 B 45 80 11
7 B 46 NA NA
8 B 46 100 53
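From this post I learned how to roll up data grouped by ID for a single column.
I am currently using sqldf to sum the rows grouped by ID and Type. Although this works, it is very slow on larger datasets.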
library(sqldf)
df1 <- sqldf("SELECT ID, Type, Sum(Point_A) as Point_A, Sum(Point_B) as Point_B
FROM df
GROUP BY ID, Type")
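Please suggest any other technique that could solve this. I have started learning the dplyr and plyr packages and find them very interesting, but I don't know how to apply them here.
Desired output: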
ID Type Point_A Point_B
1 A 45 10 32
2 A 46 70 43
3 B 45 80 76
4 B 46 100 53
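Using data.table:
library(data.table)
DT <- as.data.table(df)
# .SD holds the non-grouping columns (Point_A, Point_B); sum each one, skipping NAs
DT[, lapply(.SD, sum, na.rm=TRUE), by=list(ID, Type)]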
ID Type Point_A Point_B
1: A 45 10 32
2: A 46 70 43
3: B 45 80 76
4: B 46 100 53
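For comparison, here is a base-R sketch that needs no extra packages (it is not one of the original answers, and no speed claim is made for it): `aggregate()` with the formula interface. The non-obvious part is `na.action = na.pass`. The formula method defaults to `na.omit`, which would drop every row that has an NA in *any* column before summing, so `na.rm = TRUE` alone is not enough.

```r
# Same data as in the question
ID      <- c("A","A","A","A","B","B","B","B")
Type    <- c(45,45,46,46,45,45,46,46)
Point_A <- c(10,NA,30,40,NA,80,NA,100)
Point_B <- c(NA,32,43,NA,65,11,NA,53)
df <- data.frame(ID, Type, Point_A, Point_B)

# Sum both Point columns within each ID/Type group.
# na.action = na.pass keeps rows containing NAs; na.rm = TRUE is
# forwarded to sum() so the NAs are skipped inside each group.
res <- aggregate(cbind(Point_A, Point_B) ~ ID + Type, data = df,
                 FUN = sum, na.rm = TRUE, na.action = na.pass)

# aggregate() does not return rows ordered by ID first; sort to match the desired output
res <- res[order(res$ID, res$Type), ]
print(res, row.names = FALSE)
```

Dropping `na.action = na.pass` here would silently discard rows 1, 2, 4, 5 and 7 (each has at least one NA) and produce wrong totals.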
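Using dplyr:
library(dplyr)
df %>% group_by(ID, Type) %>% summarise_each(funs(sum(., na.rm = T)))
# In current dplyr, summarise_each() and funs() are deprecated; the across() equivalent is:
# df %>% group_by(ID, Type) %>% summarise(across(everything(), ~ sum(.x, na.rm = TRUE)))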
or
df %>%
group_by(ID, Type) %>%
summarise(Point_A = sum(Point_A, na.rm = T),
Point_B = sum(Point_B, na.rm = T))
or
f <- function(x) sum(x, na.rm = T)
df %>%
group_by(ID, Type) %>%
summarise(Point_A = f(Point_A),
Point_B = f(Point_B))
gives:
#Source: local data frame [4 x 4]
#Groups: ID
#
# ID Type Point_A Point_B
#1 A 45 10 32
#2 A 46 70 43
#3 B 45 80 76
#4 B 46 100 53
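One behavioural difference from the sqldf query is worth knowing. SQL's `SUM` skips NULLs (sqldf passes R's NA as NULL) and returns NULL only when every value in a group is missing, whereas `sum(x, na.rm = TRUE)` returns 0 for an all-NA group. In this data every ID/Type group has at least one non-missing value per column, so all the answers above agree with the SQL result. If an all-NA group should stay NA, a minimal sketch is a small wrapper; `sum_or_na` below is a hypothetical helper, not part of any package.

```r
# Hypothetical helper: mimic SQL SUM, returning NA (not 0) when every
# value in the group is missing, otherwise the sum of the non-missing values.
sum_or_na <- function(x) {
  if (all(is.na(x))) NA_real_ else sum(x, na.rm = TRUE)
}

sum(c(NA, NA), na.rm = TRUE)   # 0
sum_or_na(c(NA, NA))           # NA
sum_or_na(c(NA, 100))          # 100

# Toy frame in which group "x" is entirely NA
toy <- data.frame(g = c("x", "x", "y"), v = c(NA, NA, 5))
out <- aggregate(v ~ g, data = toy, FUN = sum_or_na, na.action = na.pass)
print(out)
```

Because it takes a single vector, `sum_or_na` can stand in for `f` in the dplyr answer, or for `sum` in the data.table answer as `lapply(.SD, sum_or_na)` (without `na.rm=TRUE`, since the helper already handles NAs).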