Process NA in a specific way with the aggregate function
I have a data frame that looks like this:
Project Week Number
Project1 01 46.0
Project2 01 46.4
Project3 01 105.0
Project1 02 70.0
Project2 02 84.0
Project3 02 34.8
Project1 03 83.0
Project3 03 37.9
Edit:
> dput(my.df)
structure(list(Project = structure(c(1L, 2L, 3L, 1L, 2L, 3L,
1L, 3L), .Label = c("Project1", "Project2", "Project3"), class = "factor"),
Week = c(1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L), Number = c(46,
46.4, 105, 70, 84, 34.8, 83, 37.9)), .Names = c("Project",
"Week", "Number"), class = "data.frame", row.names = c(NA, -8L
))
I want to calculate the sum of Number for each project in each week.
So I use the aggregate function:
aggregate(Number ~ Project + Week, data = my.df, sum)
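As you can see, Project2 has no value for week 3.
The aggregate function simply leaves that combination out.
What I want is a row for it with the value filled in as 0.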
I tried:
aggregate(Number ~ Project + Week, data = my.df, sum, na.action = 0)
and
aggregate(Number ~ Project + Week, data = my.df, sum, na.action = function(x) 0)
but neither works.
Any ideas?
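A quick check of why neither attempt can help: na.action only controls what happens to rows that contain NA, and these data contain no NA at all. The Project2/week 3 row is simply absent, so aggregate() never forms a group for it. The data frame below is rebuilt from the dput() output in the question.

```r
# Rebuild the question's data (same values as the dput() output)
my.df <- data.frame(
  Project = c("Project1", "Project2", "Project3", "Project1", "Project2",
              "Project3", "Project1", "Project3"),
  Week    = c(1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L),
  Number  = c(46, 46.4, 105, 70, 84, 34.8, 83, 37.9),
  stringsAsFactors = TRUE
)

res <- aggregate(Number ~ Project + Week, data = my.df, FUN = sum)

# aggregate() only returns combinations that actually occur in the data
nrow(res)

# There are no NA values, so na.action never has anything to act on
anyNA(my.df)
```

The missing combination therefore has to be created explicitly, either before aggregating or by a function that tabulates every combination, as the answers below do.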
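You can use xtabs(), which sums Number within each Project/Week cell and reports 0 for combinations that never occur: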
my.df <- read.table(header=TRUE, text=
'Project Week Number
Project1 01 46.0
Project2 01 46.4
Project3 01 105.0
Project1 02 70.0
Project2 02 84.0
Project3 02 34.8
Project1 03 83.0
Project3 03 37.9')
my.df$Week <- paste0("0", my.df$Week)
xtabs(Number ~ Project+Week, data=my.df)
#           Week
# Project       01    02    03
#   Project1  46.0  70.0  83.0
#   Project2  46.4  84.0   0.0
#   Project3 105.0  34.8  37.9
as.data.frame(xtabs(Number ~ Project+Week, data=my.df))
#    Project Week  Freq
# 1 Project1   01  46.0
# 2 Project2   01  46.4
# 3 Project3   01 105.0
# 4 Project1   02  70.0
# 5 Project2   02  84.0
# 6 Project3   02  34.8
# 7 Project1   03  83.0
# 8 Project2   03   0.0
# 9 Project3   03  37.9
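Converting the table back with as.data.frame() names the value column Freq. If you would rather keep the original name, the responseName argument of as.data.frame.table() sets it. This is a small sketch using the same data as the question.

```r
my.df <- data.frame(
  Project = c("Project1", "Project2", "Project3", "Project1", "Project2",
              "Project3", "Project1", "Project3"),
  Week    = c(1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L),
  Number  = c(46, 46.4, 105, 70, 84, 34.8, 83, 37.9),
  stringsAsFactors = TRUE
)
my.df$Week <- paste0("0", my.df$Week)  # "01", "02", "03" as in the answer

# xtabs() sums Number per cell; cells with no rows become 0
tab <- xtabs(Number ~ Project + Week, data = my.df)

# Back to long format, keeping the column name "Number" instead of "Freq"
res <- as.data.frame(tab, responseName = "Number")
res
```

Because xtabs() adds up repeated Project/Week rows, it does the aggregation and the zero filling in one step.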
We can also use the complete function from the tidyr package to add the missing Project2 row for week 3 with Number = 0. After that, we can aggregate the data.
library(tidyr)
my.df2 <- my.df %>%
complete(Project, Week, fill = list(Number = 0))
my.df2
# # A tibble: 9 x 3
#   Project  Week  Number
#   <chr>    <chr>  <dbl>
# 1 Project1 01      46.0
# 2 Project1 02      70.0
# 3 Project1 03      83.0
# 4 Project2 01      46.4
# 5 Project2 02      84.0
# 6 Project2 03       0.0
# 7 Project3 01     105.0
# 8 Project3 02      34.8
# 9 Project3 03      37.9
Data
my.df <- read.table(text = "Project Week Number
Project1 01 46.0
Project2 01 46.4
Project3 01 105.0
Project1 02 70.0
Project2 02 84.0
Project3 02 34.8
Project1 03 83.0
Project3 03 37.9",
header = TRUE, stringsAsFactors = FALSE)
my.df$Week <- paste0("0", my.df$Week)
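The answer stops just before the aggregation step. A self-contained sketch of the whole pipeline, using the same data and the plain function-call form of complete() instead of the pipe, could be:

```r
library(tidyr)

my.df <- data.frame(
  Project = c("Project1", "Project2", "Project3", "Project1", "Project2",
              "Project3", "Project1", "Project3"),
  Week    = c(1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L),
  Number  = c(46, 46.4, 105, 70, 84, 34.8, 83, 37.9),
  stringsAsFactors = FALSE
)
my.df$Week <- paste0("0", my.df$Week)

# Add every missing Project/Week combination, with Number set to 0
my.df2 <- complete(my.df, Project, Week, fill = list(Number = 0))

# Aggregate as in the question; Project2 / week "03" now appears with 0
res <- aggregate(Number ~ Project + Week, data = my.df2, FUN = sum)
res
```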
Alternatively, you can use spread from tidyr with fill = 0:
library(tidyr)
aggregate(Number ~ Project + Week, data = my.df, sum) %>%
spread(key = Week,value = Number,fill = 0)
Then use gather to convert it back to the original long form:
aggregate(Number ~ Project + Week, data = my.df, sum) %>%
spread(key = Week,value = Number,fill = 0) %>%
gather(key = Week, value = Number, -Project)  # every column except Project, whatever the week labels are
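Since tidyr 1.0.0, spread() and gather() are superseded by pivot_wider() and pivot_longer(). A sketch of the same round trip with the newer functions, using the integer Week column from the question's dput() output, might be:

```r
library(tidyr)

my.df <- data.frame(
  Project = c("Project1", "Project2", "Project3", "Project1", "Project2",
              "Project3", "Project1", "Project3"),
  Week    = c(1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L),
  Number  = c(46, 46.4, 105, 70, 84, 34.8, 83, 37.9),
  stringsAsFactors = FALSE
)

agg <- aggregate(Number ~ Project + Week, data = my.df, FUN = sum)

# Wide: one column per week; a missing Project/Week cell is filled with 0
wide <- pivot_wider(agg, names_from = Week, values_from = Number,
                    values_fill = list(Number = 0))

# Long again: every column except Project becomes a (Week, Number) pair
long <- pivot_longer(wide, cols = -Project,
                     names_to = "Week", values_to = "Number")
long
```

values_fill is given as a named list because that form is accepted by every tidyr 1.x release; newer versions also accept a bare 0. Note that Week comes back as character ("1", "2", "3").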
You can do this in base R; it is almost a direct translation of what tidyr::complete does (see @www's answer).
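# Build every Project/Week combination, then left-join the data from the question (my.df)
df <- merge(
setNames(expand.grid(unique(my.df$Project), unique(my.df$Week), stringsAsFactors = FALSE), c("Project", "Week")),
my.df, all.x = TRUE)
# Combinations absent from my.df get NA from the join; replace them with 0
df$Number[is.na(df$Number)] <- 0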
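The snippet above only fills the gap. A self-contained sketch of the full base R pipeline, including the aggregation step, could look like this; naming the arguments of expand.grid() makes the setNames() call unnecessary.

```r
my.df <- data.frame(
  Project = c("Project1", "Project2", "Project3", "Project1", "Project2",
              "Project3", "Project1", "Project3"),
  Week    = c(1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L),
  Number  = c(46, 46.4, 105, 70, 84, 34.8, 83, 37.9),
  stringsAsFactors = TRUE
)

# Every Project/Week combination, whether or not it occurs in the data
grid <- expand.grid(Project = unique(my.df$Project), Week = unique(my.df$Week))

# Left join: combinations missing from my.df get Number = NA ...
full <- merge(grid, my.df, all.x = TRUE)
# ... which we turn into 0
full$Number[is.na(full$Number)] <- 0

# Now aggregate() sees a row for every combination
res <- aggregate(Number ~ Project + Week, data = full, FUN = sum)
res
```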