Process NA in a specific way with the aggregate function

Question

I have a data frame that looks like this:

Project Week Number
Project1   01  46.0
Project2   01  46.4
Project3   01 105.0
Project1   02  70.0
Project2   02  84.0
Project3   02  34.8
Project1   03  83.0
Project3   03  37.9

Edit:

> dput(my.df)
structure(list(Project = structure(c(1L, 2L, 3L, 1L, 2L, 3L, 
1L, 3L), .Label = c("Project1", "Project2", "Project3"), class = "factor"), 
    Week = c(1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L), Number = c(46, 
    46.4, 105, 70, 84, 34.8, 83, 37.9)), .Names = c("Project", 
"Week", "Number"), class = "data.frame", row.names = c(NA, -8L
))

I want to compute the sum of Number for each project in each week.

So I use the aggregate function:

aggregate(Number ~ Project + Week, data = my.df, sum)

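As you can see, Project2 has no value for week 3.

aggregate simply omits that combination from the result. What I want is for that row to be present and filled with 0.

I tried: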

aggregate(Number ~ Project + Week, data = my.df, sum, na.action = 0)

and

aggregate(Number ~ Project + Week, data = my.df, sum, na.action = function(x) 0)

But neither works. Any ideas?
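
A quick check that may help frame the problem, assuming the my.df from the dput output above (rebuilt here with data.frame so the snippet runs on its own): the data contain no NA at all. The Project2 / week 3 combination is an absent row rather than a missing value, so na.action, which only acts on rows that exist, has nothing to work on. The names counts and res are illustrative only.

```r
# Rebuild the example data from the question
my.df <- data.frame(
  Project = c("Project1", "Project2", "Project3", "Project1",
              "Project2", "Project3", "Project1", "Project3"),
  Week    = c(1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L),
  Number  = c(46, 46.4, 105, 70, 84, 34.8, 83, 37.9),
  stringsAsFactors = FALSE
)

# There is no NA anywhere in the data
anyNA(my.df)                      # FALSE

# Number of rows per Project/Week combination;
# the Project2 / week 3 cell is 0, i.e. the row does not exist
counts <- table(my.df$Project, my.df$Week)
counts

# aggregate() only returns groups that occur in the data: 8 rows, not 9
res <- aggregate(Number ~ Project + Week, data = my.df, sum)
nrow(res)                         # 8
```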

Answer 1

You can use xtabs():

my.df <- read.table(header=TRUE, text=
'Project Week Number
Project1   01  46.0
Project2   01  46.4
Project3   01 105.0
Project1   02  70.0
Project2   02  84.0
Project3   02  34.8
Project1   03  83.0
Project3   03  37.9')
# read.table() reads 01, 02, 03 as the integers 1, 2, 3; restore the leading zero
my.df$Week <- paste0("0", my.df$Week)

xtabs(Number ~ Project+Week, data=my.df)
#           Week
# Project       01    02    03
#   Project1  46.0  70.0  83.0
#   Project2  46.4  84.0   0.0
#   Project3 105.0  34.8  37.9
as.data.frame(xtabs(Number ~ Project+Week, data=my.df))
#    Project Week  Freq
# 1 Project1   01  46.0
# 2 Project2   01  46.4
# 3 Project3   01 105.0
# 4 Project1   02  70.0
# 5 Project2   02  84.0
# 6 Project3   02  34.8
# 7 Project1   03  83.0
# 8 Project2   03   0.0
# 9 Project3   03  37.9
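
As a small variation on the above, the value column can keep its original name instead of Freq. This is a sketch that assumes the integer-week my.df from the question's dput output; responseName is an argument of the as.data.frame method for tables, and the names tab and res are illustrative. Because xtabs() sums the left-hand side within each cell, it performs the aggregation itself, and empty cells come out as 0.

```r
# Example data as in the question (Week stored as integer)
my.df <- data.frame(
  Project = c("Project1", "Project2", "Project3", "Project1",
              "Project2", "Project3", "Project1", "Project3"),
  Week    = c(1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L),
  Number  = c(46, 46.4, 105, 70, 84, 34.8, 83, 37.9),
  stringsAsFactors = FALSE
)

# Sum Number per Project/Week cell; cells with no rows are 0
tab <- xtabs(Number ~ Project + Week, data = my.df)

# Back to long form, naming the value column Number instead of Freq
res <- as.data.frame(tab, responseName = "Number")
res
```

Note that Project and Week come back as factors in res.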

Answer 2

We can also use the complete function from the tidyr package to add the missing Project2 row for week 3, filled with 0. After that, we can aggregate the data.

library(tidyr)

my.df2 <- my.df %>% 
  complete(Project, Week, fill = list(Number = 0))

my.df2

# # A tibble: 9 x 3
#    Project  Week Number
#      <chr> <chr>  <dbl>
# 1 Project1    01   46.0
# 2 Project1    02   70.0
# 3 Project1    03   83.0
# 4 Project2    01   46.4
# 5 Project2    02   84.0
# 6 Project2    03    0.0
# 7 Project3    01  105.0
# 8 Project3    02   34.8
# 9 Project3    03   37.9

Data

my.df <- read.table(text = "Project Week Number
Project1   01  46.0
Project2   01  46.4
Project3   01 105.0
Project1   02  70.0
Project2   02  84.0
Project3   02  34.8
Project1   03  83.0
Project3   03  37.9",
                 header = TRUE, stringsAsFactors = FALSE)

# Week is read as an integer; restore the leading zero
my.df$Week <- paste0("0", my.df$Week)
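
The aggregation step mentioned in this answer is not shown, so here is a minimal sketch of how it might look, assuming tidyr is installed and using the integer-week data from the question. The names completed and res are illustrative only.

```r
library(tidyr)

# Example data as in the question (Week stored as integer)
my.df <- data.frame(
  Project = c("Project1", "Project2", "Project3", "Project1",
              "Project2", "Project3", "Project1", "Project3"),
  Week    = c(1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L),
  Number  = c(46, 46.4, 105, 70, 84, 34.8, 83, 37.9),
  stringsAsFactors = FALSE
)

# Add every missing Project/Week combination with Number = 0
completed <- complete(my.df, Project, Week, fill = list(Number = 0))

# Aggregate as in the question; all 9 combinations are now present
res <- aggregate(Number ~ Project + Week, data = completed, sum)
res
```

Completing before aggregating works here because adding a 0 to a group does not change its sum; for other summary functions such as mean, it may be safer to aggregate first and complete the result afterwards.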

Answer 3

Or you can use spread from tidyr with fill = 0:

library(tidyr)

aggregate(Number ~ Project + Week, data = my.df, sum) %>% 
  spread(key = Week, value = Number, fill = 0)

Then use gather to return it to the original long form:

aggregate(Number ~ Project + Week, data = my.df, sum) %>% 
  spread(key = Week, value = Number, fill = 0) %>% 
  # gather every column except Project, whatever the week labels are
  gather(key = Week, value = Number, -Project)
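
In tidyr 1.0.0 and later, spread and gather are superseded by pivot_wider and pivot_longer. The same round trip might be sketched as follows, assuming a recent tidyr and the integer-week data from the question; sums, wide and long are illustrative names.

```r
library(tidyr)

# Example data as in the question (Week stored as integer)
my.df <- data.frame(
  Project = c("Project1", "Project2", "Project3", "Project1",
              "Project2", "Project3", "Project1", "Project3"),
  Week    = c(1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L),
  Number  = c(46, 46.4, 105, 70, 84, 34.8, 83, 37.9),
  stringsAsFactors = FALSE
)

sums <- aggregate(Number ~ Project + Week, data = my.df, sum)

# One column per week; combinations that are absent become 0
wide <- pivot_wider(sums, names_from = Week, values_from = Number,
                    values_fill = list(Number = 0))

# Back to one row per Project/Week combination
long <- pivot_longer(wide, cols = -Project,
                     names_to = "Week", values_to = "Number")
long
```

After the round trip, Week is a character column because it was taken from column names.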

Answer 4

You can do this in base R; it is almost a direct translation of what tidyr::complete does (see @www's answer).

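# df is the data frame from the question (called my.df there)
df <- merge(
  # every Project/Week combination, including those with no rows
  setNames(expand.grid(unique(df$Project), unique(df$Week)), c("Project", "Week")),
  df, all.x = TRUE)
# combinations added by the merge have NA in Number; set them to 0
df$Number[is.na(df$Number)] <- 0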
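
Put together with the aggregation from the question, the base R approach might look like the sketch below. It is self-contained and uses the question's data; the names grid, full and res are illustrative and do not appear in the original answer.

```r
# Example data as in the question (Week stored as integer)
my.df <- data.frame(
  Project = c("Project1", "Project2", "Project3", "Project1",
              "Project2", "Project3", "Project1", "Project3"),
  Week    = c(1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L),
  Number  = c(46, 46.4, 105, 70, 84, 34.8, 83, 37.9),
  stringsAsFactors = FALSE
)

# All Project/Week combinations that could occur
grid <- expand.grid(Project = unique(my.df$Project),
                    Week    = unique(my.df$Week),
                    stringsAsFactors = FALSE)

# Left join on the shared columns (Project, Week);
# combinations without data get NA in Number
full <- merge(grid, my.df, all.x = TRUE)
full$Number[is.na(full$Number)] <- 0

# Sum per project and week; the Project2 / week 3 row is now 0
res <- aggregate(Number ~ Project + Week, data = full, sum)
res
```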
