Add a variable with summarise but keep all other variables in R

Question

I have a dataset of donations to different politicians, in which each row is one specific donation.

donor.sector <- c("sector A", "sector B", "sector X", "sector A", "sector B")
total <- c(100, 100, 150, 125, 500)
year <- c(2006, 2006, 2007, 2007, 2007)
state <- c("CA", "CA", "CA", "NY", "WA")
target_specific <- c("politician A", "politician A", "politician A", "politician B", "politician C")
# data.frame() builds the columns from the vectors above
dat <- data.frame(donor.sector, total, year, target_specific, state)

I am trying to get the average donation amount per politician per year. I can do that as follows:

library(dplyr)

new.df <- dat %>%
  group_by(target_specific, year) %>%
  summarise(mean = mean(total))

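My problem is that, because I grouped the data, the result has only three variables: mean, year, and target_specific. Is there a way to do this and create a new data frame in which I keep the politician-level variables, such as state?

Thanks a lot!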

Answer 1

There are two ways to do this:

Include the additional variable in group_by:

library(dplyr)

dat %>%
  group_by(target_specific, year, state) %>%
  summarise(mean = mean(total))

#  target_specific  year state  mean
#  <chr>           <dbl> <chr> <dbl>
#1 politician A     2006 CA      100
#2 politician A     2007 CA      150
#3 politician B     2007 NY      125
#4 politician C     2007 WA      500

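Or keep the same group_by structure and include the first value of the additional variable. This gives the expected result when state takes a single value within each politician-year group.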

dat %>%
  group_by(target_specific, year) %>%
  summarise(mean = mean(total), state = first(state))
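
Since first() silently picks one value, it may be worth confirming that state really is constant within each group. The following is a minimal, self-contained sketch of that check; the column name n_states and the result name res are illustrative and not part of the original answer, and .groups = "drop" is added only to return an ungrouped result.

```r
library(dplyr)

# Example data from the question
dat <- data.frame(
  donor.sector = c("sector A", "sector B", "sector X", "sector A", "sector B"),
  total = c(100, 100, 150, 125, 500),
  year = c(2006, 2006, 2007, 2007, 2007),
  state = c("CA", "CA", "CA", "NY", "WA"),
  target_specific = c("politician A", "politician A", "politician A",
                      "politician B", "politician C")
)

res <- dat %>%
  group_by(target_specific, year) %>%
  summarise(
    mean = mean(total),
    # n_states (hypothetical name) must be computed before state is
    # overwritten below, otherwise it would count the summarised value
    n_states = n_distinct(state),
    state = first(state),
    .groups = "drop"
  )

# Any group with n_states > 1 has more than one state, so first() would
# be hiding information there
any(res$n_states > 1)
```

If the check returns TRUE for some group, adding state to group_by is probably the safer of the two approaches.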

Answer 2

In base R, we can use aggregate:

aggregate(total ~ ., subset(dat, select = -donor.sector), mean)
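
The `.` in the formula groups by every remaining column, so the same call can also be written with the grouping variables spelled out. As a rough sketch, the aggregated means can then be merged back onto the original rows if the donation-level columns should be kept as well. The names agg, merged, and mean_total are illustrative assumptions, not part of the original answer.

```r
# Example data from the question
dat <- data.frame(
  donor.sector = c("sector A", "sector B", "sector X", "sector A", "sector B"),
  total = c(100, 100, 150, 125, 500),
  year = c(2006, 2006, 2007, 2007, 2007),
  state = c("CA", "CA", "CA", "NY", "WA"),
  target_specific = c("politician A", "politician A", "politician A",
                      "politician B", "politician C")
)

# Same aggregation with the grouping variables named explicitly
agg <- aggregate(total ~ target_specific + year + state, data = dat, FUN = mean)

# Rename the aggregated column so it does not clash with total in dat
names(agg)[names(agg) == "total"] <- "mean_total"

# Attach the group mean to every original donation row
merged <- merge(dat, agg, by = c("target_specific", "year", "state"))
merged
```

Here merged keeps one row per donation, with the politician-year mean repeated on each row of its group.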
