Aggregate multiple variables with rbind
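I want to sum several variables across counties and add the totals as new rows with the county name "Florida". I tried this with rbind. It works for one variable, but not for multiple variables.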
year <- c(2005,2006,2007,2005,2006,2007,2005,2006,2007)
county <- c("Alachua County","Alachua County","Alachua County","Baker County","Baker County","Baker County","Bay County","Bay County","Bay County")
value1 <- c(3,6,8,9,8,4,5,8,10)
value2 <- c(3,6,8,9,8,4,5,8,10)
value3 <- c(3,6,8,9,8,4,5,8,10)
value4<-c(3,6,8,9,8,4,5,8,10)
df <- data.frame(year, county,value1,value2,value3,value4, stringsAsFactors = FALSE)
The result should look like this:
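year | county | value1 | value2 | value3 | value4 |
---|---|---|---|---|---|
2005 | Alachua County | 3 | 3 | 3 | 3 |
2006 | Alachua County | 6 | 6 | 6 | 6 |
2007 | Alachua County | 8 | 8 | 8 | 8 |
2005 | Baker County | 9 | 9 | 9 | 9 |
2006 | Baker County | 8 | 8 | 8 | 8 |
2007 | Baker County | 4 | 4 | 4 | 4 |
2005 | Bay County | 5 | 5 | 5 | 5 |
2006 | Bay County | 8 | 8 | 8 | 8 |
2007 | Bay County | 10 | 10 | 10 | 10 |
2005 | Florida | 17 | 17 | 17 | 17 |
2006 | Florida | 22 | 22 | 22 | 22 |
2007 | Florida | 22 | 22 | 22 | 22 |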
I tried this:
df<-df %>%
group_by(year, county)
df<-rbind(df, aggregate (value1,value2,value3,value4) ~ year, df, FUN = sum)
and got the following error:
Argument 2 must be a data frame or a named atomic vector
Here is a data.table approach that uses janitor to calculate the yearly totals:
library(data.table)
# make df a data.table
setDT(df)
# split by year; keep.by = FALSE drops the year column from each piece
L <- split(df, by = "year", keep.by = FALSE)
# calculate totals by year
L <- lapply(L, janitor::adorn_totals, name = "Florida")
# row-bind L into a single data.table, restoring year from the list names
rbindlist(L, use.names = TRUE, idcol = "year")
# year county value1 value2 value3 value4
# 1: 2005 Alachua County 3 3 3 3
# 2: 2005 Baker County 9 9 9 9
# 3: 2005 Bay County 5 5 5 5
# 4: 2005 Florida 17 17 17 17
# 5: 2006 Alachua County 6 6 6 6
# 6: 2006 Baker County 8 8 8 8
# 7: 2006 Bay County 8 8 8 8
# 8: 2006 Florida 22 22 22 22
# 9: 2007 Alachua County 8 8 8 8
#10: 2007 Baker County 4 4 4 4
#11: 2007 Bay County 10 10 10 10
#12: 2007 Florida 22 22 22 22
A dplyr approach would be:
library(dplyr)

df %>%
  bind_rows(df %>%
              group_by(year) %>%
              summarize(county = 'Florida', across(starts_with('value'), sum))) %>%
  arrange(year, county)
#> year county value1 value2 value3 value4
#> 1 2005 Alachua County 3 3 3 3
#> 2 2005 Baker County 9 9 9 9
#> 3 2005 Bay County 5 5 5 5
#> 4 2005 Florida 17 17 17 17
#> 5 2006 Alachua County 6 6 6 6
#> 6 2006 Baker County 8 8 8 8
#> 7 2006 Bay County 8 8 8 8
#> 8 2006 Florida 22 22 22 22
#> 9 2007 Alachua County 8 8 8 8
#> 10 2007 Baker County 4 4 4 4
#> 11 2007 Bay County 10 10 10 10
#> 12 2007 Florida 22 22 22 22
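For completeness, here is a base R sketch that stays close to the original rbind attempt. In that attempt the call to aggregate() is closed right after value4, so rbind() appears to receive a formula as its second argument, which is presumably what triggers the error. The sketch below assumes a plain, ungrouped data frame; the names totals and result are only illustrative, and the data is rebuilt so the snippet runs on its own.

```r
# Rebuild the example data from the question (base R only, no packages).
df <- data.frame(
  year   = rep(c(2005, 2006, 2007), times = 3),
  county = rep(c("Alachua County", "Baker County", "Bay County"), each = 3),
  value1 = c(3, 6, 8, 9, 8, 4, 5, 8, 10),
  stringsAsFactors = FALSE
)
df$value2 <- df$value1
df$value3 <- df$value1
df$value4 <- df$value1

# cbind() on the left-hand side lets aggregate() sum several columns at once.
totals <- aggregate(cbind(value1, value2, value3, value4) ~ year,
                    data = df, FUN = sum)
totals$county <- "Florida"

# Put the columns in the same order as df, append, then sort by year and county.
result <- rbind(df, totals[names(df)])
result <- result[order(result$year, result$county), ]
rownames(result) <- NULL
result
```

The key difference from the failing call is that the whole formula, data = df and FUN = sum all sit inside aggregate(), and the county label is added before the rows are bound.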