Means of multiple columns by multiple groups
I am trying to find the means, excluding NAs, of multiple columns of a data frame, by multiple groups.
airquality <- data.frame(City = c("CityA", "CityA", "CityA",
                                  "CityB", "CityB", "CityB",
                                  "CityC", "CityC"),
                         year = c("1990", "2000", "2010", "1990",
                                  "2000", "2010", "2000", "2010"),
                         month = c("June", "July", "August",
                                   "June", "July", "August",
                                   "June", "August"),
                         PM10 = c(runif(3), rnorm(5)),
                         PM25 = c(runif(3), rnorm(5)),
                         Ozone = c(runif(3), rnorm(5)),
                         CO2 = c(runif(3), rnorm(5)))
airquality
So I get a numbered list of the column names, so that I know which columns to select:
nam<-names(airquality)
namelist <- data.frame(matrix(t(nam)));namelist
I want the mean of PM25, Ozone, and CO2 by City and year. That means I need columns 1, 2, and 5:7. This is what I tried:
acast(datadf, year ~ city, mean, na.rm=TRUE)
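But this is not really what I want, because it includes means of things I don't need, and the result is not a data frame. I could convert it and then drop the extras, but that seems like a very inefficient way to do it.
Is there a better way?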
You should try dplyr::summarise_at:
library(dplyr)
airquality %>%
group_by(City, year) %>%
summarise_at(.vars = c("PM10", "PM25", "Ozone", "CO2"), .funs = mean)
# A tibble: 8 x 6
# Groups: City [?]
City year PM10 PM25 Ozone CO2
<fctr> <fctr> <dbl> <dbl> <dbl> <dbl>
1 CityA 1990 0.004087379 0.5146409 0.44393422 0.61196671
2 CityA 2000 0.039414194 0.8865582 0.06754322 0.69870187
3 CityA 2010 0.116901563 0.6608619 0.51499227 0.32952099
4 CityB 1990 -1.535888778 -0.9601897 1.17183649 0.08380664
5 CityB 2000 0.226046487 0.4037230 0.86554997 -0.05698204
6 CityB 2010 -0.824719956 0.1508471 0.32089806 -0.12871853
7 CityC 2000 -0.824509111 -0.6928741 0.85553837 0.12137923
8 CityC 2010 -1.626150294 1.5176198 0.21183149 -0.63859910
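The question asks for means that exclude NAs, and a plain `mean` returns NA for any group that contains one. A minimal sketch of one way to handle that, assuming a small hypothetical data frame `aq` with a single missing value (not the asker's data): arguments placed after the function in `summarise_at()` are forwarded to it, so `na.rm = TRUE` reaches `mean()`.

```r
library(dplyr)

# Hypothetical toy data with one NA, fixed values so the result is reproducible
aq <- data.frame(
  City  = c("CityA", "CityA", "CityB", "CityB"),
  year  = c("1990", "1990", "2000", "2000"),
  PM25  = c(1, 3, NA, 4),
  Ozone = c(2, 4, 6, 8),
  CO2   = c(10, 20, 30, 50)
)

res_at <- aq %>%
  group_by(City, year) %>%
  # na.rm = TRUE is passed through to mean() for every selected column
  summarise_at(vars(PM25, Ozone, CO2), mean, na.rm = TRUE) %>%
  ungroup()

res_at
```

Without `na.rm = TRUE`, the PM25 mean for CityB would come back as NA.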
We can use dplyr with summarise_at to get the mean of the relevant columns after grouping by the columns of interest:
library(dplyr)
airquality %>%
group_by(City, year) %>%
summarise_at(vars("PM25", "Ozone", "CO2"), mean)
Or with across(), using the devel version of dplyr (version '0.8.99.9000'):
airquality %>%
group_by(City, year) %>%
summarise(across(PM25:CO2, mean))
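Since this answer was written, `across()` has shipped in released dplyr (1.0.0 and later), so a development build is no longer required. A sketch of the same summary that also drops NAs, as the question requests, on a hypothetical toy data frame `aq` (an assumption for illustration, not the asker's data); `.groups = "drop"` returns an ungrouped result.

```r
library(dplyr)

# Hypothetical toy data with one NA, fixed values so the result is reproducible
aq <- data.frame(
  City  = c("CityA", "CityA", "CityB", "CityB"),
  year  = c("1990", "1990", "2000", "2000"),
  PM25  = c(1, 3, NA, 4),
  Ozone = c(2, 4, 6, 8),
  CO2   = c(10, 20, 30, 50)
)

res_across <- aq %>%
  group_by(City, year) %>%
  # PM25:CO2 selects the contiguous columns PM25, Ozone, CO2;
  # the lambda applies mean() with NAs removed to each of them
  summarise(across(PM25:CO2, ~ mean(.x, na.rm = TRUE)), .groups = "drop")

res_across
```

The range `PM25:CO2` depends on column order, so naming the columns explicitly with `c(PM25, Ozone, CO2)` is the safer choice if the layout may change.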
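Colin's summarise_at solution is the simplest, but of course there are several ways to do this.
Here is another solution, using tidyr to reshape the data and then compute the means: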
library(dplyr)
library(tidyr)
airquality %>%
  select(City, year, PM25, Ozone, CO2) %>%
  gather(var, value, -City, -year) %>%          # long format: one row per City/year/variable
  group_by(City, year, var) %>%
  summarise(avg = mean(value, na.rm = TRUE)) %>% # can stop here if you want
  spread(var, avg)                               # optional to make this into a wider table
# A tibble: 8 x 5
# Groups: City, year [8]
City year CO2 Ozone PM25
* <fctr> <fctr> <dbl> <dbl> <dbl>
1 CityA 1990 0.275981522 0.19941717 0.826008441
2 CityA 2000 0.090342153 0.50949094 0.005052771
3 CityA 2010 0.007345704 0.21893117 0.625373926
4 CityB 1990 1.148717447 -1.05983482 -0.961916973
5 CityB 2000 -2.334429324 0.28301220 -0.828515418
6 CityB 2010 1.110398814 -0.56434523 -0.804353609
7 CityC 2000 -0.676236740 0.20661529 -0.696816058
8 CityC 2010 0.229428142 0.06202997 -1.396357288
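In tidyr 1.0.0 and later, `gather()` and `spread()` are superseded by `pivot_longer()` and `pivot_wider()`. A sketch of the same reshape-and-average approach with the newer functions, run on a hypothetical toy data frame `aq` (an assumption for illustration) rather than the random data above:

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data with one NA, fixed values so the result is reproducible
aq <- data.frame(
  City  = c("CityA", "CityA", "CityB", "CityB"),
  year  = c("1990", "1990", "2000", "2000"),
  PM25  = c(1, 3, NA, 4),
  Ozone = c(2, 4, 6, 8),
  CO2   = c(10, 20, 30, 50)
)

# Long format: one row per City/year/variable, then average within each
res_long <- aq %>%
  select(City, year, PM25, Ozone, CO2) %>%
  pivot_longer(c(PM25, Ozone, CO2), names_to = "var", values_to = "value") %>%
  group_by(City, year, var) %>%
  summarise(avg = mean(value, na.rm = TRUE), .groups = "drop")

# Optional: back to one column per variable
res_wide <- res_long %>%
  pivot_wider(names_from = var, values_from = avg)

res_wide
```

The long result is often the more convenient one for plotting, so the final `pivot_wider()` step can be skipped if a wide table is not needed.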
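So I tested the suggestions above, and added more replicates to the original dataset, since I want means by city and year. Here is the updated dataset: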
airquality <- data.frame(City = c("CityA", "CityA", "CityA", "CityA",
                                  "CityB", "CityB", "CityB", "CityB",
                                  "CityC", "CityC", "CityC"),
                         year = c("1990", "2000", "2010", "2010",
                                  "1990", "2000", "2010", "2010",
                                  "1990", "2000", "2000"),
                         month = c("June", "July", "August", "August",
                                   "June", "July", "August", "August",
                                   "June", "August", "August"),
                         PM10 = c(runif(6), rnorm(5)),
                         PM25 = c(runif(6), rnorm(5)),
                         Ozone = c(runif(6), rnorm(5)),
                         CO2 = c(runif(6), rnorm(5)))
airquality
Of the answers above, akrun's and Colin's both work.
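With the replicates added, some City/year groups hold more than one row, so the mean now averages several values. One way to confirm which groups those are is to report the group size next to the means. A sketch on a dataset of the same shape as the updated one; the `set.seed()` call and the name `aq_rep` are assumptions added here so the run is repeatable:

```r
library(dplyr)

set.seed(42)  # assumption: fixed seed, only to make the random values repeatable

# Same layout as the updated dataset: 11 rows, some City/year pairs repeated
aq_rep <- data.frame(
  City  = c(rep("CityA", 4), rep("CityB", 4), rep("CityC", 3)),
  year  = c("1990", "2000", "2010", "2010",
            "1990", "2000", "2010", "2010",
            "1990", "2000", "2000"),
  PM25  = c(runif(6), rnorm(5)),
  Ozone = c(runif(6), rnorm(5)),
  CO2   = c(runif(6), rnorm(5))
)

res_rep <- aq_rep %>%
  group_by(City, year) %>%
  summarise(across(PM25:CO2, ~ mean(.x, na.rm = TRUE)),
            n = n(),            # number of rows that went into each mean
            .groups = "drop")

res_rep
```

Groups with `n` equal to 1 simply return the single observed value.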