Spatial aggregation with a group by
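I am trying to calculate group means based on a spatial aggregation.
I have two shapefiles: census tracts and wards. The wards carry a value that I would like to average for each census tract, by a factor.
Here are the shapefiles: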
library(dplyr)
library(rgeos)
library(rgdal)
# Census tracts
download.file("http://www12.statcan.gc.ca/census-recensement/2011/geo/bound-limit/files-fichiers/gct_000b11a_e.zip",
              destfile = "gct_000a11a_e.zip")
unzip("gct_000a11a_e.zip", exdir = "tracts") # corrected typo
census_tracts <- readOGR(dsn = "tracts", layer = "gct_000b11a_e") %>%
  spTransform(CRS('+init=epsg:4326'))
# Wards
download.file("http://opendata.toronto.ca/gcc/voting_subdivision_2010_wgs84.zip",
              destfile = "subdivisions_2010.zip")
unzip("subdivisions_2010.zip", exdir = "wards")
wards <- readOGR(dsn = "wards", layer = "VOTING_SUBDIVISION_2010_WGS84") %>%
  spTransform(proj4string(census_tracts))
I then subset the census tracts to just those that fall within the wards:
census_tracts_in_wards <- census_tracts[wards, ]
I have data for each ward, with a two-level factor:
df <- expand.grid(AREA_ID = wards$AREA_ID, factor = as.factor(letters[1:2]))
df$value <- rnorm(n = nrow(df))
wards@data <- left_join(wards@data, df)
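Now (finally getting to my question) I would like to calculate the mean value for each census tract, as an aggregation of the wards within each census tract. I think this is how I would calculate the mean for each census tract: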
ag <- aggregate(x = wards["value"], by = census_tracts_in_wards, FUN = mean)
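However, I would like to compute this grouped by factor. Is there a way to do this? I would like ag to be a spatial data frame containing a factor column and a column with the mean value for each census tract. Essentially the equivalent of: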
result <- df %>%
  group_by(AREA_ID, factor) %>%
  summarize(value = mean(value))
but grouped by CTUID from census_tracts_in_wards instead of by AREA_ID in wards.
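To make the target shape concrete, here is a minimal base R sketch on made-up data. It assumes a ward-to-tract lookup table already exists: the lookup data frame, its IDs, and the values are all hypothetical, and in the real problem that mapping would have to come from a spatial overlay of wards on census_tracts_in_wards. It only shows the non-spatial grouping I am after, not a solution to the spatial part.

```r
# Hypothetical lookup: which census tract (CTUID) each ward (AREA_ID) falls in
lookup <- data.frame(AREA_ID = 1:4,
                     CTUID   = c("t1", "t1", "t2", "t2"))

# Toy long table: one row per ward and factor level (values are made up)
df_toy <- expand.grid(AREA_ID = lookup$AREA_ID,
                      factor  = factor(c("a", "b")))
df_toy$value <- c(1, 2, 3, 4, 10, 20, 30, 40)

# Attach the tract ID to every ward row, then average by tract and factor
merged     <- merge(df_toy, lookup, by = "AREA_ID")
result_toy <- aggregate(value ~ CTUID + factor, data = merged, FUN = mean)
print(result_toy)
```

The result has one row per combination of CTUID and factor, which is the layout I would like the data slot of ag to have.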
As Pierre Lafortune suggested, the formula syntax seems natural here. However, none of these work:
ag2 <- aggregate(x = wards["value"] ~ wards["factor"],
                 by = census_tracts_in_wards, FUN = mean)
ag3 <- aggregate(x = wards["value" ~ "factor"],
                 by = census_tracts_in_wards, FUN = mean)
ag4 <- aggregate(x = wards["value ~ factor"],
                 by = census_tracts_in_wards, FUN = mean)
Perhaps the grouping belongs in the FUN call?
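Following a hint from Edzer Pebesma, a closer reading of the sp::aggregate documentation shows that FUN is applied to each attribute of x. So, rather than creating a long table with a factor column, creating two separate columns (one per factor level) seems to work.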
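The idea can be sketched without any spatial objects. The example below uses stats::aggregate on a plain data frame as a stand-in, which also applies FUN to every column it is given; the toy values and the CTUID assignment are hypothetical, and base reshape() is used here in place of tidyr::spread only to keep the sketch dependency-free.

```r
# Toy long table: three wards, two factor levels (values are made up)
long <- data.frame(AREA_ID = rep(1:3, times = 2),
                   factor  = rep(c("a", "b"), each = 3),
                   value   = c(1, 2, 3, 10, 20, 30))

# Long to wide: one row per ward, one value column per factor level
wide <- reshape(long, idvar = "AREA_ID", timevar = "factor",
                direction = "wide")
names(wide) <- sub("^value\\.", "", names(wide))  # value.a -> a, value.b -> b

# Hypothetical tract membership for the three wards
wide$CTUID <- c("t1", "t1", "t2")

# FUN is applied to each column, so a and b are averaged separately per tract
means <- aggregate(wide[c("a", "b")], by = list(CTUID = wide$CTUID), FUN = mean)
print(means)
```

Because each factor level lives in its own column, a single call yields one mean per level per tract, at the cost of getting a wide rather than a long result.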
wards2 <- readOGR(dsn = "wards", layer = "VOTING_SUBDIVISION_2010_WGS84") %>%
  spTransform(proj4string(census_tracts))
wards2@data <- dplyr::select(wards2@data, AREA_ID) # Drop the other attributes
df2 <- tidyr::spread(df, factor, value)
wards2@data <- left_join(wards2@data, df2)
ag5 <- aggregate(x = wards2, by = census_tracts_in_wards, FUN = mean)
ag5@data <- dplyr::select(ag5@data, -(AREA_ID)) # The mean of AREA_ID is meaningless
summary(ag5)
## Object of class SpatialPolygonsDataFrame
## Coordinates:
## min max
## x -79.73389 -79.08603
## y 43.56243 43.89091
## Is projected: FALSE
## proj4string :
## [+init=epsg:4326 +proj=longlat +datum=WGS84 +no_defs +ellps=WGS84
## +towgs84=0,0,0]
## Data attributes:
## a b
## Min. :-1.28815 Min. :-1.835409
## 1st Qu.:-0.24883 1st Qu.:-0.289510
## Median : 0.01048 Median : 0.008777
## Mean : 0.02666 Mean :-0.011018
## 3rd Qu.: 0.25450 3rd Qu.: 0.265358
## Max. : 1.92769 Max. : 1.399876
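This gives one column per factor level rather than the single factor column I originally asked for. If the long layout is still needed, the attribute table could presumably be reshaped back afterwards. A minimal sketch on a made-up wide table follows; wide_means and its values are hypothetical, and it assumes a CTUID column is available to identify each tract, which the summary above does not show.

```r
# Hypothetical wide table of per-tract means, one column per factor level
wide_means <- data.frame(CTUID = c("t1", "t2"),
                         a = c(1.5, 3),
                         b = c(15, 30))

# Wide to long: stack columns a and b into a single value column,
# recording which column each value came from in a factor column
long_means <- reshape(wide_means, direction = "long",
                      varying = c("a", "b"), v.names = "value",
                      timevar = "factor", times = c("a", "b"),
                      idvar = "CTUID")
print(long_means)
```

Note that a long table has several rows per tract, so it can no longer sit one-to-one alongside the polygons in a SpatialPolygonsDataFrame; it would have to be kept as a separate table keyed by CTUID.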