How do I calculate the mean of two different variables taking into account the values of latitude and longitude in R?
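I am currently trying to extract some data from a table in R.
I have a dataset with two variables: the annual range and the annual mean of global sea surface temperature (SST). Values are given for every latitude (90 to -90) and longitude (180 to -180).
I would like to get the mean of both variables (annual range and annual mean) for each 5x5 latitude/longitude grid cell. For example, I need the mean "annual range" for longitudes between -180 and -176 and latitudes between 90 and 86, and so on, until I have the mean of that variable for every possible 5x5 grid cell.
My data look like this: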
lon lat ANNUAL_MEAN ANNUAL_RANGE
1 0.5 89.5 -1.8 0
2 1.5 89.5 -1.8 0
3 2.5 89.5 -1.8 0
4 3.5 89.5 -1.8 0
5 4.5 89.5 -1.8 0
6 5.5 89.5 -1.8 0
...
52001 354.5 -89.5 -1.8 0
52002 355.5 -89.5 -1.8 0
52003 356.5 -89.5 -1.8 0
52004 357.5 -89.5 -1.8 0
52005 358.5 -89.5 -1.8 0
52006 359.5 -89.5 -1.8 0
Thanks in advance.
You can use the raster package and its focal function for moving-window calculations.
First, I will create a dummy data.frame to stand in for your data:
# Prepare dummy data.frame
set.seed(2222)
lonlat <- expand.grid(1:10, 1:10)
df <- data.frame(
  lon = lonlat[, 1],
  lat = lonlat[, 2],
  ANNUAL_MEAN = rnorm(100),
  ANNUAL_RANGE = runif(100, 1, 5)
)
Now we convert the data frame to a raster and apply the moving-window average.
library(raster)
# Convert data frame to raster object
rdf <- df
coordinates(rdf) <- ~ lon + lat
gridded(rdf) <- TRUE
rdf <- brick(rdf) # our raster brick
## Perform moving window averaging
# prepare weights matrix (5*5)
w <- matrix(1, ncol = 5, nrow = 5)
# perform moving window averaging
ANNUAL_MEAN_AVG <- focal(rdf[[1]], w, mean, pad = TRUE, na.rm = TRUE)
ANNUAL_RANGE_AVG <- focal(rdf[[2]], w, mean, pad = TRUE, na.rm = TRUE)
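# Append new data to initial data.frame
# (look values up by coordinate: raster cell order differs from the row order of df)
df$ANNUAL_MEAN_AVG <- raster::extract(ANNUAL_MEAN_AVG, df[, c("lon", "lat")])
df$ANNUAL_RANGE_AVG <- raster::extract(ANNUAL_RANGE_AVG, df[, c("lon", "lat")])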
Each entry of df$ANNUAL_MEAN_AVG and df$ANNUAL_RANGE_AVG now holds the mean of the 5x5 window centred on that cell.
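To make the moving-window idea concrete, here is a minimal base-R sketch on a small matrix. The helper moving_mean is hypothetical (it is not part of raster); it clips the window at the edges, which should give the same kind of result as padding with NA and using na.rm = TRUE. The example uses a 3x3 window (half = 1) so the numbers are easy to check by hand; half = 2 would give a 5x5 window.

```r
# Hypothetical helper: mean over a (2*half + 1) x (2*half + 1) window,
# clipped at the matrix edges so border cells average fewer values.
moving_mean <- function(m, half = 1) {
  out <- matrix(NA_real_, nrow(m), ncol(m))
  for (i in seq_len(nrow(m))) {
    for (j in seq_len(ncol(m))) {
      rows <- max(1, i - half):min(nrow(m), i + half)
      cols <- max(1, j - half):min(ncol(m), j + half)
      out[i, j] <- mean(m[rows, cols], na.rm = TRUE)
    }
  }
  out
}

m <- matrix(1:16, nrow = 4)    # filled column by column
sm <- moving_mean(m, half = 1)
sm[1, 1]  # corner cell: mean of 1, 2, 5, 6 = 3.5
sm[2, 2]  # interior cell: mean of the full 3x3 block = 6
```

Note that every input cell gets its own output value here, so neighbouring windows overlap. That is the difference from the fixed, non-overlapping 5x5 cells asked for in the question.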
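UPD 1. 5x5 downsampling
If you need a fixed grid of 5x5 cells, each holding the mean of its block, you can use the raster::aggregate function.
Using the rdf raster brick from the previous example: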
# perform an aggregation with given downsampling factor
rdf_d <- aggregate(rdf, fact=5, fun = mean)
# Now each pixel in the raster `rdf_d` contains a mean value of 5x5 pixels from initial `rdf`
# we need to get pixels coordinates and their values
coord <- coordinates(rdf_d)
vals <- as.data.frame(rdf_d)
colnames(coord) <- c("lon", "lat")
colnames(vals) <- c("ANNUAL_MEAN_AVG", "ANNUAL_RANGE_AVG")
res <- cbind(coord, vals)
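The same fixed-block averaging can also be sketched in base R directly on a table shaped like the one in the question. This is only an illustration under assumptions: the data frame sst below is simulated (random values, not real SST), it assumes 1-degree cell centres with longitudes running 0.5 to 359.5 as in the sample rows, and the column names lon180, lon_min and lat_min are made up for the example.

```r
# Simulated stand-in for the asker's table (values are random, not real SST)
set.seed(1)
sst <- expand.grid(lon = seq(0.5, 359.5, by = 1),
                   lat = seq(89.5, -89.5, by = -1))
sst$ANNUAL_MEAN  <- runif(nrow(sst), -2, 30)
sst$ANNUAL_RANGE <- runif(nrow(sst), 0, 10)

# Shift longitudes from the 0..360 convention to -180..180
sst$lon180 <- ifelse(sst$lon > 180, sst$lon - 360, sst$lon)

# Lower-left corner of the 5x5 degree cell each point falls into
sst$lon_min <- floor(sst$lon180 / 5) * 5
sst$lat_min <- floor(sst$lat / 5) * 5

# Mean of both variables per 5x5 cell
# (stats:: prefix avoids any clash with raster::aggregate if raster is attached)
grid5 <- stats::aggregate(cbind(ANNUAL_MEAN, ANNUAL_RANGE) ~ lon_min + lat_min,
                          data = sst, FUN = mean)
head(grid5)
```

With a complete global grid this yields 72 x 36 cells of 25 points each. If rows are missing from the real table (for example over land), the affected cells simply average fewer points.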
Here is a solution using the dplyr package, which is part of the tidyverse. It should be easy to follow step by step.
library(tidyverse)
# set.seed() ensures reproducibility of the example with identical random numbers
set.seed(42)
# build a simulated data set as described in the question
lats <- seq(from = -90, to = 90, by = 0.5)
# we must omit +180 or we would double count those points,
# since they coincide with -180
lons <- seq(from = -180, to = 179.5, by = 0.5)
# combining each latitude point with each longitude point
coord <- merge(lats, lons) %>%
  rename(lat = x) %>%
  rename(lon = y) %>%
  # adding simulated values
  mutate(annual_mean = runif(n = nrow(.), min = -2, max = 2)) %>%
  mutate(annual_range = runif(n = nrow(.), min = 0, max = 3)) %>%
  # defining bands of 5 degrees of latitude and longitude by using integer division
  mutate(lat_band = lat %/% 5) %>%
  mutate(lon_band = lon %/% 5) %>%
  # creating a name label for each unique 5x5 gridcell
  mutate(gridcell_5x5 = paste(lat_band, lon_band, sep = ",")) %>%
  # group-by instruction, much like in SQL
  group_by(lat_band, lon_band, gridcell_5x5) %>%
  # sorting to get a nice order
  arrange(lat_band, lon_band) %>%
  # calculating minimum and maximum latitude and longitude for each gridcell
  # calculating the mean values per gridcell
  summarize(gridcell_min_lat = min(lat),
            gridcell_max_lat = max(lat),
            gridcell_min_lon = min(lon),
            gridcell_max_lon = max(lon),
            gridcell_mean_annual_mean = round(mean(annual_mean), 3),
            gridcell_mean_annual_range = round(mean(annual_range), 3))
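The banding step relies on how %/% treats negative numbers, which is worth checking on a few values. The short sketch below (the variable names are only for illustration) shows that R's integer division rounds toward minus infinity, so negative longitudes land in the band below them rather than being pulled toward zero.

```r
# %/% is floor division in R, so bands are consistent across zero
lon   <- c(-180, -175.5, -0.5, 0, 4.5, 179.5)
band  <- lon %/% 5
lower <- band * 5      # lower edge of the band, in degrees
upper <- lower + 5     # upper edge (exclusive)
data.frame(lon, band, lower, upper)
# band is -36, -36, -1, 0, 0, 35
```

One consequence for the simulated grid above: lat = 90 gives 90 %/% 5 = 18, so the pole row forms a band of its own. If that is not wanted, those points could be dropped or folded into band 17 before grouping.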