sf & dplyr:按相同坐标分组的均值不起作用
sf & dplyr: Grouped mean by same coordinates doesn't work
我有一个从 S2 卫星栅格(10 x 10 米)中提取的数据集,其中包含 12 个值 (ras.df.ll
),但 6 个在图块中 (T21JYG
),第二个在另一个图块中(T21JYG
)。
我想计算瓷砖之间相同(x,y 坐标)的平均值但没有成功。我找不到任何方法来识别第一个图块中的第一行与第二个图块中的第一行的坐标相同,只是我的数据集的末尾。在我的例子中:
library(sf)
library(sfheaders)
library(dplyr)
# Raster (10x10 meters) info in data frame 1 - crs +proj=utm +zone=21 +south +datum=WGS84 +units=m +no_defs
x <-c(789385,789395,789405,789415,789425,789435)
y <-c(6626865,6626865,6626865,6626865,6626865,6626865)
tile <- rep("T21JYG",6)
values <-c(321,249,234,238,224,244)
ras.ds1<-data.frame(x,y,tile,values)
ras.ds1.sf <- st_as_sf(ras.ds1, coords = c("x", "y"), crs = 32721, agr = "constant")
ras.ds1.sf.ll <- st_transform(ras.ds1.sf, crs=4326)
ras.ds1.sf.ll
#Simple feature collection with 6 features and 2 fields
#Attribute-geometry relationship: 2 constant, 0 aggregate, 0 identity
#Geometry type: POINT
#Dimension: XY
#Bounding box: xmin: -53.98638 ymin: -30.45564 xmax: -53.98586 ymax: -30.45562
#Geodetic CRS: WGS 84
# tile values geometry
# 1 T21JYG 321 POINT (-53.98638 -30.45564)
# 2 T21JYG 249 POINT (-53.98628 -30.45563)
# 3 T21JYG 234 POINT (-53.98617 -30.45563)
# 4 T21JYG 238 POINT (-53.98607 -30.45563)
# 5 T21JYG 224 POINT (-53.98596 -30.45563)
# 6 T21JYG 244 POINT (-53.98586 -30.45562)
# Raster (10x10 meters) info in data frame - crs +proj=utm +zone=22 +south +datum=WGS84 +units=m +no_defs
x <-c(213285,213295,213305,213315,213325,213335)
y <-c(6626955,6626955,6626955,6626955,6626955,6626955)
tile <- rep("T22JBM",6)
values <-c(336,355,363,426,341,308)
ras.ds2 <-data.frame(x,y,tile,values)
ras.ds2.sf <- st_as_sf(ras.ds2, coords = c("x", "y"), crs = 32722, agr = "constant")
ras.ds2.sf.ll <- st_transform(ras.ds2.sf, crs=4326)
ras.ds2.sf.ll
# Simple feature collection with 6 features and 2 fields
# Attribute-geometry relationship: 2 constant, 0 aggregate, 0 identity
# Geometry type: POINT
# Dimension: XY
# Bounding box: xmin: -53.98638 ymin: -30.45564 xmax: -53.98586 ymax: -30.45562
# Geodetic CRS: WGS 84
# tile values geometry
# 1 T21JYG 321 POINT (-53.98638 -30.45564)
# 2 T21JYG 249 POINT (-53.98628 -30.45563)
# 3 T21JYG 234 POINT (-53.98617 -30.45563)
# 4 T21JYG 238 POINT (-53.98607 -30.45563)
# 5 T21JYG 224 POINT (-53.98596 -30.45563)
# 6 T21JYG 244 POINT (-53.98586 -30.45562)
# Join information
ras.ds.sf.ll <- rbind(ras.ds1.sf.ll, ras.ds2.sf.ll)
ras.df.ll <- sf_to_df(ras.ds.sf.ll, fill = TRUE, unlist = NULL)
# Mean values by tile
ras.df.ll %>%
group_by(x,y) %>% dplyr::summarise(values=mean(values))
# `summarise()` has grouped output by 'x'. You can override using the `.groups` argument.
# # A tibble: 12 x 3
# # Groups: x [12]
# x y values
# <dbl> <dbl> <dbl>
# 1 -54.0 -30.5 321
# 2 -54.0 -30.5 249
# 3 -54.0 -30.5 234
# 4 -54.0 -30.5 238
# 5 -54.0 -30.5 224
# 6 -54.0 -30.5 244
# 7 -54.0 -30.5 336
# 8 -54.0 -30.5 355
# 9 -54.0 -30.5 363
# 10 -54.0 -30.5 426
# 11 -54.0 -30.5 341
# 12 -54.0 -30.5 308
# Nothing happened!!
# But if I try to change x and y values using accuracy
round_any = function(x, accuracy, f=round){f(x/ accuracy) * accuracy}
ras.df.ll2 <- ras.df.ll %>%
mutate(x = round_any(x, accuracy = 0.001),
y = round_any(y, accuracy = 0.001))
ras.df.ll2 %>%
group_by(x,y) %>% dplyr::summarise(values=mean(values))
`summarise()` has grouped output by 'x'. You can override using the `.groups` argument.
# # A tibble: 3 x 3
# # Groups: x [2]
# x y values
# <dbl> <dbl> <dbl>
# 1 -54.0 -30.5 252.
# 2 -54.0 -30.5 370
# 3 -54.0 -30.5 324.
# Hapens something but is wrong!!
请问有什么方法可以进行这种校正均值提取吗?
提前致谢
亚历山大
首先让我们注意到您写的内容不必要地复杂。
我建议你这样做。
fst = function(data)
st_as_sf(data, coords = c("x", "y"),
crs = data$crs, agr = "constant") %>%
st_transform(crs=4326) %>%
sf_to_df(fill = TRUE, unlist = NULL)
df = tibble(
tile = rep(c("T21JYG", "T22JBM"), each = 6) %>% fct_inorder(),
values = c(321,249,234,238,224,244,336,355,363,426,341,308),
x = c(789385,789395,789405,789415,789425,789435,
213285,213295,213305,213315,213325,213335),
y = c(6626865,6626865,6626865,6626865,6626865,6626865,
6626955,6626955,6626955,6626955,6626955,6626955),
crs = rep(c(32721, 32722), each = 6)
) %>% group_by(tile) %>%
nest(data=x:crs) %>%
mutate(st = map(data, ~ fst(.x))) %>%
unnest(st) %>%
mutate(
x = x %>% plyr::round_any(accuracy = 0.001) %>% paste(),
y = y %>% plyr::round_any(accuracy = 0.001) %>% paste(),
) %>% group_by(x,y)
输出
# A tibble: 12 x 8
# Groups: x, y [3]
tile values data crs sfg_id point_id x y
<fct> <dbl> <list> <dbl> <int> <int> <chr> <chr>
1 T21JYG 321 <tibble [1 x 3]> 32721 1 1 -53.986 -30.456
2 T21JYG 249 <tibble [1 x 3]> 32721 1 1 -53.986 -30.456
3 T21JYG 234 <tibble [1 x 3]> 32721 1 1 -53.986 -30.456
4 T21JYG 238 <tibble [1 x 3]> 32721 1 1 -53.986 -30.456
5 T21JYG 224 <tibble [1 x 3]> 32721 1 1 -53.986 -30.456
6 T21JYG 244 <tibble [1 x 3]> 32721 1 1 -53.986 -30.456
7 T22JBM 336 <tibble [1 x 3]> 32722 1 1 -53.986 -30.455
8 T22JBM 355 <tibble [1 x 3]> 32722 1 1 -53.986 -30.455
9 T22JBM 363 <tibble [1 x 3]> 32722 1 1 -53.986 -30.455
10 T22JBM 426 <tibble [1 x 3]> 32722 1 1 -53.986 -30.455
11 T22JBM 341 <tibble [1 x 3]> 32722 1 1 -53.985 -30.455
12 T22JBM 308 <tibble [1 x 3]> 32722 1 1 -53.985 -30.455
但是,我完全不明白你看到的哪里不对
df %>% dplyr::summarise(values=mean(values))
输出
# A tibble: 3 x 3
# Groups: x [2]
x y values
<chr> <chr> <dbl>
1 -53.985 -30.455 324.
2 -53.986 -30.455 370
3 -53.986 -30.456 252.
这正是我们所做的,即 x
、y
组中 values
的 mean
。
明确你想要达到的目标。并且不要写在评论中,而是写在 post!
的正文中
嗯。我对 st_as_sf
、st_transform
、sf_to_df
函数一无所知。我不知道他们做了什么或如何解释他们的结果。但是请注意,它们被放置在管道中(在我的 fst
函数中),因此它们进行必要的计算,然后将结果传递给彼此。因此,这些结果必须是正确的。
也许问题在于您最初在 st_as_sf
函数中指定了两个不同的 crs
参数。我通过将这个变量放在 tibble
中来做到这一点。但是,您只将 crs = 4326
的一个值传递给 st_transform
函数。或许这里也应该传达两种不同的价值观?
最后我把round_any
函数中的accuracy
设置为0.00025
得到了6个结果。
看看这些答案。
# A tibble: 6 x 3
# Groups: x [6]
x y values
<chr> <chr> <dbl>
1 -53.98525 -30.4555 308
2 -53.9855 -30.4555 377.
3 -53.98575 -30.4555 312.
4 -53.986 -30.45575 231
5 -53.98625 -30.45575 242.
6 -53.9865 -30.45575 321
最后,我向您展示了正确的编程方法。
我有一个从 S2 卫星栅格(10 x 10 米)中提取的数据集,其中包含 12 个值 (ras.df.ll
),但 6 个在图块中 (T21JYG
),第二个在另一个图块中(T21JYG
)。
我想计算瓷砖之间相同(x,y 坐标)的平均值但没有成功。我找不到任何方法来识别第一个图块中的第一行与第二个图块中的第一行的坐标相同,只是我的数据集的末尾。在我的例子中:
library(sf)
library(sfheaders)
library(dplyr)
# Raster (10x10 meters) info in data frame 1 - crs +proj=utm +zone=21 +south +datum=WGS84 +units=m +no_defs
x <-c(789385,789395,789405,789415,789425,789435)
y <-c(6626865,6626865,6626865,6626865,6626865,6626865)
tile <- rep("T21JYG",6)
values <-c(321,249,234,238,224,244)
ras.ds1<-data.frame(x,y,tile,values)
ras.ds1.sf <- st_as_sf(ras.ds1, coords = c("x", "y"), crs = 32721, agr = "constant")
ras.ds1.sf.ll <- st_transform(ras.ds1.sf, crs=4326)
ras.ds1.sf.ll
#Simple feature collection with 6 features and 2 fields
#Attribute-geometry relationship: 2 constant, 0 aggregate, 0 identity
#Geometry type: POINT
#Dimension: XY
#Bounding box: xmin: -53.98638 ymin: -30.45564 xmax: -53.98586 ymax: -30.45562
#Geodetic CRS: WGS 84
# tile values geometry
# 1 T21JYG 321 POINT (-53.98638 -30.45564)
# 2 T21JYG 249 POINT (-53.98628 -30.45563)
# 3 T21JYG 234 POINT (-53.98617 -30.45563)
# 4 T21JYG 238 POINT (-53.98607 -30.45563)
# 5 T21JYG 224 POINT (-53.98596 -30.45563)
# 6 T21JYG 244 POINT (-53.98586 -30.45562)
# Raster (10x10 meters) info in data frame - crs +proj=utm +zone=22 +south +datum=WGS84 +units=m +no_defs
x <-c(213285,213295,213305,213315,213325,213335)
y <-c(6626955,6626955,6626955,6626955,6626955,6626955)
tile <- rep("T22JBM",6)
values <-c(336,355,363,426,341,308)
ras.ds2 <-data.frame(x,y,tile,values)
ras.ds2.sf <- st_as_sf(ras.ds2, coords = c("x", "y"), crs = 32722, agr = "constant")
ras.ds2.sf.ll <- st_transform(ras.ds2.sf, crs=4326)
ras.ds2.sf.ll
# Simple feature collection with 6 features and 2 fields
# Attribute-geometry relationship: 2 constant, 0 aggregate, 0 identity
# Geometry type: POINT
# Dimension: XY
# Bounding box: xmin: -53.98638 ymin: -30.45564 xmax: -53.98586 ymax: -30.45562
# Geodetic CRS: WGS 84
# tile values geometry
# 1 T21JYG 321 POINT (-53.98638 -30.45564)
# 2 T21JYG 249 POINT (-53.98628 -30.45563)
# 3 T21JYG 234 POINT (-53.98617 -30.45563)
# 4 T21JYG 238 POINT (-53.98607 -30.45563)
# 5 T21JYG 224 POINT (-53.98596 -30.45563)
# 6 T21JYG 244 POINT (-53.98586 -30.45562)
# Join information
ras.ds.sf.ll <- rbind(ras.ds1.sf.ll, ras.ds2.sf.ll)
ras.df.ll <- sf_to_df(ras.ds.sf.ll, fill = TRUE, unlist = NULL)
# Mean values by tile
ras.df.ll %>%
group_by(x,y) %>% dplyr::summarise(values=mean(values))
# `summarise()` has grouped output by 'x'. You can override using the `.groups` argument.
# # A tibble: 12 x 3
# # Groups: x [12]
# x y values
# <dbl> <dbl> <dbl>
# 1 -54.0 -30.5 321
# 2 -54.0 -30.5 249
# 3 -54.0 -30.5 234
# 4 -54.0 -30.5 238
# 5 -54.0 -30.5 224
# 6 -54.0 -30.5 244
# 7 -54.0 -30.5 336
# 8 -54.0 -30.5 355
# 9 -54.0 -30.5 363
# 10 -54.0 -30.5 426
# 11 -54.0 -30.5 341
# 12 -54.0 -30.5 308
# Nothing happened!!
# But if I try to change x and y values using accuracy
round_any = function(x, accuracy, f=round){f(x/ accuracy) * accuracy}
ras.df.ll2 <- ras.df.ll %>%
mutate(x = round_any(x, accuracy = 0.001),
y = round_any(y, accuracy = 0.001))
ras.df.ll2 %>%
group_by(x,y) %>% dplyr::summarise(values=mean(values))
`summarise()` has grouped output by 'x'. You can override using the `.groups` argument.
# # A tibble: 3 x 3
# # Groups: x [2]
# x y values
# <dbl> <dbl> <dbl>
# 1 -54.0 -30.5 252.
# 2 -54.0 -30.5 370
# 3 -54.0 -30.5 324.
# Hapens something but is wrong!!
请问有什么方法可以进行这种校正均值提取吗? 提前致谢 亚历山大
首先让我们注意到您写的内容不必要地复杂。 我建议你这样做。
fst = function(data)
st_as_sf(data, coords = c("x", "y"),
crs = data$crs, agr = "constant") %>%
st_transform(crs=4326) %>%
sf_to_df(fill = TRUE, unlist = NULL)
df = tibble(
tile = rep(c("T21JYG", "T22JBM"), each = 6) %>% fct_inorder(),
values = c(321,249,234,238,224,244,336,355,363,426,341,308),
x = c(789385,789395,789405,789415,789425,789435,
213285,213295,213305,213315,213325,213335),
y = c(6626865,6626865,6626865,6626865,6626865,6626865,
6626955,6626955,6626955,6626955,6626955,6626955),
crs = rep(c(32721, 32722), each = 6)
) %>% group_by(tile) %>%
nest(data=x:crs) %>%
mutate(st = map(data, ~ fst(.x))) %>%
unnest(st) %>%
mutate(
x = x %>% plyr::round_any(accuracy = 0.001) %>% paste(),
y = y %>% plyr::round_any(accuracy = 0.001) %>% paste(),
) %>% group_by(x,y)
输出
# A tibble: 12 x 8
# Groups: x, y [3]
tile values data crs sfg_id point_id x y
<fct> <dbl> <list> <dbl> <int> <int> <chr> <chr>
1 T21JYG 321 <tibble [1 x 3]> 32721 1 1 -53.986 -30.456
2 T21JYG 249 <tibble [1 x 3]> 32721 1 1 -53.986 -30.456
3 T21JYG 234 <tibble [1 x 3]> 32721 1 1 -53.986 -30.456
4 T21JYG 238 <tibble [1 x 3]> 32721 1 1 -53.986 -30.456
5 T21JYG 224 <tibble [1 x 3]> 32721 1 1 -53.986 -30.456
6 T21JYG 244 <tibble [1 x 3]> 32721 1 1 -53.986 -30.456
7 T22JBM 336 <tibble [1 x 3]> 32722 1 1 -53.986 -30.455
8 T22JBM 355 <tibble [1 x 3]> 32722 1 1 -53.986 -30.455
9 T22JBM 363 <tibble [1 x 3]> 32722 1 1 -53.986 -30.455
10 T22JBM 426 <tibble [1 x 3]> 32722 1 1 -53.986 -30.455
11 T22JBM 341 <tibble [1 x 3]> 32722 1 1 -53.985 -30.455
12 T22JBM 308 <tibble [1 x 3]> 32722 1 1 -53.985 -30.455
但是,我完全不明白你看到的哪里不对
df %>% dplyr::summarise(values=mean(values))
输出
# A tibble: 3 x 3
# Groups: x [2]
x y values
<chr> <chr> <dbl>
1 -53.985 -30.455 324.
2 -53.986 -30.455 370
3 -53.986 -30.456 252.
这正是我们所做的,即 x
、y
组中 values
的 mean
。
明确你想要达到的目标。并且不要写在评论中,而是写在 post!
嗯。我对 st_as_sf
、st_transform
、sf_to_df
函数一无所知。我不知道他们做了什么或如何解释他们的结果。但是请注意,它们被放置在管道中(在我的 fst
函数中),因此它们进行必要的计算,然后将结果传递给彼此。因此,这些结果必须是正确的。
也许问题在于您最初在 st_as_sf
函数中指定了两个不同的 crs
参数。我通过将这个变量放在 tibble
中来做到这一点。但是,您只将 crs = 4326
的一个值传递给 st_transform
函数。或许这里也应该传达两种不同的价值观?
最后我把round_any
函数中的accuracy
设置为0.00025
得到了6个结果。
看看这些答案。
# A tibble: 6 x 3
# Groups: x [6]
x y values
<chr> <chr> <dbl>
1 -53.98525 -30.4555 308
2 -53.9855 -30.4555 377.
3 -53.98575 -30.4555 312.
4 -53.986 -30.45575 231
5 -53.98625 -30.45575 242.
6 -53.9865 -30.45575 321
最后,我向您展示了正确的编程方法。