How to Calculate Crime Density?
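Overall goal: compute crime density over a grid laid across a US city. Each grid square should be 100 m square (100 m on a side). I have a data frame, crime.inc, that lists the latitude and longitude of individual crime incidents, like this: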
incident id lat lon
1001 45.123 -122.456
1002 45.456 -122.789
Next, I have a predefined regular grid g (called predef.grid in the code):
predef.grid <- data.frame(lat = seq(from = 44, to = 45, by = 0.1),lon = seq(from = -122, to = -121, by = 0.1))
id <- rownames(predef.grid) # add row ids
predef.grid <- cbind(id=id, predef.grid) # add row ids
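My output needs to look like this, where each row is a unique cell of the predefined grid and count is the number of incidents in that cell: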
id lat lon count
1001 45.123 -122.789 4
1002 45.456 -122.987 5
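I have tried various combinations of sp, sf, raster, and rgeos, but never quite got it to work. Any help would be greatly appreciated!
The data in the question give latitude and longitude to only three decimal places, so you can simply group by location with dplyr, without any GIS package.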
library(dplyr)
densities <- crime.inc %>% group_by(lat,lon) %>%
summarise(count=n())
This drops the ids. If you want to keep them, use mutate() instead of summarise():
library(dplyr)
densities <- crime.inc %>% group_by(lat,lon) %>%
mutate(count=n())
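To make the difference between the two options concrete, here is a minimal sketch on hypothetical toy data (the three incidents below are invented for illustration, and the names per_location and per_incident are mine). It assumes dplyr 1.0.0 or later for the .groups argument.

```r
library(dplyr)

# Hypothetical toy data: three incidents, two of them at the same location
crime.inc <- data.frame(
  id  = c(1001, 1002, 1003),
  lat = c(45.123, 45.123, 45.456),
  lon = c(-122.456, -122.456, -122.789)
)

# summarise(): one row per distinct location, ids are dropped
per_location <- crime.inc %>%
  group_by(lat, lon) %>%
  summarise(count = n(), .groups = "drop")

# mutate(): one row per incident, ids are kept and every row
# carries the count for its location
per_incident <- crime.inc %>%
  group_by(lat, lon) %>%
  mutate(count = n()) %>%
  ungroup()
```

Note that this only counts incidents sharing exactly the same rounded coordinates; it does not by itself guarantee cells of a fixed size in metres.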
The assumption that 0.001 degrees of lat/lon is roughly 100 m may not hold. The distance depends on where you are in the world, but using sample data from your area:
library(sf)
# adjust latitude by 0.001
df <- data.frame(lat = c(45.123, 45.124), lon = c(-122.789, -122.789))
df.sf <- st_as_sf(df, coords = c("lon", "lat"), crs = 4326)
> st_distance(df.sf)
Units: m
[,1] [,2]
[1,] 0.0000 111.1342
[2,] 111.1342 0.0000
#Or, if we adjust the longitude by 0.001:
df <- data.frame(lat = c(45.123, 45.123), lon = c(-122.789, -122.790))
df.sf <- st_as_sf(df, coords = c("lon", "lat"), crs = 4326)
> st_distance(df.sf)
Units: m
[,1] [,2]
[1,] 0.00000 78.67796
[2,] 78.67796 0.00000
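The two st_distance() results above can be sanity-checked with a rough spherical approximation: a degree of latitude covers about the same distance everywhere, while a degree of longitude shrinks with the cosine of the latitude. The sketch below assumes a spherical Earth with a mean radius of 6371008.8 m, so it will differ slightly from the ellipsoidal distances sf reports; the variable names are illustrative.

```r
# Assumption: spherical Earth with mean radius 6371008.8 m
earth_radius_m <- 6371008.8
m_per_deg <- earth_radius_m * pi / 180   # metres per degree along a great circle

lat <- 45.123                            # latitude of the sample points
step <- 0.001                            # coordinate step in degrees

# distance covered by a 0.001 degree step north-south
north_south_m <- step * m_per_deg

# distance covered by a 0.001 degree step east-west at this latitude
east_west_m <- step * m_per_deg * cos(lat * pi / 180)

c(north_south_m = north_south_m, east_west_m = east_west_m)
```

At this latitude a 0.001 degree cell is therefore noticeably taller than it is wide, which is why a projected grid in metres is the safer choice.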
Here is an alternative approach to your problem using the sf package:
# add a few more points to make it more interesting
df <- data.frame(id = c(1001, 1002, 1003, 1004, 1005),
lat = c(45.123, 45.123, 45.126, 45.121, 45.130),
lon = c(-122.456, -122.457, -122.444, -122.442, -122.445))
# convert to an sf object and set projection (crs) to 4326 (lon/lat)
df.sf <- st_as_sf(df, coords = c("lon", "lat"), crs = 4326)
# transform to UTM (Zone 10) for distance
df.utm <- st_transform(df.sf, "+proj=utm +zone=10 +datum=WGS84 +units=m +no_defs")
# create a 100m grid on these points
grid.100 <- st_make_grid(x = df.utm, cellsize = c(100, 100))
# plot to make sure
library(ggplot2)
ggplot() +
geom_sf(data = df.utm, size = 3) +
geom_sf(data = grid.100, alpha = 0)
# convert the grid to sf (not sfc) and add an id column
grid.sf <- st_sf(grid.100)
grid.sf$id <- 1:nrow(grid.sf)
# count the points in each grid cell: st_intersects() returns, for each cell,
# the indices of the points it intersects, and lengths() counts them
grid.sf$count <- lengths(st_intersects(grid.sf, df.utm))
Plot to check:
ggplot() +
geom_sf(data = grid.sf, alpha = 0.5, aes(fill = as.factor(count))) +
geom_sf(data = df.utm, size = 3) +
scale_fill_discrete("Number of Points")
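To get from the grid counts to a table shaped like the one requested in the question (id, lat, lon, count), one option is to take each cell's centroid and transform it back to lon/lat. The following is a self-contained sketch under a few assumptions: it reuses the same example points, uses EPSG:32610 (WGS 84 / UTM zone 10N) in place of the proj string, and adds a hypothetical density_per_km2 column on the assumption that density means incidents per unit area.

```r
library(sf)

# Same example points as in the answer
df <- data.frame(id  = c(1001, 1002, 1003, 1004, 1005),
                 lat = c(45.123, 45.123, 45.126, 45.121, 45.130),
                 lon = c(-122.456, -122.457, -122.444, -122.442, -122.445))

# points in lon/lat, then projected to UTM zone 10N so units are metres
pts <- st_as_sf(df, coords = c("lon", "lat"), crs = 4326)
pts <- st_transform(pts, 32610)

# 100 m x 100 m grid covering the points, with an id and a count per cell
grid <- st_sf(geometry = st_make_grid(pts, cellsize = c(100, 100)))
grid$id <- seq_len(nrow(grid))
grid$count <- lengths(st_intersects(grid, pts))

# cell centroids, transformed back to lon/lat (X = lon, Y = lat)
cent <- st_coordinates(st_transform(st_centroid(st_geometry(grid)), 4326))

# plain data frame: one row per grid cell
result <- data.frame(id = grid$id,
                     lat = cent[, "Y"],
                     lon = cent[, "X"],
                     count = grid$count)

# assumption: density as incidents per km^2; each cell is 0.01 km^2
result$density_per_km2 <- result$count / 0.01
```

Cells with no incidents are kept with a count of 0; filter with result[result$count > 0, ] if only occupied cells are wanted.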