R中的二进制密度图

Question

我有一个大型数据集，其中包含家庭位置的 x 和 y 坐标对。

我想要的是创建一个区分密集区域和不密集区域的二进制密度图（确切的阈值将通过启发式定义）。因此，我想说如果一个坐标对在某个区域内，它就被认为是密集的。在这种情况下，我不确定克里金法或反距离加权是否是合适的方法。

到目前为止，我的方法如下：

library(ggplot2)

m <- ggplot(df, aes(x = df$V1, y = df$V2)) +
  geom_point() 
m + geom_density_2d()

不幸的是，这并不能真正让我提取 binary map，而只能显示密度。一个可以想到的解决方案是第三列，它将坐标对分为密集和非密集类别。

非常感谢您的帮助！

数据数据摘录（总计 162'000 行）：

df <- structure(list(V1 = c(2690503, 2689797, 2690685, 2690685, 2689409, 
2689409, 2689141, 2689141, 2689141, 2689141, 2690515, 2690515, 
2689474, 2689148, 2689148, 2689148, 2689148, 2689148, 2689148, 
2690199, 2689760, 2689473, 2689198, 2689198, 2689558, 2690020, 
2690020, 2689612, 2689132, 2689208, 2689208, 2690244, 2689614, 
2689614, 2690264, 2690264, 2689625, 2690315, 2689720, 2689720, 
2689720, 2691037, 2691037, 2691037, 2691037, 2689433, 2689433, 
2690715, 2690715, 2690715, 2689560, 2689655, 2689563, 2690240, 
2690240, 2678781, 2689498, 2689206, 2690521, 2689662, 2689662, 
2690027, 2690027, 2689383, 2689383, 2690588, 2688999, 2689397, 
2689725, 2689725, 2689100, 2689100, 2689100, 2689100, 2689906, 
2690529, 2690529, 2690199, 2690164, 2689638, 2689638, 2689498, 
2689093, 2689093, 2690502, 2689740, 2689153, 2690536, 2689027, 
2689950, 2690959, 2690959, 2690486, 2690486, 2689794, 2689307, 
2690010, 2690010, 2689599, 2689599, 2689599, 2689464, 2689464, 
2689464, 2689464, 2689711, 2689711, 2690222, 2690602, 2690602, 
2689790, 2689790, 2689404, 2689428, 2689428, 2689387, 2688960, 
2688960, 2689258, 2689258, 2689258, 2689355, 2690795, 2689521, 
2689954, 2689954, 2688926, 2689543, 2689363, 2689363, 2689186, 
2689186, 2689571, 2689571, 2689970, 2689970, 2689675, 2689498, 
2690941, 2690941, 2689060, 2689442, 2690122, 2690725, 2690725, 
2689419, 2689366, 2689366, 2689097, 2689332, 2690123, 2690123, 
2690994, 2690994, 2690180, 2690180, 2689706, 2689706, 2689612, 
2690074, 2688961, 2688961, 2689692, 2689260, 2689419, 2689419, 
2689146, 2690083, 2690625, 2690167, 2690167, 2689540, 2689540, 
2689512, 2689512, 2690469, 2689720, 2689711, 2690874, 2690072, 
2690072, 2690072, 2688946, 2689502, 2689431, 2689531, 2689131, 
2689131, 2690257, 2690001, 2689608, 2689843, 2689502, 2689773, 
2689773, 2689507, 2690060, 2678781, 2689500, 2689260), V2 = c(1254816, 
1254916, 1254061, 1254061, 1255542, 1255542, 1255220, 1255220, 
1255220, 1255220, 1254872, 1254872, 1255561, 1255199, 1255199, 
1255199, 1255199, 1255199, 1255199, 1255390, 1255667, 1255233, 
1255830, 1255830, 1255029, 1254812, 1254812, 1255297, 1255391, 
1255728, 1255728, 1254961, 1255385, 1255385, 1255149, 1255149, 
1255704, 1255312, 1254949, 1254949, 1254949, 1253836, 1253836, 
1253836, 1253836, 1255130, 1255130, 1253886, 1253886, 1253886, 
1255124, 1254928, 1255858, 1255267, 1255267, 1237314, 1255231, 
1255426, 1254796, 1255315, 1255315, 1255231, 1255231, 1255065, 
1255065, 1254882, 1255504, 1255493, 1255279, 1255279, 1256005, 
1256005, 1256005, 1256005, 1255418, 1254909, 1254909, 1255390, 
1255233, 1255716, 1255716, 1255231, 1255787, 1255787, 1253745, 
1255672, 1255827, 1254775, 1255813, 1255187, 1254105, 1254105, 
1255155, 1255155, 1255128, 1255623, 1255448, 1255448, 1255397, 
1255397, 1255397, 1255353, 1255353, 1255353, 1255353, 1255306, 
1255306, 1254824, 1254771, 1254771, 1255170, 1255170, 1255380, 
1255919, 1255919, 1255204, 1255885, 1255885, 1256001, 1256001, 
1256001, 1255552, 1254091, 1255334, 1255052, 1255052, 1255609, 
1254960, 1255090, 1255090, 1255426, 1255426, 1255140, 1255140, 
1254886, 1254886, 1255464, 1255231, 1254052, 1254052, 1255454, 
1255219, 1255265, 1254818, 1254818, 1255145, 1255407, 1255407, 
1255180, 1255187, 1255179, 1255179, 1253875, 1253875, 1255001, 
1255001, 1255573, 1255573, 1255232, 1255023, 1255458, 1255458, 
1255305, 1255453, 1255145, 1255145, 1255275, 1255209, 1254931, 
1255447, 1255447, 1255003, 1255003, 1255760, 1255760, 1254749, 
1254949, 1255306, 1253851, 1255286, 1255286, 1255286, 1255599, 
1255183, 1255580, 1255320, 1255460, 1255460, 1254804, 1255360, 
1255731, 1255470, 1255183, 1255131, 1255131, 1255081, 1255245, 
1237314, 1255586, 1255419)), row.names = c(NA, 200L), class = "data.frame")

Answer 1

一个类似的问题has been asked in this thread。我正在修改给出的答案。

我正在使用 MASS:kde2d 制作您实际上也提到的数据框。请注意，此函数将给出矩阵 z ，它 - 像往常一样 - 按列排列。因此，您需要将 y 坐标重复 each，将 x 重复 times

library(tidyverse)

# Manually estimate the density for each coordinate value with MASS::kde2d
# I scale the data first and remove this weird outlier in your data 
scaled_df <- df %>% scale() %>% as.data.frame() %>% filter (V1 > -7)

kde2d_est <- MASS::kde2d(scaled_df$V1, scaled_df$V2)

res <- data.frame(x = rep(kde2d_est$x, times = dim(kde2d_est$z)[1]),
                  y = rep(kde2d_est$y, each = dim(kde2d_est$z)[2]),
                  density = as.numeric(kde2d_est$z))

ggplot() +
  geom_point(data = scaled_df, aes(V1, V2)) +
  geom_tile(data = res, aes(x = x, y = y, fill = density), alpha = 0.5)

看起来已经不错了

现在让我们制作 "binary density"，任意截断。您不需要像我链接到的答案中那样 'cuts' 。只需使用逻辑语句即可。

arbitrary_cut <- 1
res_bin <- res %>% mutate(binary_dens = density < arbitrary_cut)

ggplot() +
  geom_point(data = scaled_df, aes(V1, V2)) +
  geom_tile(data = res_bin, aes(x = x, y = y, fill = binary_dens), alpha = 0.5)

^{由 reprex package (v0.3.0)}

于 2020-04-05 创建

R中的二进制密度图

Binary density map in R

r

ggplot2

data-visualization

density-plot