Wrong districts filled on state map plot

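I have a shapefile of Texas school districts, and I'm trying to use ggplot2 to highlight 10 of them in particular. I've tinkered with it and got everything set up, but when I spot-checked the result I realized that the 10 highlighted districts are not actually the ones I wanted to highlight.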

The shapefile can be downloaded from this link to the Texas Education Agency Public Open Data Site.

#install.packages(c("ggplot2", "rgdal"))
library(ggplot2)
library(rgdal)
#rm(list=ls())

#setwd("path")

# read shapefile
tex <- readOGR(dsn = paste0(getwd(), "/Current_Districts/Current_Districts.shp"))

# colors to use and districts to highlight
cols<- c("#CCCCCC", "#003082")
districts <- c("Aldine", "Laredo", "Spring Branch", "United", "Donna", "Brownsville", "Houston", "Bryan", "Galena Park", "San Felipe-Del Rio Cons")

# extract from shapefile data just the name and ID, then subset to only the districts of interest
dist_info <- data.frame(cbind(as.character(tex@data$NAME2), as.character(tex@data$FID)), stringsAsFactors=FALSE)
names(dist_info) <- c("name", "id")
dist_info <- dist_info[dist_info$name %in% districts, ]

# turn shapefile into df
tex_df <- fortify(tex)

# create dummy fill var for if the district is one to be highlighted
tex_df$yes <- as.factor(ifelse(tex_df$id %in% dist_info$id, 1, 0))


# plot the graph
ggplot(data=tex_df) +
  geom_polygon(aes(x=long, y=lat, group=group, fill=yes), color="#CCCCCC") + 
  scale_fill_manual(values=cols) +
  theme_void() +
  theme(legend.position = "none")

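As you can see, when the plot is created it looks exactly the way I want. The problem is that the ten highlighted districts are not the ones in the districts vector above. I've re-run everything from a clean session several times, double-checked that I'm not hitting a factor/character conversion problem, and confirmed in the web data explorer that the IDs I get from the shapefile really are the ones that should match my list of names. I really don't know where this problem could be coming from.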

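This is my first time working with shapefiles and rgdal, so my guess is that there's something simple about the structure that I don't understand. Hopefully one of you can quickly point it out. Thanks!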

Here's the output:

Option 1

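Pass the argument region = "NAME2" to the fortify function; the id column will then contain your district names. Then create the dummy fill variable based on that column. I'm not familiar with the districts in Texas, but I believe the result is correct.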

tex <- readOGR(dsn = paste0(getwd(), "/Current_Districts/Current_Districts.shp"))

# colors to use and districts to highlight
cols<- c("#CCCCCC", "#003082")
districts <- c("Aldine", "Laredo", "Spring Branch", "United", "Donna", "Brownsville", "Houston", "Bryan", "Galena Park", "San Felipe-Del Rio Cons")

# turn shapefile into df
tex_df <- fortify(tex, region = "NAME2")

# create dummy fill var for if the district is one to be highlighted
tex_df$yes <- as.factor(ifelse(tex_df$id %in% districts, 1, 0))

# plot the graph
ggplot(data=tex_df) +
  geom_polygon(aes(x=long, y=lat, group=group, fill=yes), color="#CCCCCC") +
  scale_fill_manual(values=cols) +
  theme_void() +
  theme(legend.position = "none")
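
As an aside on why the lookup in the question picked the wrong districts, here is a toy sketch in base R that needs no shapefile. It assumes that fortify() without region labels each polygon with its polygon ID, which for readOGR() output is, as far as I know, the row name of tex@data starting at "0", while the FID attribute column follows its own numbering (assumed here to start at 1). The attrs data frame and all of its values are made up for illustration.

```r
# Toy stand-in for tex@data. The row names play the role of the polygon IDs
# that fortify() writes into its id column (assumed to be 0-based).
attrs <- data.frame(
  NAME2 = c("Aldine", "Bryan", "Donna", "Houston"),
  FID   = c(1, 2, 3, 4),              # hypothetical attribute values
  row.names = c("0", "1", "2", "3"),
  stringsAsFactors = FALSE
)

wanted <- c("Aldine", "Donna")

# What the question does: collect the FID values, then match them against id
fid_ids <- as.character(attrs$FID[attrs$NAME2 %in% wanted])
by_fid  <- attrs$NAME2[rownames(attrs) %in% fid_ids]

# Matching on the row names instead
row_ids <- rownames(attrs)[attrs$NAME2 %in% wanted]
by_row  <- attrs$NAME2[rownames(attrs) %in% row_ids]

print(by_fid)  # districts that end up highlighted when matching on FID
print(by_row)  # districts that end up highlighted when matching on row names
```

Under these assumptions, matching on FID shifts every highlight to a neighbouring row, which would produce a plot that looks right but marks the wrong districts.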

Option 2

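This one does not pass the region argument to the fortify function, which works around the problem seeellayewhy ran into when implementing the previous option. We add two layers, with no need to create a dummy variable or merge any data frames.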

tex <- readOGR(dsn = paste0(getwd(), "/Current_Districts/Current_Districts.shp"))

# colors to use and districts to highlight
cols<- c("#CCCCCC", "#003082")
districts <- c("Aldine", "Laredo", "Spring Branch", "United", "Donna", "Brownsville", "Houston", "Bryan", "Galena Park", "San Felipe-Del Rio Cons")

# Subset the shape file into two
tex1 <- subset(tex, NAME2 %in% districts)
tex2 <- subset(tex, !(NAME2 %in% districts)) 

# Create two data frames
tex_df1 <- fortify(tex1)
tex_df2 <- fortify(tex2)

# Plot two geom_polygon layers, one for each data frame.
# The fills are constants, so they are set outside aes() instead of being mapped.
ggplot() +
  geom_polygon(data = tex_df1,
               aes(x = long, y = lat, group = group),
               fill = cols[2], color = "#CCCCCC") +  # highlighted districts
  geom_polygon(data = tex_df2,
               aes(x = long, y = lat, group = group),
               fill = cols[1]) +                     # all other districts
  theme_void() +
  theme(legend.position = "none")

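While trying to implement @mpalanco's solution of adding the "region" argument to fortify(), I ran into an error that I could not resolve even after going through many other Stack posts (Error: isTRUE(gpclibPermitStatus()) is not TRUE). I also tried broom::tidy(), which is the non-deprecated equivalent of fortify(), and got the same error.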

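In the end, I implemented @luchanocho's solution from here. I don't like the fact that it uses seq() to generate the IDs, since that doesn't necessarily preserve the correct order, but my case is simple enough that I could go through each district and confirm that the right ones were highlighted.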

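My code is below. The output is identical to that of @mpalanco's answer. Since he evidently got the right result using something less sketchy than the solution I implemented, I'll assume it works and give him the accepted answer. If anyone else runs into the same error I did, the solution below can be treated as a workaround.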

#install.packages(c("ggplot2", "rgdal"))
library(ggplot2)
library(rgdal)
#rm(list=ls())

#setwd("path")

# read shapefile
tex <- readOGR(dsn = paste0(getwd(), "/Current_Districts/Current_Districts.shp"))

# colors to use and districts to highlight
cols<- c("#CCCCCC", "#003082")
districts <- c("Aldine", "Laredo", "Spring Branch", "United", "Donna", "Brownsville", "Houston", "Bryan", "Galena Park", "San Felipe-Del Rio Cons")


# convert shapefile to a df
tex_df <- fortify(tex)

# generate temp df with IDs to merge back in
names_df <- data.frame(tex@data$NAME2)
names(names_df) <- "NAME2"
names_df$id <- seq(0, nrow(names_df)-1)  # this is the part I felt was sketchy
final <- merge(tex_df, names_df, by="id")

# dummy out districts of interest
final$yes <- as.factor(ifelse(final$NAME2 %in% districts, 1, 0))


ggplot(data=final) +
  geom_polygon(aes(x=long, y=lat, group=group, fill=yes), color="#CCCCCC") + 
  scale_fill_manual(values=cols) +
  theme_void() +
  theme(legend.position = "none")
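
For anyone uneasy about the seq() step, one possible alternative is to build the lookup from the row names of the attribute table rather than from an assumed 0..n-1 sequence. The sketch below uses toy data frames in base R, so attrs, toy_df, and their values are made up. It assumes that the id column produced by fortify() matches rownames(tex@data), which I have not verified against the actual shapefile.

```r
# Toy stand-in for tex@data, deliberately with non-sequential row names
attrs <- data.frame(
  NAME2 = c("Aldine", "Bryan", "Donna"),
  row.names = c("0", "5", "2"),
  stringsAsFactors = FALSE
)

# Toy stand-in for fortify(tex): several vertices per polygon id
toy_df <- data.frame(
  id   = c("0", "0", "5", "5", "2", "2"),
  long = c(-95.4, -95.3, -96.4, -96.3, -98.1, -98.0),
  lat  = c(29.9, 30.0, 30.7, 30.6, 26.2, 26.1),
  stringsAsFactors = FALSE
)

# Build the lookup from the row names instead of seq()
names_df <- data.frame(id = rownames(attrs), NAME2 = attrs$NAME2,
                       stringsAsFactors = FALSE)
final <- merge(toy_df, names_df, by = "id")

# Dummy out the districts of interest, as in the code above
final$yes <- as.factor(ifelse(final$NAME2 %in% c("Aldine", "Donna"), 1, 0))
print(unique(final[, c("id", "NAME2", "yes")]))
```

With the real data this would amount to replacing the seq() line with names_df$id <- rownames(tex@data). Since merge() can reorder rows, it may also be worth re-sorting the merged data frame by the order column that fortify() returns before plotting.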