Add values from one data frame to another data frame where values match

This is hard to explain, but basically I have a very simple data frame of counties and case counts:

dat <- "County   Cases
1       Borden   5
2       Bosque   3
3       Bowue    1"

I also have a large data frame from TEX <- map_data('county', 'texas').

> head(TEX)
       long      lat group order region subregion
1 -95.75271 31.53560     1     1  texas  anderson
2 -95.76989 31.55852     1     2  texas  anderson
3 -95.76416 31.58143     1     3  texas  anderson
4 -95.72979 31.58143     1     4  texas  anderson
5 -95.74698 31.61008     1     5  texas  anderson
6 -95.72405 31.63873     1     6  texas  anderson

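What I want to do is check each row and, if its subregion appears in the data frame dat, add the corresponding number of cases to a new column in TEX called "cases", or add 0 if it does not appear.

For example: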

> head(TEX)
       long      lat group order region subregion cases
1 -95.75271 31.53560     1     1  texas  anderson     0
2 -95.76989 31.55852     1     2  texas  anderson     0
3 -95.76416 31.58143     1     3  texas  anderson     0
4 -95.72979 31.58143     1     4  texas  anderson     0
5 -95.74698 31.61008     1     5  texas    Borden     5
6 -95.72405 31.63873     1     6  texas    Bosque     3

I tried to do it with this code:

for (val in counties$counties) {
     for (vall in TEX$subregion) {
         if (val == vall) TEX$cases = counties$cases
     }
}

But I get this error:

Error in `$<-.data.frame`(`*tmp*`, "cases", value = c(5L, 3L, 2L, 1L,  : 
  replacement has 10 rows, data has 4488

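My end goal is to create a choropleth of Texas counties showing COVID cases, based on my ever-growing list of counties and cases. If you know a better way to do this than mine, I would be glad to hear it!

Best regards!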

Update: Ian's solution works great, but it causes a problem with ggplot and the map. If I take a slice of the data frame TEX before merging, it looks like this:

76  -96.81268   28.28693    4   76  texas   aransas
77  -96.80695   28.25828    4   77  texas   aransas
78  -96.82414   28.21817    4   78  texas   aransas
79  -96.87570   28.19525    4   79  texas   aransas
80  -96.91009   28.16660    4   80  texas   aransas
81  -96.94446   28.14942    4   81  texas   aransas
82  -96.94446   28.18379    4   82  texas   aransas
83  -96.92727   28.24109    4   83  texas   aransas
84  -96.92154   28.26974    4   84  texas   aransas
85  -96.94446   28.27547    4   85  texas   aransas
86  -96.99030   28.25255    4   86  texas   aransas
87  -96.98457   28.23536    4   87  texas   aransas
88  -96.97311   28.21817    4   88  texas   aransas
89  -96.96165   28.19525    4   89  texas   aransas
90  -96.97311   28.17233    4   90  texas   aransas
91  -97.00175   28.15515    4   91  texas   aransas
92  -97.03613   28.15515    4   92  texas   aransas
93  -97.04186   28.17233    4   93  texas   aransas
94  -97.03613   28.20098    4   94  texas   aransas
95  -97.05905   28.21817    4   95  texas   aransas
96  -97.07624   28.20671    4   96  texas   aransas
97  -97.11062   28.21817    4   97  texas   aransas
98  -97.12780   28.23536    4   98  texas   aransas
99  -97.12780   28.25255    4   99  texas   aransas
100 -97.11062   28.26401    4   100 texas   aransas
101 -97.01894   28.27547    4   101 texas   aransas
102 -96.80122   28.31557    4   102 texas   aransas

and after plotting it with

ggplot(TEX, aes(long,lat, group = group)) + geom_polygon(aes(fill = subregion),color = "black") + theme(legend.position = "none") + coord_quickmap()

it looks great! But when I run the merge function, TEX gets rearranged:

72  aransas -97.00175   28.15515    4   91  texas   1
73  aransas -97.04186   28.17233    4   93  texas   1
74  aransas -96.80695   28.25828    4   77  texas   1
75  aransas -96.80122   28.31557    4   102 texas   1
76  aransas -97.03613   28.15515    4   92  texas   1
77  aransas -96.81268   28.28693    4   76  texas   1
78  aransas -97.12780   28.25255    4   99  texas   1
79  aransas -97.11062   28.26401    4   100 texas   1
80  aransas -96.97311   28.17233    4   90  texas   1
81  aransas -97.12780   28.23536    4   98  texas   1
82  aransas -97.07624   28.20671    4   96  texas   1
83  aransas -96.94446   28.27547    4   85  texas   1
84  aransas -97.01894   28.27547    4   101 texas   1
85  aransas -96.96165   28.19525    4   89  texas   1
86  aransas -97.11062   28.21817    4   97  texas   1
87  aransas -96.87570   28.19525    4   79  texas   1
88  aransas -97.03613   28.20098    4   94  texas   1
89  aransas -97.05905   28.21817    4   95  texas   1
90  aransas -96.97311   28.21817    4   88  texas   1
91  aransas -96.92154   28.26974    4   84  texas   1
92  aransas -96.99030   28.25255    4   86  texas   1
93  aransas -96.98457   28.23536    4   87  texas   1
94  aransas -96.82414   28.21817    4   78  texas   1
95  aransas -96.80122   28.31557    4   75  texas   1
96  aransas -96.94446   28.14942    4   81  texas   1
97  aransas -96.91009   28.16660    4   80  texas   1
98  aransas -96.92727   28.24109    4   83  texas   1
99  aransas -96.94446   28.18379    4   82  texas   1

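Now the map no longer draws correctly...

How can I preserve the original order of TEX? Or wait, maybe I just need to sort by the order column....

Update #2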

TEX <- TEX[order(TEX$order),]

Problem solved. I am still curious why merge changes the order like that.

We can use merge from base R:

result <- merge(TEX,dat,by.x="subregion",by.y="County",all.x=TRUE)
result
  subregion      long      lat group order region Cases
1  anderson -95.75271 31.53560     1     1  texas    NA
2  anderson -95.76989 31.55852     1     2  texas    NA
3  anderson -95.76416 31.58143     1     3  texas    NA
4  anderson -95.72979 31.58143     1     4  texas    NA
5  anderson -95.74698 31.61008     1     5  texas    NA
6  anderson -95.72405 31.63873     1     6  texas    NA
7    Borden -95.74698 31.61008     1     5  texas     5
8    Bosque -95.72405 31.63873     1     6  texas     3

Then we can replace the NA values with 0:

result$Cases[is.na(result$Cases)] <- 0
result
  subregion      long      lat group order region Cases
1  anderson -95.75271 31.53560     1     1  texas     0
2  anderson -95.76989 31.55852     1     2  texas     0
3  anderson -95.76416 31.58143     1     3  texas     0
4  anderson -95.72979 31.58143     1     4  texas     0
5  anderson -95.74698 31.61008     1     5  texas     0
6  anderson -95.72405 31.63873     1     6  texas     0
7    Borden -95.74698 31.61008     1     5  texas     5
8    Bosque -95.72405 31.63873     1     6  texas     3
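
On the reordering mentioned in the question's update: the documentation for merge says rows are sorted on the key columns by default (sort = TRUE), and that with sort = FALSE they come back in an unspecified order. So the map data's order column is the dependable way to restore the vertex sequence. Here is a minimal sketch on made-up toy data (the values below are illustrative, not taken from map_data):

```r
# Toy stand-ins; the column names follow the question, the values are invented.
TEX <- data.frame(
  order     = 1:6,
  group     = c(2L, 2L, 2L, 1L, 1L, 1L),
  subregion = c("bosque", "bosque", "bosque", "anderson", "anderson", "anderson"),
  stringsAsFactors = FALSE
)
dat <- data.frame(County = "bosque", Cases = 3L, stringsAsFactors = FALSE)

# Left join: keep every row of TEX, attach Cases where the county matches.
merged <- merge(TEX, dat, by.x = "subregion", by.y = "County", all.x = TRUE)

# Counties missing from dat get NA; turn those into 0.
merged$Cases[is.na(merged$Cases)] <- 0L

# Put the rows back into the original drawing sequence.
restored <- merged[order(merged$order), ]
print(restored)
```

This matters for the map because geom_polygon connects the vertices in the order the rows appear, so shuffled rows produce scrambled polygons.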

Data

TEX <- structure(list(long = c(-95.75271, -95.76989, -95.76416, -95.72979, 
-95.74698, -95.72405, -95.74698, -95.72405), lat = c(31.5356, 
31.55852, 31.58143, 31.58143, 31.61008, 31.63873, 31.61008, 31.63873
), group = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), order = c(1L, 2L, 
3L, 4L, 5L, 6L, 5L, 6L), region = structure(c(1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L), .Label = "texas", class = "factor"), subregion = structure(c(1L, 
1L, 1L, 1L, 1L, 1L, 2L, 3L), .Label = c("anderson", "Borden", 
"Bosque"), class = "factor")), class = "data.frame", row.names = c("1", 
"2", "3", "4", "5", "6", "7", "8"))

dat <- structure(list(County = structure(1:3, .Label = c("Borden", "Bosque", 
"Bowue"), class = "factor"), Cases = c(5L, 3L, 1L)), class = "data.frame", row.names = c("1", 
"2", "3"))
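
As an alternative that never reorders TEX, a lookup with match() can add the column in place. The sketch below uses made-up toy data, and it assumes the county names may differ in capitalization between the two data frames (map_data returns lowercase subregion names, while the case list in the question is capitalized), so both sides go through tolower():

```r
# Toy stand-ins; the column names follow the question, the values are invented.
TEX <- data.frame(
  order     = 1:6,
  subregion = c("anderson", "anderson", "borden", "borden", "bosque", "bosque"),
  stringsAsFactors = FALSE
)
dat <- data.frame(
  County = c("Borden", "Bosque", "Bowue"),
  Cases  = c(5L, 3L, 1L),
  stringsAsFactors = FALSE
)

# For each row of TEX, find the index of the matching row in dat
# (NA when the county is not listed). as.character() guards against factors.
idx <- match(tolower(as.character(TEX$subregion)),
             tolower(as.character(dat$County)))

# Look up the case counts by index, then replace the misses with 0.
TEX$cases <- dat$Cases[idx]
TEX$cases[is.na(TEX$cases)] <- 0L
print(TEX)
```

Because the new column is assigned directly to TEX, the rows stay in their original sequence and no re-sorting is needed before plotting.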