Based on a dataframe, add values to a different dataframe when values match up
This is hard to explain, but basically I have a very simple data frame of counties and case counts:
dat <- "County Cases
1 Borden 5
2 Bosque 3
3 Bowue 1"
I also have a large data frame that comes from TEX <- map_data('county', 'texas'):
> head(TEX)
long lat group order region subregion
1 -95.75271 31.53560 1 1 texas anderson
2 -95.76989 31.55852 1 2 texas anderson
3 -95.76416 31.58143 1 3 texas anderson
4 -95.72979 31.58143 1 4 texas anderson
5 -95.74698 31.61008 1 5 texas anderson
6 -95.72405 31.63873 1 6 texas anderson
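What I want to do is go through each row and, if the subregion appears in the data frame dat, add the corresponding number of cases to a new column in TEX called "cases", or add 0 if it does not.
For example: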
> head(TEX)
long lat group order region subregion cases
1 -95.75271 31.53560 1 1 texas anderson 0
2 -95.76989 31.55852 1 2 texas anderson 0
3 -95.76416 31.58143 1 3 texas anderson 0
4 -95.72979 31.58143 1 4 texas anderson 0
5 -95.74698 31.61008 1 5 texas Borden 5
6 -95.72405 31.63873 1 6 texas Bosque 3
I tried to do it with this code:
for (val in counties$counties) {
for (vall in TEX$subregion) {
if (val == vall) TEX$cases = counties$cases
}
}
But I get this error:
Error in `$<-.data.frame`(`*tmp*`, "cases", value = c(5L, 3L, 2L, 1L, :
replacement has 10 rows, data has 4488
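My end goal here is to create a choropleth of Texas counties showing COVID cases, based on my ever-growing list of counties and cases. If you have a better way to do this than mine, go for it!
Best regards!
Update: Ian's solution works great, but it causes problems with ggplot and the mapping. If I take a slice of the data frame TEX before the merge, it looks like this: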
76 -96.81268 28.28693 4 76 texas aransas
77 -96.80695 28.25828 4 77 texas aransas
78 -96.82414 28.21817 4 78 texas aransas
79 -96.87570 28.19525 4 79 texas aransas
80 -96.91009 28.16660 4 80 texas aransas
81 -96.94446 28.14942 4 81 texas aransas
82 -96.94446 28.18379 4 82 texas aransas
83 -96.92727 28.24109 4 83 texas aransas
84 -96.92154 28.26974 4 84 texas aransas
85 -96.94446 28.27547 4 85 texas aransas
86 -96.99030 28.25255 4 86 texas aransas
87 -96.98457 28.23536 4 87 texas aransas
88 -96.97311 28.21817 4 88 texas aransas
89 -96.96165 28.19525 4 89 texas aransas
90 -96.97311 28.17233 4 90 texas aransas
91 -97.00175 28.15515 4 91 texas aransas
92 -97.03613 28.15515 4 92 texas aransas
93 -97.04186 28.17233 4 93 texas aransas
94 -97.03613 28.20098 4 94 texas aransas
95 -97.05905 28.21817 4 95 texas aransas
96 -97.07624 28.20671 4 96 texas aransas
97 -97.11062 28.21817 4 97 texas aransas
98 -97.12780 28.23536 4 98 texas aransas
99 -97.12780 28.25255 4 99 texas aransas
100 -97.11062 28.26401 4 100 texas aransas
101 -97.01894 28.27547 4 101 texas aransas
102 -96.80122 28.31557 4 102 texas aransas
And after plotting it with
ggplot(TEX, aes(long,lat, group = group)) + geom_polygon(aes(fill = subregion),color = "black") + theme(legend.position = "none") + coord_quickmap()
it looks great! Now when I run the merge function, TEX gets rearranged:
72 aransas -97.00175 28.15515 4 91 texas 1
73 aransas -97.04186 28.17233 4 93 texas 1
74 aransas -96.80695 28.25828 4 77 texas 1
75 aransas -96.80122 28.31557 4 102 texas 1
76 aransas -97.03613 28.15515 4 92 texas 1
77 aransas -96.81268 28.28693 4 76 texas 1
78 aransas -97.12780 28.25255 4 99 texas 1
79 aransas -97.11062 28.26401 4 100 texas 1
80 aransas -96.97311 28.17233 4 90 texas 1
81 aransas -97.12780 28.23536 4 98 texas 1
82 aransas -97.07624 28.20671 4 96 texas 1
83 aransas -96.94446 28.27547 4 85 texas 1
84 aransas -97.01894 28.27547 4 101 texas 1
85 aransas -96.96165 28.19525 4 89 texas 1
86 aransas -97.11062 28.21817 4 97 texas 1
87 aransas -96.87570 28.19525 4 79 texas 1
88 aransas -97.03613 28.20098 4 94 texas 1
89 aransas -97.05905 28.21817 4 95 texas 1
90 aransas -96.97311 28.21817 4 88 texas 1
91 aransas -96.92154 28.26974 4 84 texas 1
92 aransas -96.99030 28.25255 4 86 texas 1
93 aransas -96.98457 28.23536 4 87 texas 1
94 aransas -96.82414 28.21817 4 78 texas 1
95 aransas -96.80122 28.31557 4 75 texas 1
96 aransas -96.94446 28.14942 4 81 texas 1
97 aransas -96.91009 28.16660 4 80 texas 1
98 aransas -96.92727 28.24109 4 83 texas 1
99 aransas -96.94446 28.18379 4 82 texas 1
Now the map looks like this...
How do I preserve the original order of TEX? Or wait, maybe I just need to sort by the order column....
Update #2
TEX <- TEX[order(TEX$order),]
Problem solved. I'm curious why merge changes the order like that.
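On why the order changes: `?merge` documents that the result is sorted on the `by` columns by default (`sort = TRUE`), and that with `sort = FALSE` the rows come back in an unspecified order, so neither setting promises to keep the original row sequence. Since `geom_polygon()` connects vertices in row order within each group, shuffled rows give scrambled outlines. Below is a minimal sketch of the fix from Update #2 on toy data; the coordinates and the single `aransas` row in `dat` are made up for illustration.

```r
# Toy stand-in for map_data() output (hypothetical coordinates)
TEX <- data.frame(
  long = c(-95.75, -95.77, -96.81, -96.80, -96.82),
  lat = c(31.53, 31.55, 28.28, 28.25, 28.21),
  group = c(1L, 1L, 4L, 4L, 4L),
  order = 1:5,
  subregion = c("anderson", "anderson", "aransas", "aransas", "aransas"),
  stringsAsFactors = FALSE
)
dat <- data.frame(County = "aransas", Cases = 1L, stringsAsFactors = FALSE)

# Left join: keep every row of TEX, fill Cases where the county matches
merged <- merge(TEX, dat, by.x = "subregion", by.y = "County", all.x = TRUE)

# merge() does not guarantee the input row order, so restore the
# vertex sequence explicitly before plotting
merged <- merged[order(merged$order), ]
```

Sorting on `order` right after every merge keeps the plotting code independent of whatever row order `merge()` happens to return.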
We can use merge from base R.
result <- merge(TEX,dat,by.x="subregion",by.y="County",all.x=TRUE)
result
subregion long lat group order region Cases
1 anderson -95.75271 31.53560 1 1 texas NA
2 anderson -95.76989 31.55852 1 2 texas NA
3 anderson -95.76416 31.58143 1 3 texas NA
4 anderson -95.72979 31.58143 1 4 texas NA
5 anderson -95.74698 31.61008 1 5 texas NA
6 anderson -95.72405 31.63873 1 6 texas NA
7 Borden -95.74698 31.61008 1 5 texas 5
8 Bosque -95.72405 31.63873 1 6 texas 3
Then we can replace NA with 0.
result$Cases[is.na(result$Cases)] <- 0
result
subregion long lat group order region Cases
1 anderson -95.75271 31.53560 1 1 texas 0
2 anderson -95.76989 31.55852 1 2 texas 0
3 anderson -95.76416 31.58143 1 3 texas 0
4 anderson -95.72979 31.58143 1 4 texas 0
5 anderson -95.74698 31.61008 1 5 texas 0
6 anderson -95.72405 31.63873 1 6 texas 0
7 Borden -95.74698 31.61008 1 5 texas 5
8 Bosque -95.72405 31.63873 1 6 texas 3
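An alternative that never reorders rows is a lookup with `match()`, which adds the column to `TEX` in place. The sketch below is a variant under stated assumptions rather than part of the answer above: it lowercases both keys with `tolower()` because `map_data()` appears to return lowercase county names (as in `head(TEX)`) while `dat` capitalizes them, and the four-row `TEX` is a made-up stand-in.

```r
# Hypothetical miniature of the two data frames
TEX <- data.frame(
  order = 1:4,
  subregion = c("anderson", "anderson", "borden", "bosque"),
  stringsAsFactors = FALSE
)
dat <- data.frame(
  County = c("Borden", "Bosque", "Bowue"),
  Cases = c(5L, 3L, 1L),
  stringsAsFactors = FALSE
)

# For each row of TEX, find the position of its county in dat
# (NA when absent); compare case-insensitively
idx <- match(tolower(TEX$subregion), tolower(dat$County))

# Pull the case counts by position; unmatched counties become 0
TEX$cases <- dat$Cases[idx]
TEX$cases[is.na(TEX$cases)] <- 0L
```

Because `TEX` is only extended with a new column, its rows stay in the sequence `geom_polygon()` needs, and no re-sorting step is required.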
Data
TEX <- structure(list(long = c(-95.75271, -95.76989, -95.76416, -95.72979,
-95.74698, -95.72405, -95.74698, -95.72405), lat = c(31.5356,
31.55852, 31.58143, 31.58143, 31.61008, 31.63873, 31.61008, 31.63873
), group = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), order = c(1L, 2L,
3L, 4L, 5L, 6L, 5L, 6L), region = structure(c(1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L), .Label = "texas", class = "factor"), subregion = structure(c(1L,
1L, 1L, 1L, 1L, 1L, 2L, 3L), .Label = c("anderson", "Borden",
"Bosque"), class = "factor")), class = "data.frame", row.names = c("1",
"2", "3", "4", "5", "6", "7", "8"))
dat <- structure(list(County = structure(1:3, .Label = c("Borden", "Bosque",
"Bowue"), class = "factor"), Cases = c(5L, 3L, 1L)), class = "data.frame", row.names = c("1",
"2", "3"))