Apply a function to each cell in a column of a data frame in R
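Edit: Thanks to @user5249203 for pointing out that geocoding is best done with ggmap's geocode call. Watch out for NAs, though.
I am struggling with the apply family in R.
I am using a function that takes a string and returns a latitude and longitude: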
> gGeoCode("Philadelphia, PA")
[1] 39.95258 -75.16522
I have a simple data frame containing the names of all 50 states:
dput(state_lat_long)
structure(
list(State = structure(
c(
32L, 28L, 43L, 5L, 23L, 34L,
30L, 13L, 14L, 38L, 22L, 25L, 15L, 20L, 24L, 40L, 46L, 21L, 9L,
18L, 48L, 10L, 7L, 4L, 3L, 31L, 35L, 37L, 49L, 44L, 12L, 6L,
17L, 36L, 11L, 39L, 42L, 8L, 47L, 33L, 16L, 1L, 29L, 27L, 26L,
19L, 41L, 50L, 2L, 45L
), .Label = c(
"alabama", "alaska", "arizona",
"arkansas", "california", "colorado", "connecticut", "delaware",
"florida", "georgia", "hawaii", "idaho", "illinois", "indiana",
"iowa", "kansas", "kentucky", "louisiana", "maine", "maryland",
"massachusetts", "michigan", "minnesota", "mississippi", "missouri",
"montana", "nebraska", "nevada", "new hampshire", "new jersey",
"new mexico", "new york", "north carolina", "north dakota", "ohio",
"oklahoma", "oregon", "pennsylvania", "rhode island", "south carolina",
"south dakota", "tennessee", "texas", "utah", "vermont", "virginia",
"washington", "west virginia", "wisconsin", "wyoming"
), class = "factor"
)), .Names = "State", row.names = c(NA,-50L), class = "data.frame"
)
To practice my apply skills, I simply want to apply gGeoCode to each cell in the only column of the state_lat_long data frame.
It couldn't be simpler.
So what is wrong with this?
> View(apply(state_lat_long, function(x) gGeoCode(x)))
When I run this, I get:
Error in View : argument "FUN" is missing, with no default
I don't understand this, because FUN is not missing.
So let's try sapply. That should be simple, right?
But what is wrong with this?
View(sapply(state_lat_long$State, function(x) gGeoCode(x)))
When I run this, I get 2 rows by 50 columns, all filled with NA. I can't make sense of it.
Next, I tried
View(apply(state_lat_long, 2, function(x) gGeoCode(x)))
and I got
State
40.71278
-74.00594
Again, this makes no sense!
What am I doing wrong? Thanks.
Is your data frame like this?
df = data.frame(State = c(
32L, 28L, 43L, 5L, 23L, 34L,
30L, 13L, 14L, 38L, 22L, 25L, 15L, 20L, 24L, 40L, 46L, 21L, 9L,
18L, 48L, 10L, 7L, 4L, 3L, 31L, 35L, 37L, 49L, 44L, 12L, 6L,
17L, 36L, 11L, 39L, 42L, 8L, 47L, 33L, 16L, 1L, 29L, 27L, 26L,
19L, 41L, 50L, 2L, 45L
), Label = c(
"alabama", "alaska", "arizona",
"arkansas", "california", "colorado", "connecticut", "delaware",
"florida", "georgia", "hawaii", "idaho", "illinois", "indiana",
"iowa", "kansas", "kentucky", "louisiana", "maine", "maryland",
"massachusetts", "michigan", "minnesota", "mississippi", "missouri",
"montana", "nebraska", "nevada", "new hampshire", "new jersey",
"new mexico", "new york", "north carolina", "north dakota", "ohio",
"oklahoma", "oregon", "pennsylvania", "rhode island", "south carolina",
"south dakota", "tennessee", "texas", "utah", "vermont", "virginia",
"washington", "west virginia", "wisconsin", "wyoming"
))
head(df)
  State      Label
1    32    alabama
2    28     alaska
3    43    arizona
4     5   arkansas
5    23 california
6    34   colorado
# pass only the Label value of each row to gGeoCode
apply(df, 1, function(x) gGeoCode(x[["Label"]]))
Or,
mapply(FUN = gGeoCode, as.character(df$Label), SIMPLIFY = TRUE)
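Note: some states still come back as NA. Rerunning the code fetches the missing coordinates. However, if we knew your input format and data frame structure, I would expect this to work more efficiently. It is also important to make sure that the argument you pass is what gGeoCode expects.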
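To make that last point concrete, here is a minimal sketch of why the three attempts in the question may behave the way they do. It assumes gGeoCode takes a single character string and returns c(lat, lon). Because gGeoCode itself is not shown, `fake_geocode` and the `coords` lookup table are hypothetical offline stand-ins, not the real function.

```r
# Hypothetical stand-in for gGeoCode: takes ONE character string and returns
# c(lat, lon), or c(NA, NA) when the input is not a single known name.
coords <- list(
  "new york" = c(40.71278, -74.00594),
  "nevada"   = c(38.80261, -116.41939),
  "texas"    = c(31.96860, -99.90181)
)
fake_geocode <- function(x) {
  if (!is.character(x) || length(x) != 1 || is.null(coords[[x]])) {
    return(c(NA_real_, NA_real_))
  }
  coords[[x]]
}

# A small factor column, like the one in the question
states <- data.frame(State = factor(c("new york", "nevada", "texas")))

# 1. apply() is apply(X, MARGIN, FUN). With only two arguments, the function
#    is matched to MARGIN, so FUN really is missing.
res1 <- tryCatch(
  apply(states, function(x) fake_geocode(x)),
  error = function(e) conditionMessage(e)
)

# 2. Iterating over a factor passes factor elements, not strings, so a
#    function that expects a string can return NA for every element.
res2 <- sapply(states$State, function(x) fake_geocode(x))

# 3. apply(..., 2, ...) calls the function once per COLUMN, passing the
#    whole column as one character vector (here of length 3).
res3 <- apply(states, 2, function(x) length(x))

# Working version: convert to character, call once per element, and
# transpose so that each state is a row.
res4 <- t(sapply(as.character(states$State), fake_geocode))
colnames(res4) <- c("lat", "lon")
res4
```

In this sketch, `res2` is a 2 x 3 matrix of NA, which has the same shape of problem as the 2 x 50 result in the question. `res3` shows that the function sees the entire column at once, which would explain getting a single pair of coordinates (those of the first row) back.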
I know this question is primarily about *apply, but if you are just after the geocoding, a simpler option is to use a vectorised function such as ggmap::geocode
state_lat_long <- structure(
list(State = structure(
c(
32L, 28L, 43L, 5L, 23L, 34L,
30L, 13L, 14L, 38L, 22L, 25L, 15L, 20L, 24L, 40L, 46L, 21L, 9L,
18L, 48L, 10L, 7L, 4L, 3L, 31L, 35L, 37L, 49L, 44L, 12L, 6L,
17L, 36L, 11L, 39L, 42L, 8L, 47L, 33L, 16L, 1L, 29L, 27L, 26L,
19L, 41L, 50L, 2L, 45L
), .Label = c(
"alabama", "alaska", "arizona",
"arkansas", "california", "colorado", "connecticut", "delaware",
"florida", "georgia", "hawaii", "idaho", "illinois", "indiana",
"iowa", "kansas", "kentucky", "louisiana", "maine", "maryland",
"massachusetts", "michigan", "minnesota", "mississippi", "missouri",
"montana", "nebraska", "nevada", "new hampshire", "new jersey",
"new mexico", "new york", "north carolina", "north dakota", "ohio",
"oklahoma", "oregon", "pennsylvania", "rhode island", "south carolina",
"south dakota", "tennessee", "texas", "utah", "vermont", "virginia",
"washington", "west virginia", "wisconsin", "wyoming"
), class = "factor"
)), .Names = "State", row.names = c(NA,-50L), class = "data.frame"
)
library(ggmap)
## to make sure we're using the correct geocode function I call it with 'ggmap::geocode'
ggmap::geocode(as.character(state_lat_long$State))
...
#          lon      lat
# 1  -74.00594 40.71278
# 2 -116.41939 38.80261
# 3  -99.90181 31.96860
# 4 -119.41793 36.77826
# 5  -94.68590 46.72955
# 6 -101.00201 47.55149
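Since some lookups can come back as NA, one option is to re-query only the rows that failed instead of rerunning everything. The sketch below assumes a geocoder that takes a character vector and returns a data frame with lon and lat columns, like the output above. `retry_missing`, `geocode_fn` and `flaky_geocode` are hypothetical names, and the flaky stub exists only so that the example runs offline.

```r
# Hypothetical helper: geocode a character vector, then re-query only the
# rows whose lon came back NA, for at most max_tries passes in total.
retry_missing <- function(places, geocode_fn, max_tries = 3) {
  result <- geocode_fn(places)
  tries <- 1
  while (tries < max_tries && anyNA(result$lon)) {
    missing <- which(is.na(result$lon))
    result[missing, ] <- geocode_fn(places[missing])
    tries <- tries + 1
  }
  data.frame(place = places, result, stringsAsFactors = FALSE)
}

# Offline stub standing in for a real geocoder: it "fails" (returns NA)
# for "texas" on the first call only.
calls <- 0
flaky_geocode <- function(places) {
  calls <<- calls + 1
  known <- data.frame(
    lon = c(-74.00594, -116.41939, -99.90181),
    lat = c(40.71278, 38.80261, 31.96860),
    row.names = c("new york", "nevada", "texas")
  )
  out <- known[places, ]
  rownames(out) <- NULL
  if (calls == 1) out[places == "texas", ] <- NA
  out
}

out <- retry_missing(c("new york", "nevada", "texas"), flaky_geocode)
out
```

In principle a small wrapper around ggmap::geocode could be passed as `geocode_fn`. That is untested here, and recent ggmap versions need a registered Google API key before geocoding will work.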