Apply a function to each cell in a column of a data frame in R

Edit: Thanks to @user5249203 for pointing out that geocoding is better done with ggmap's geocode call. Watch out for NAs, though.

I am struggling with the apply family in R.

I am using a function that takes a string and returns a latitude and longitude:

> gGeoCode("Philadelphia, PA")
[1] 39.95258 -75.16522

I have a simple data frame that contains the names of all 50 states:

dput(state_lat_long)
structure(
  list(State = structure(
    c(
      32L, 28L, 43L, 5L, 23L, 34L,
      30L, 13L, 14L, 38L, 22L, 25L, 15L, 20L, 24L, 40L, 46L, 21L, 9L,
      18L, 48L, 10L, 7L, 4L, 3L, 31L, 35L, 37L, 49L, 44L, 12L, 6L,
      17L, 36L, 11L, 39L, 42L, 8L, 47L, 33L, 16L, 1L, 29L, 27L, 26L,
      19L, 41L, 50L, 2L, 45L
    ), .Label = c(
      "alabama", "alaska", "arizona",
      "arkansas", "california", "colorado", "connecticut", "delaware",
      "florida", "georgia", "hawaii", "idaho", "illinois", "indiana",
      "iowa", "kansas", "kentucky", "louisiana", "maine", "maryland",
      "massachusetts", "michigan", "minnesota", "mississippi", "missouri",
      "montana", "nebraska", "nevada", "new hampshire", "new jersey",
      "new mexico", "new york", "north carolina", "north dakota", "ohio",
      "oklahoma", "oregon", "pennsylvania", "rhode island", "south carolina",
      "south dakota", "tennessee", "texas", "utah", "vermont", "virginia",
      "washington", "west virginia", "wisconsin", "wyoming"
    ), class = "factor"
  )), .Names = "State", row.names = c(NA,-50L), class = "data.frame"
)

To practice my apply skills, I simply want to apply gGeoCode to each cell in the only column of the state_lat_long data frame.

It could not be simpler.

So what is wrong with this?

> View(apply(state_lat_long, function(x) gGeoCode(x)))

When I run this, I get:

Error in View : argument "FUN" is missing, with no default  

I don't understand this, because FUN is not missing.

So, let's try sapply. That should be simple, right?

But what is wrong with this?

View(sapply(state_lat_long$State, function(x) gGeoCode(x)))

When I run this, I get 2 rows and 50 columns, filled with NAs. I can't make sense of it.

Next, I tried

View(apply(state_lat_long, 2, function(x) gGeoCode(x)))  

and I got

     State
  40.71278
 -74.00594  

Again, this makes no sense!

What am I doing wrong? Thanks.

Is your data frame like this?

df = data.frame(State = c(
    32L, 28L, 43L, 5L, 23L, 34L,
    30L, 13L, 14L, 38L, 22L, 25L, 15L, 20L, 24L, 40L, 46L, 21L, 9L,
    18L, 48L, 10L, 7L, 4L, 3L, 31L, 35L, 37L, 49L, 44L, 12L, 6L,
    17L, 36L, 11L, 39L, 42L, 8L, 47L, 33L, 16L, 1L, 29L, 27L, 26L,
    19L, 41L, 50L, 2L, 45L
  ), Label = c(
    "alabama", "alaska", "arizona",
    "arkansas", "california", "colorado", "connecticut", "delaware",
    "florida", "georgia", "hawaii", "idaho", "illinois", "indiana",
    "iowa", "kansas", "kentucky", "louisiana", "maine", "maryland",
    "massachusetts", "michigan", "minnesota", "mississippi", "missouri",
    "montana", "nebraska", "nevada", "new hampshire", "new jersey",
    "new mexico", "new york", "north carolina", "north dakota", "ohio",
    "oklahoma", "oregon", "pennsylvania", "rhode island", "south carolina",
    "south dakota", "tennessee", "texas", "utah", "vermont", "virginia",
    "washington", "west virginia", "wisconsin", "wyoming"
  ))

head(df)
  State      Label
1    32    alabama
2    28     alaska
3    43    arizona
4     5   arkansas
5    23 california
6    34   colorado

# apply() hands each row over as a character vector, so pick out the Label element
apply(df, 1, function(row) gGeoCode(row[["Label"]]))

Or,

mapply(FUN = gGeoCode, as.character(df$Label), SIMPLIFY = TRUE)
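For reference, the attempts in the question differ in what each call hands to the function. Here is a minimal sketch using a hypothetical stand-in, fake_geocode, since gGeoCode itself is not shown. I am assuming that, like gGeoCode appears to, it only looks at the first element of its input and returns c(lat, lon). The coordinates are rounded from the output shown in this thread.

```r
# Hypothetical stand-in for gGeoCode (an assumption, not the real function):
# it uses only the first element of its input and returns c(lat, lon).
fake_geocode <- function(x) {
  lookup <- list(
    "new york" = c(40.7, -74.0),
    "nevada"   = c(38.8, -116.4),
    "texas"    = c(32.0, -99.9)
  )
  key <- as.character(x)[1]
  if (key %in% names(lookup)) lookup[[key]] else c(NA_real_, NA_real_)
}

states <- data.frame(State = factor(c("new york", "nevada", "texas")))

# MARGIN = 2: one call per column, so the whole column arrives as one vector
by_column <- apply(states, 2, fake_geocode)

# One call per cell: iterate over the column's values as character strings
per_cell <- sapply(as.character(states$State), fake_geocode)

# sapply binds each 2-element result as a column; transpose for one row per state
coords <- as.data.frame(t(per_cell))
names(coords) <- c("lat", "lon")
result <- cbind(states, coords)
```

Here by_column is a 2 x 1 matrix holding only the first state's coordinates, which is the same shape as the single "State" column in the question. per_cell is 2 x 3, one column per state, which is why the sapply attempt came back as 2 rows by 50 columns.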

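Note: a few states still return NA. Re-running the code fetches the missing coordinates. However, if we knew your input format / data frame structure, I expect this could be made to work more efficiently. It is also important to make sure that the argument you pass is what gGeoCode expects.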
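The "re-run for the missing ones" step can be automated. This is only a sketch under assumptions: flaky_geocode is a hypothetical stand-in that fails on every second request to mimic sporadic NAs, and it returns dummy coordinates. With the real function you would swap in gGeoCode and probably pause between attempts.

```r
# Hypothetical stand-in for a geocoder that sometimes fails: it returns NA on
# every second request. This mimics sporadic failures, not real API behaviour.
call_count <- 0
flaky_geocode <- function(place) {
  call_count <<- call_count + 1
  if (call_count %% 2 == 0) return(c(lat = NA_real_, lon = NA_real_))
  c(lat = nchar(place), lon = -nchar(place))  # dummy coordinates
}

places <- c("alabama", "alaska", "arizona", "arkansas")

# First pass: one row per place, columns "lat" and "lon"
coords <- t(sapply(places, flaky_geocode))

# Re-query only the rows that are still NA, up to a fixed number of attempts
max_attempts <- 5
for (attempt in seq_len(max_attempts)) {
  missing <- which(is.na(coords[, "lat"]))
  if (length(missing) == 0) break
  for (i in missing) {
    coords[i, ] <- flaky_geocode(places[i])
  }
}
```

Capping the number of attempts keeps the loop from running forever if a place genuinely cannot be geocoded.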

I know this question is mainly about *apply, but if you are just after the geocoding, a simpler option is to use a vectorised function such as ggmap::geocode

state_lat_long <- structure(
    list(State = structure(
    c(
      32L, 28L, 43L, 5L, 23L, 34L,
      30L, 13L, 14L, 38L, 22L, 25L, 15L, 20L, 24L, 40L, 46L, 21L, 9L,
      18L, 48L, 10L, 7L, 4L, 3L, 31L, 35L, 37L, 49L, 44L, 12L, 6L,
      17L, 36L, 11L, 39L, 42L, 8L, 47L, 33L, 16L, 1L, 29L, 27L, 26L,
      19L, 41L, 50L, 2L, 45L
    ), .Label = c(
      "alabama", "alaska", "arizona",
      "arkansas", "california", "colorado", "connecticut", "delaware",
      "florida", "georgia", "hawaii", "idaho", "illinois", "indiana",
      "iowa", "kansas", "kentucky", "louisiana", "maine", "maryland",
      "massachusetts", "michigan", "minnesota", "mississippi", "missouri",
      "montana", "nebraska", "nevada", "new hampshire", "new jersey",
      "new mexico", "new york", "north carolina", "north dakota", "ohio",
      "oklahoma", "oregon", "pennsylvania", "rhode island", "south carolina",
      "south dakota", "tennessee", "texas", "utah", "vermont", "virginia",
      "washington", "west virginia", "wisconsin", "wyoming"
    ), class = "factor"
  )), .Names = "State", row.names = c(NA,-50L), class = "data.frame"
)

library(ggmap)

## to make sure we're using the correct geocode function I call it with 'ggmap::geocode'
ggmap::geocode(as.character(state_lat_long$State))
...
#           lon      lat
# 1   -74.00594 40.71278
# 2  -116.41939 38.80261
# 3   -99.90181 31.96860
# 4  -119.41793 36.77826
# 5   -94.68590 46.72955
# 6  -101.00201 47.55149
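
Since the result has one row per input, it can be bound back onto the state names. Below is a small sketch that uses the six rows printed above as stand-in data, so it runs without a network call. The names states, coords, state_coords and failed are my own. As far as I know, recent ggmap versions also need a Google API key registered with register_google() before geocode() will return anything.

```r
# Stand-in for the first rows of the ggmap::geocode() result shown above,
# so the sketch runs without a network call.
states <- data.frame(
  State = c("new york", "nevada", "texas",
            "california", "minnesota", "north dakota")
)
coords <- data.frame(
  lon = c(-74.00594, -116.41939, -99.90181, -119.41793, -94.68590, -101.00201),
  lat = c(40.71278, 38.80261, 31.96860, 36.77826, 46.72955, 47.55149)
)

# Assuming one result row per input, in input order, bind the columns directly
state_coords <- cbind(states, coords)

# Rows where geocoding failed would show up as NA and can be listed like this
failed <- state_coords$State[is.na(state_coords$lat)]
```

Checking failed after each run is a cheap way to follow the "watch out for NAs" advice in the question's edit.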