Best way to apply a custom function to an existing column to create a new column in a data frame in R

Question

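I have a data frame df with a character column containing comma-separated strings of numbers, e.g. 1, 2, 3, 4. I have a custom function that I want to apply to the value in each row of that column, so that I get a new value I can store in a new column of df.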

Initial data frame

A B str
1 1 1, 2, 5
1 2 NA
2 1 NA
2 2 1, 3

Final data frame

A B str      res
1 1 1, 2, 5  2
1 2 NA       0
2 1 NA       0
2 2 1, 3     1

Here is my custom function getCounts:

getCounts <- function(str, x, y){
  if (is.na(str)){
    return(as.integer(0))
  }
  vec <- as.integer(unlist(strsplit(str, ',')))
  count <- 0
  for (i in vec) {
    if (i >= x & i <= y){
      count <- count + 1
    }
  }
  return(as.integer(count))
}

I first tried lapply, since other posts suggested it was the best fit, but I kept getting errors such as:

df <- df %>% mutate(res = lapply(df$str, getCounts(df$str, 0, 2)))

Error: Problem with `mutate()` input `res`.
x missing value where TRUE/FALSE needed
i Input `res` is `lapply(df$str, getCounts(df$str, 0, 2))`

The only thing that seems to work is mapply, but I don't really understand why it works, or whether there is a better way to do this.

df <- df %>% mutate(res = mapply(getCounts, df$str, 0, 2))

Answer 1

If I'm reading this correctly, you should be able to use rowwise():

df %>%
  rowwise() %>%
  mutate(res = getCounts(str, 0, 2)) %>%
  ungroup()

Your data:

data.frame(
    A = c(1,1,2,2),
    B = c(1,2,1,2),
    str = c('1, 2, 5', NA, NA, '1, 3')
) -> df

getCounts <- function(str, x, y){
    if (is.na(str)){
        return(as.integer(0))
    }
    vec <- as.integer(unlist(strsplit(str, ',')))
    count <- 0
    for (i in vec) {
        if (i >= x & i <= y){
            count <- count + 1
        }
    }
    return(as.integer(count))
}

library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union

df %>%
    rowwise() %>%
    mutate(res = getCounts(str, 0, 2)) %>%
    ungroup()
#> # A tibble: 4 x 4
#>       A     B str       res
#>   <dbl> <dbl> <chr>   <int>
#> 1     1     1 1, 2, 5     2
#> 2     1     2 <NA>        0
#> 3     2     1 <NA>        0
#> 4     2     2 1, 3        1

Created on 2021-03-17 by the reprex package (v1.0.0)

Answer 2

You could try Vectorize:

df %>%
  mutate(res = Vectorize(getCounts)(str, 0, 2))

or sapply:

df %>%
  mutate(res = sapply(str, getCounts, x = 0, y = 2))
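
As for why the lapply attempt in the question failed while mapply worked: lapply expects the function itself as its second argument, with any extra arguments passed after it. Writing getCounts(df$str, 0, 2) there instead calls getCounts once on the whole column before lapply does anything, which is most likely where the "missing value where TRUE/FALSE needed" error comes from. mapply(getCounts, df$str, 0, 2) works because it receives the function and recycles 0 and 2 across the elements of str. Below is a minimal base R sketch of the lapply form and of vapply, which I'm suggesting as a type-checked alternative (it is not something the answers above used). The names res_lapply and res_vapply are just illustrative.

```r
# getCounts as defined in the question, repeated so the sketch is self-contained
getCounts <- function(str, x, y) {
  if (is.na(str)) {
    return(as.integer(0))
  }
  vec <- as.integer(unlist(strsplit(str, ",")))
  count <- 0
  for (i in vec) {
    if (i >= x & i <= y) {
      count <- count + 1
    }
  }
  return(as.integer(count))
}

df <- data.frame(
  A = c(1, 1, 2, 2),
  B = c(1, 2, 1, 2),
  str = c("1, 2, 5", NA, NA, "1, 3"),
  stringsAsFactors = FALSE  # keep str as character on R versions before 4.0
)

# Pass the function itself, then its extra arguments by name.
# lapply() returns a list, so unlist() flattens it to an integer vector.
res_lapply <- unlist(lapply(df$str, getCounts, x = 0, y = 2))

# vapply() applies the function the same way, but also checks that every
# call returns exactly one integer; USE.NAMES = FALSE drops the string names.
res_vapply <- vapply(df$str, getCounts, integer(1), x = 0, y = 2,
                     USE.NAMES = FALSE)

df$res <- res_vapply
print(df)
```

Either result can be used inside mutate() as well, e.g. mutate(res = vapply(str, getCounts, integer(1), x = 0, y = 2, USE.NAMES = FALSE)). Inside mutate() you can refer to str directly rather than df$str.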
