Passing a single column in dplyr mutate

Question

I'm trying to use stringr and dplyr to extract the characters surrounding vowels. When I run the code below, the str_match call throws an error:

Error in mutate_impl(.data, dots) : 
  Column `near_vowel` must be length 150 (the number of rows) or one, not 450

Minimal example code:

library(tidyverse)
library(magrittr)
library(stringr)
iris %>%
  select(Species) %>%
  mutate(name_length = str_length(Species),
         near_vowel = str_match(Species, "(.)[aeiou](.)"))

For "virginica", for example, I expect it to extract "vir", "gin", "nic".

Answer 1

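There are a few things you need to address, but I'll propose a tidy approach based on what you provided in the question.

The main problem is that you are returning multiple values per row for near_vowel; we can solve that by nesting the results in a list-column. Second, you need rowwise processing for your mutate to make sense... Third (as @Psidom noted), your regex won't produce the output you want. Addressing the first two, which are the core of your question...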

library(dplyr)
library(stringr)

df <- iris %>%
  select(Species) %>%
  mutate(
    name_length = str_length(Species),
    near_vowel = str_extract_all(Species, "[^aeiou][aeiou][^aeiou]")
  )

head(df)

#   Species name_length near_vowel
# 1  setosa           6        set
# 2  setosa           6        set
# 3  setosa           6        set
# 4  setosa           6        set
# 5  setosa           6        set
# 6  setosa           6        set

head(df[df$Species == "virginica", ]$near_vowel)

# [[1]]
# [1] "vir" "gin"
# 
# [[2]]
# [1] "vir" "gin"
# 
# [[3]]
# [1] "vir" "gin"
# 
# [[4]]
# [1] "vir" "gin"
# 
# [[5]]
# [1] "vir" "gin"
# 
# [[6]]
# [1] "vir" "gin"
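
The output above stops at "vir" and "gin" because regex matches do not overlap: once "gin" is consumed, the "n" is no longer available to start "nic". As a rough sketch of one way to get overlapping matches, the helper below slides a three-character window over the word in base R and keeps the windows shaped consonant-vowel-consonant. The function name near_vowel_overlapping is hypothetical, and this is a swapped-in technique, not the approach used in the answer.

```r
# Hypothetical helper (not from the original answer): return every
# overlapping 3-character window of the form non-vowel, vowel, non-vowel.
near_vowel_overlapping <- function(word) {
  word <- as.character(word)
  n <- nchar(word)
  # Nothing to extract from missing values or words shorter than 3 characters
  if (is.na(word) || n < 3) return(character(0))
  starts <- seq_len(n - 2)
  # substring() is vectorised, so this builds all windows in one call
  windows <- substring(word, starts, starts + 2)
  windows[grepl("^[^aeiou][aeiou][^aeiou]$", windows)]
}

near_vowel_overlapping("virginica")
# [1] "vir" "gin" "nic"
near_vowel_overlapping("setosa")
# [1] "set" "tos"
```

Since the helper takes one word at a time, it would be applied per row, for example with lapply(as.character(Species), near_vowel_overlapping) inside mutate, which again yields a list-column.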

Edit: Updated with the str_extract_all approach offered by @neilfws; it has the added benefit of letting us drop the rowwise operation.
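
For reference, the 450 in the original error message can be traced to the shape of what str_match returns. The sketch below, which assumes only the built-in iris data and stringr, compares it with str_extract_all; the variable names are illustrative.

```r
library(stringr)

species <- as.character(iris$Species)

# str_match() returns a character matrix: one row per input string,
# with one column for the full match plus one per capture group.
m <- str_match(species, "(.)[aeiou](.)")
dim(m)
# [1] 150   3
m[1, ]
# [1] "set" "s"   "t"

# str_extract_all() returns a list with one element per input string,
# so its length equals the number of rows and it fits in a list-column.
l <- str_extract_all(species, "[^aeiou][aeiou][^aeiou]")
length(l)
# [1] 150
```

The matrix holds 150 x 3 = 450 values, which is the length mutate complained about, whereas the list has exactly one element per row.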

