Difficulty using rle() inside a mutate() step in R to count the maximum number of consecutive identical characters in a word

I wrote this expression to count the maximum number of consecutive identical characters in a word.

max(rle(unlist(strsplit("happy", split = "")))$lengths)

It works for a single word, but it does not work when I use it inside a mutate step. Here is the code with the mutate step.

text3 <- "The most pressing of those issues, considering the franchise's 
stated goal of competing for championships above all else, is an apparent 
disconnect between Lakers vice president of basketball operations and general manager"

text3_df <- tibble(line = 1:1, text3)

text3_df %>%
  unnest_tokens(word, text3) %>% 
  mutate(
    num_letters = nchar(word),
    num_vowels = get_count(word),
    num_consec_char = max(rle(unlist(strsplit(word, split = "")))$lengths)
  )

The variables num_letters and num_vowels work fine, but I get 2 for every value of num_consec_char. I can't figure out what I'm doing wrong.

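The expression rle(unlist(strsplit(word, split = "")))$lengths is not vectorized. Inside mutate(), word is the entire column, so strsplit() returns a list with one element per word, unlist() flattens it into a single vector of characters, and rle() runs over the characters of all the words at once. max() then returns a single number, which is recycled to every row, so every row gets the same result.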
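To see the flattening in isolation, here is a small base R sketch. The three-word vector and the variable names (words, pieces, all_chars, whole_column) are made up for illustration and stand in for the word column.

```r
words <- c("the", "pressing", "of")

# strsplit() on a character vector returns a list with one element per word
pieces <- strsplit(words, split = "")
length(pieces)        # 3

# unlist() flattens that list, so the word boundaries are lost
all_chars <- unlist(pieces)
all_chars             # "t" "h" "e" "p" "r" "e" "s" "s" "i" "n" "g" "o" "f"

# rle() now sees one long sequence, and max() returns a single number
# that mutate() would recycle to every row
whole_column <- max(rle(all_chars)$lengths)
whole_column          # 2
```

The only run longer than 1 is the "ss" in "pressing", so the whole column collapses to 2, which matches what you are seeing.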

You will need some kind of loop (for example for, apply, or purrr::map) to process each word separately.

library(dplyr)
library(tidytext)

text3 <- "The most pressing of those issues, considering the franchise's 
stated goal of competing for championships above all else, is an apparent 
disconnect between Lakers vice president of basketball operations and general manager"

text3_df <- tibble(line = 1:1, text3)

output <- text3_df %>%
   unnest_tokens(word, text3) %>%
   mutate(
      num_letters = nchar(word)
      # num_vowels = get_count(word)  # get_count() comes from the question and is not defined here
   )

# Apply the run-length computation to one word at a time
output$num_consec_char <- sapply(output$word, function(word) {
   max(rle(unlist(strsplit(word, split = "")))$lengths)
})
output


# A tibble: 32 × 4
    line word        num_letters num_consec_char
   <int> <chr>             <int>           <int>
 1     1 the                   3               1
 2     1 most                  4               1
 3     1 pressing              8               2
 4     1 of                    2               1
 5     1 those                 5               1
 6     1 issues                6               2
 7     1 considering          11               1
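
If you would rather keep everything inside mutate(), one option is to wrap the per-word logic in a small helper that is itself vectorized. This is only a sketch: max_run is a made-up name (not from any package), and the four-word tibble stands in for the output of unnest_tokens(). Returning 0 for an empty string is my own assumption about what you would want there.

```r
library(dplyr)

# Hypothetical helper: longest run of identical characters in each element of x
max_run <- function(x) {
  vapply(strsplit(x, split = ""), function(chars) {
    # Assumption: an empty string has no characters, so report a run of 0
    if (length(chars) == 0) 0L else max(rle(chars)$lengths)
  }, integer(1))
}

# Because max_run() returns one value per word, mutate() has nothing to recycle
tibble(word = c("the", "pressing", "issues", "considering")) %>%
  mutate(num_consec_char = max_run(word))
```

vapply() is used instead of sapply() so the result is always an integer vector with one element per word, which is what mutate() expects.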