Returning Twitter handles per dataframe row

Question

Given the following data frame:

df <- as.data.frame(c("Testing @cspenn @test @hi","this is a tweet","this is a tweet with @mention of @twitter"))
names(df)[1] <- "content"

I'm trying to extract the individual Twitter handles for each row, rather than all of them at once.

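Starting from this example, I have the following function. It returns all of the handles together, but I need each row's handles to stay attached to that row.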

df$handles <- plyr::ddply(df, c("content"), function(x){
    mention <- unlist(stringr::str_extract_all(x$content, "@\\w+"))
    # some tweets do not contain mentions, making this necessary:
    if (length(mention) > 0){
        return(data.frame(mention = mention))
    } else {
        return(data.frame(mention = NA))    
    }
})
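The function above builds one output row per mention, so its result has more rows than `df` and cannot be assigned back as a column. Below is a minimal base-R sketch that keeps the question's "NA when there are no mentions" behaviour but produces exactly one value per row. It swaps `stringr::str_extract_all()` for base `regmatches()`/`gregexpr()`. The names `mentions` and `handles` are illustrative choices, not taken from the question.

```r
# Sample data from the question
df <- data.frame(
  content = c("Testing @cspenn @test @hi",
              "this is a tweet",
              "this is a tweet with @mention of @twitter"),
  stringsAsFactors = FALSE
)

# One list element per row; each element holds that row's handles
# (character(0) when a tweet has no mentions)
mentions <- regmatches(df$content, gregexpr("@\\w+", df$content, perl = TRUE))

# Collapse each row's handles into one string, or NA when the row has none
df$handles <- vapply(
  mentions,
  function(m) if (length(m) > 0) paste(m, collapse = ", ") else NA_character_,
  character(1)
)

print(df)
```

Because `vapply()` returns one string per list element, `df$handles` always has the same length as `nrow(df)`.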

How can I extract only the handles for each row, instead of all of them at once?

Answer 1

library(tidyverse)

df %>%
  mutate(mentions = str_extract_all(content, "@\\w+"))

Output:

                                    content            mentions
1                 Testing @cspenn @test @hi @cspenn, @test, @hi
2                           this is a tweet                    
3 this is a tweet with @mention of @twitter  @mention, @twitter
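Here `mentions` is a list column, with one character vector per row. If you later need one row per handle instead, a possible base-R sketch is shown below. It repeats each tweet once per handle it contains and uses the illustrative names `n_handles` and `long`. Rows without mentions drop out of the long form. tidyr's `unnest()` on the `mentions` column behaves similarly by default.

```r
# Sample data from the question
df <- data.frame(
  content = c("Testing @cspenn @test @hi",
              "this is a tweet",
              "this is a tweet with @mention of @twitter"),
  stringsAsFactors = FALSE
)

# List of handles per row (same shape as the `mentions` column above)
mentions <- regmatches(df$content, gregexpr("@\\w+", df$content, perl = TRUE))

# Number of handles in each row (0 for tweets without mentions)
n_handles <- lengths(mentions)

# Long format: repeat each tweet once per handle it contains
long <- data.frame(
  content = rep(df$content, times = n_handles),
  handle  = unlist(mentions),
  stringsAsFactors = FALSE
)

print(long)
```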

Answer 2

You can do it like this:

xy <- stringr::str_extract_all(df$content, "@\\w+")
xy <- sapply(xy, FUN = paste, collapse = ", ")  # have all names concatenated
cbind(df, xy)

                                    content                  xy
1                 Testing @cspenn @test @hi @cspenn, @test, @hi
2                           this is a tweet                    
3 this is a tweet with @mention of @twitter  @mention, @twitter

