Returning Twitter handles per dataframe row
Given the following data frame:
df <- as.data.frame(c("Testing @cspenn @test @hi","this is a tweet","this is a tweet with @mention of @twitter"))
names(df)[1] <- "content"
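I'm trying to extract the Twitter handles in each row separately, rather than all of them at once.
Starting from this example, I have the following function, which spits all of them out, but I need each row's handles to stay with that row.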
df$handles <- plyr::ddply(df, c("content"), function(x){
  mention <- unlist(stringr::str_extract_all(x$content, "@\\w+"))
  # some tweets do not contain mentions, making this necessary:
  if (length(mention) > 0){
    return(data.frame(mention = mention))
  } else {
    return(data.frame(mention = NA))
  }
})
How can I extract only the handles belonging to each row, instead of all the handles at once?
library(tidyverse)
df %>%
  mutate(mentions = str_extract_all(content, "@\\w+"))
Output:
                                    content            mentions
1                 Testing @cspenn @test @hi @cspenn, @test, @hi
2                           this is a tweet
3 this is a tweet with @mention of @twitter  @mention, @twitter
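The `mentions` column above is a list column holding one character vector per row. If you need a plain string per row, or one row per handle, a sketch along these lines may help. It assumes dplyr, purrr, stringr and tidyr (all installed with the tidyverse) and, like the NA branch in the question's ddply code, uses NA for tweets without handles. The names `with_mentions`, `flat` and `long` are only illustrative.

```r
library(dplyr)
library(purrr)
library(stringr)
library(tidyr)

df <- data.frame(
  content = c("Testing @cspenn @test @hi",
              "this is a tweet",
              "this is a tweet with @mention of @twitter"),
  stringsAsFactors = FALSE
)

# List column: one character vector of handles per row
with_mentions <- df %>%
  mutate(mentions = str_extract_all(content, "@\\w+"))

# One comma-separated string per row; NA when a tweet has no handles
flat <- with_mentions %>%
  mutate(handles = map_chr(mentions, function(h) {
    if (length(h) > 0) paste(h, collapse = ", ") else NA_character_
  }))

# Long format: one row per handle; keep_empty = TRUE keeps handle-less
# tweets as a single row with NA instead of dropping them
long <- with_mentions %>%
  unnest(mentions, keep_empty = TRUE)
```

The string form is convenient for display or export to CSV, while the long form is usually easier to count or join on.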
You could do it like this:
xy <- stringr::str_extract_all(df$content, "@\\w+")
xy <- sapply(xy, FUN = paste, collapse = ", ") # collapse each row's handles into one string
cbind(df, xy)
                                    content                  xy
1                 Testing @cspenn @test @hi @cspenn, @test, @hi
2                           this is a tweet
3 this is a tweet with @mention of @twitter  @mention, @twitter
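If you would rather avoid the stringr dependency, the same per-row extraction can be sketched in base R. `gregexpr()` finds every match in each string and `regmatches()` returns a list with one character vector per row. This sketch assumes NA is preferred over an empty string for tweets without handles, as in the question's ddply attempt. `handle_list` is an illustrative name.

```r
df <- data.frame(
  content = c("Testing @cspenn @test @hi",
              "this is a tweet",
              "this is a tweet with @mention of @twitter"),
  stringsAsFactors = FALSE
)

# One character vector of handles per row (character(0) when there are none)
handle_list <- regmatches(df$content, gregexpr("@\\w+", df$content))

# Collapse each row's handles into one string; NA when a row has none
df$handles <- vapply(
  handle_list,
  function(h) if (length(h) > 0) paste(h, collapse = ", ") else NA_character_,
  character(1)
)
```

`vapply()` with `character(1)` guarantees exactly one string per row, so the result always lines up with `df`.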