Identifying mentions in a comment and populating a data frame
I'm trying to extract mentions such as @someone @somebody from a data frame of Twitter data, and to build a new data frame that records who tweeted and whom they mentioned.
Example:
tweets <- data.frame(user=c("people","person","ghost"),text = c("Hey, check this out
@somebody @someone","love this @john","amazing"))
This produces the following data frame:
**user text**
*people Hey, check this out @somebody @someone*
*person love this @john*
*ghost amazing*
The desired result is:
**id mention**
*people @somebody*
*people @someone*
*person @john*
*ghost*
Can you help me?
You can use the stringr library:
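library(stringr)
# Extract every "@" followed by non-whitespace characters; the result is a list column
tweets$mention <- str_extract_all(tweets$text, '@\\S+')
The output looks like this: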
tweets
user text mention
1 people Hey, check this out \n@somebody @someone @somebody, @someone
2 person love this @john @john
3 ghost amazing
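One thing to watch for: `\S+` matches everything up to the next whitespace, so punctuation directly after a handle (as in "@john,") ends up in the match. The following is a minimal sketch of a stricter pattern, assuming handles contain only letters, digits, and underscores; that assumption and the sample `text` vector are illustrative and not part of the question.

```r
library(stringr)

# Hypothetical sample text with punctuation right after the handles
text <- c("thanks @john, and @mary_1!", "no mentions here")

# "@\\w+" matches "@" followed by letters, digits, or underscores only,
# so trailing commas and exclamation marks are left out of the match
mentions <- str_extract_all(text, "@\\w+")
mentions
```

Whether to prefer `\\w+` over `\\S+` depends on how clean the tweet text is; for the sample data in the question both patterns return the same mentions.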
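To get the output in long format, you can do this:
library(dplyr)
library(tidyr)
# Keep the rows without any mention, then append one row per extracted mention
tweets <- rbind(filter(tweets, !grepl('@', mention)), unnest(tweets, mention))
# Drop the text column (column 2)
tweets <- tweets[, -2]
The output looks like this: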
user mention
1 ghost
2 people @somebody
3 people @someone
4 person @john
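With recent versions of tidyr (1.0.0 or later), the `rbind`/`filter` step can probably be avoided, because `unnest()` accepts a `keep_empty` argument. The following is a sketch under that version assumption rather than a drop-in replacement; the name `long` is chosen here for illustration.

```r
library(stringr)
library(tidyr)

# Same sample data as in the question
tweets <- data.frame(
  user = c("people", "person", "ghost"),
  text = c("Hey, check this out\n@somebody @someone", "love this @john", "amazing"),
  stringsAsFactors = FALSE
)

# List column: one character vector of mentions per tweet
tweets$mention <- str_extract_all(tweets$text, "@\\S+")

# One row per mention; keep_empty = TRUE keeps tweets without mentions
# as a single row whose mention is NA
long <- unnest(tweets[, c("user", "mention")], mention, keep_empty = TRUE)
long
```

Compared with the approach above, users without mentions keep their original position and get `NA` in `mention` instead of an empty value, which may or may not be what you want downstream.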