Remove words using regular expression
I am trying to write a function that takes a tweet and processes it for sentiment analysis.
tweet <- "Here’s a list of all of the exchanges #safemoon is affiliated with going into the AMAs today!\nIf you look at the roadmap- \nthey are planning to add more!\nWe’re still early and JUST getting started!\nCredit to @_tokendad \n#safemooncommunity #safemoonarmy #crypto #cryptotwitter"
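I want to remove all hyperlinks, hashtags (words starting with #), and mentions (words starting with @).
I have already worked out how to remove hyperlinks. How can I use a regular expression to find hashtags and mentions and remove them from the tweet?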
process_tweet <- function(tweet) {
  tweet <- gsub('http\\S+', '', tweet) # get rid of hyperlinks
  # tweet <- gsub(, '', tweet) # how do I look for words that start with @ or # and remove them?
  return(tweet)
}
You can use:
trimws(gsub('http\\S+|#\\w+|@\\w+', '', tweet))
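This removes hyperlinks (http\S+), hashtags (#\w+), and mentions (@\w+). Inside an R string literal each backslash has to be doubled, so the pattern is written as 'http\\S+|#\\w+|@\\w+'.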
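As a rough sketch, the combined pattern can be placed inside the function from the question. The extra step that collapses leftover spaces and the sample string `example` are assumptions added for illustration, not part of the original answer.

```r
# Sketch of the question's function using the combined pattern.
process_tweet <- function(tweet) {
  # Remove hyperlinks, hashtags, and mentions in a single pass.
  tweet <- gsub("http\\S+|#\\w+|@\\w+", "", tweet)
  # Assumption: collapse runs of spaces/tabs left behind by the removals
  # (newlines are left untouched).
  tweet <- gsub("[ \t]{2,}", " ", tweet)
  # Strip leading and trailing whitespace.
  trimws(tweet)
}

# Hypothetical sample input, shorter than the tweet in the question.
example <- "Check https://example.com now! Credit to @_tokendad #crypto #safemoon"
cleaned <- process_tweet(example)
print(cleaned)
```

Note that `#\\w+` only matches word characters (letters, digits, underscore), so a hashtag containing other characters would be removed only up to the first non-word character.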