Remove row if string contains more than one "@" using regular expression

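I have a data frame with two columns. cnn_handle contains Twitter handles, and tweet contains tweets that mention the Twitter handle in the corresponding row. However, most tweets mention at least one other user/handle, denoted by @. I want to remove all rows in which the tweet contains more than one @.
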
df
    cnn_handle      tweet
1   @DanaBashCNN    @JohnKingCNN @DanaBashCNN @kaitlancollins @eliehonig @thelauracoates @KristenhCNN CNN you are still FAKE NEWS !!!
2   @DanaBashCNN    @DanaBashCNN He could have made the same calls here, from SC.
3   @DanaBashCNN    @DanaBashCNN GRAMMER ALERT:  THAT'S FORMER PRESIDENT TRUMP Please don't forget this important point.   Also please refrain from showing a pic of him till you have one in his casket.   thank you
4   @brianstelter   @eliehonig @brianstelter My apologies to you sir. Just seems like that story disappeared. Imo the nursing home scandal is just as bad.
5   @brianstelter   @DrAndrewBaer1 @JGreenblattADL @brianstelter @CNN @TuckerCarlson @FoxNews Anti-Semite are you,  Herr Doktor? How very Mengele of you.
6   @brianstelter   @ma_makosh @Shortguy1 @brianstelter @ChrisCuomo Liberals, their feelings before facts and their crucifixion of people before due process. Never a presumption of innocence when it concerns the rival party. So un-American.
7   @andersoncooper @BrendonLeslie And Biden was a staunch opponent of “forced busing”. He also said that integrating schools will cause a “racial jungle”. But u won’t hear this on @ChrisCuomo @jaketapper @Acosta @andersoncooper bc they continue to cover up the truth about Biden & his family.
8   @andersoncooper Anderson Cooper revealed that he "wanted a change" when reflecting on his break from news as #TheMole arrives on Netflix.
9   @andersoncooper @johnnydollar01 @newsbusters @drsanjaygupta @andersoncooper He was terrible as a host

I suspect that some kind of regular expression is needed. However, I am not sure how to combine it with a greater-than comparison.

Desired result, where each tweet mentions only the corresponding cnn_handle:

    cnn_handle      tweet
2   @DanaBashCNN    @DanaBashCNN He could have made the same calls here, from SC.
3   @DanaBashCNN    @DanaBashCNN GRAMMER ALERT:  THAT'S FORMER PRESIDENT TRUMP Please don't forget this important point.   Also please refrain from showing a pic of him till you have one in his casket.   thank you
8   @andersoncooper Anderson Cooper revealed that he "wanted a change" when reflecting on his break from news as #TheMole arrives on Netflix.

Assuming your data frame is called tweets, just check whether there is more than one match for @ followed by text:

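# "@" followed by a letter, "." or "+"
pattern <- "@[a-zA-Z.+]"
# gregexpr() yields one position per match (a single -1 if none), so length > 1 means several handles
multiple_ats <- unlist(lapply(tweets$tweet, function(x) length(gregexpr(pattern, x)[[1]]) > 1))
# Keep only the rows without multiple handles
tweets[!multiple_ats, ]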

Output:

# A tibble: 3 x 2
  cnn_handle      tweet
  <chr>           <chr>
1 @DanaBashCNN    "@DanaBashCNN He could have made the same calls here, from SC."
2 @DanaBashCNN    "@DanaBashCNN GRAMMER ALERT:  THAT'S FORMER PRESIDENT TRUMP Please don't forget this important point.,Also please refrain from showing a pic of him till you have one in his casket.,thank you"
3 @andersoncooper "Anderson Cooper revealed that he \"wanted a change\" when reflecting on his break from news as #TheMole arrives on Netflix."

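Edit: If Twitter usernames are allowed to start with a digit or a special character, you will have to change the pattern. I don't know what the rules are.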
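
A hedged sketch of what such a changed pattern could look like. It assumes that a handle is "@" followed by letters, digits or underscores; that character class is my assumption, not an official rule, and the toy `tweets` data frame is made up for illustration. It also counts matches with `regmatches()`, which returns an empty vector when nothing matches, so the count is the real number of handles per tweet.

```r
# Toy data, made up for illustration (not the data from the question).
tweets <- data.frame(
  cnn_handle = c("@a_1", "@a_1", "@b2"),
  tweet = c("@a_1 hello", "@9lives @a_1 hi there", "no mention at all"),
  stringsAsFactors = FALSE
)

# Assumption: a handle is "@" followed by letters, digits or underscores.
pattern <- "@[A-Za-z0-9_]+"

# regmatches() returns character(0) when nothing matches, so lengths()
# gives the number of handles found in each tweet (0, 1, 2, ...).
n_handles <- lengths(regmatches(tweets$tweet, gregexpr(pattern, tweets$tweet)))

# Keep tweets that contain at most one handle.
result <- tweets[n_handles <= 1, ]
result
```

With the letters-only pattern, "@9lives" in the second toy tweet would not be counted, so that row would be kept by mistake; the wider character class catches it.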

A straightforward solution using str_count from stringr, which assumes that @ appears only in Twitter handles:

base R:

library(stringr)
# Keep rows whose tweet contains at most one "@"
df[str_count(df$tweet, "@") <= 1, ]

dplyr:

library(dplyr)
library(stringr)
df %>%
  filter(str_count(tweet, "@") <= 1)
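
For comparison, a minimal sketch that needs no packages and uses a single regular expression, as the question's title asks. It relies on the same assumption that @ appears only in handles. The small `df` below is a shortened, illustrative copy of three rows from the question, and `has_multiple_ats` and `kept` are names I made up.

```r
# Shortened copies of rows 2, 4 and 8 from the question, for illustration only.
df <- data.frame(
  cnn_handle = c("@DanaBashCNN", "@brianstelter", "@andersoncooper"),
  tweet = c(
    "@DanaBashCNN He could have made the same calls here, from SC.",
    "@eliehonig @brianstelter My apologies to you sir.",
    "Anderson Cooper revealed that he wanted a change."
  ),
  stringsAsFactors = FALSE
)

# "@[^@]*@" matches an "@", any run of non-"@" characters, then a second "@",
# so grepl() is TRUE exactly when a tweet contains two or more "@".
has_multiple_ats <- grepl("@[^@]*@", df$tweet)

# Drop the rows that mention more than one handle.
kept <- df[!has_multiple_ats, ]
kept$cnn_handle
# "@DanaBashCNN" "@andersoncooper"
```

Because the pattern only asks whether a second @ exists, no counting or greater-than comparison is needed.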