Remove row if string contains more than one "@" using regular expression
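I have a data frame with two columns. cnn_handle contains a Twitter handle, and tweet contains a tweet that mentions the Twitter handle in the corresponding row. However, most tweets mention at least one other user/handle, indicated by @. I want to remove all rows where the tweet contains more than one @.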
df
cnn_handle tweet
1 @DanaBashCNN @JohnKingCNN @DanaBashCNN @kaitlancollins @eliehonig @thelauracoates @KristenhCNN CNN you are still FAKE NEWS !!!
2 @DanaBashCNN @DanaBashCNN He could have made the same calls here, from SC.
3 @DanaBashCNN @DanaBashCNN GRAMMER ALERT: THAT'S FORMER PRESIDENT TRUMP Please don't forget this important point. Also please refrain from showing a pic of him till you have one in his casket. thank you
4 @brianstelter @eliehonig @brianstelter My apologies to you sir. Just seems like that story disappeared. Imo the nursing home scandal is just as bad.
5 @brianstelter @DrAndrewBaer1 @JGreenblattADL @brianstelter @CNN @TuckerCarlson @FoxNews Anti-Semite are you, Herr Doktor? How very Mengele of you.
6 @brianstelter @ma_makosh @Shortguy1 @brianstelter @ChrisCuomo Liberals, their feelings before facts and their crucifixion of people before due process. Never a presumption of innocence when it concerns the rival party. So un-American.
7 @andersoncooper @BrendonLeslie And Biden was a staunch opponent of “forced busing”. He also said that integrating schools will cause a “racial jungle”. But u won’t hear this on @ChrisCuomo @jaketapper @Acosta @andersoncooper bc they continue to cover up the truth about Biden & his family.
8 @andersoncooper Anderson Cooper revealed that he "wanted a change" when reflecting on his break from news as #TheMole arrives on Netflix.
9 @andersoncooper @johnnydollar01 @newsbusters @drsanjaygupta @andersoncooper He was terrible as a host
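I suspect some kind of regular expression is needed. However, I am not sure how to combine it with a greater-than comparison.
Desired result, where the tweet mentions only the corresponding cnn_handle: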
cnn_handle tweet
2 @DanaBashCNN @DanaBashCNN He could have made the same calls here, from SC.
3 @DanaBashCNN @DanaBashCNN GRAMMER ALERT: THAT'S FORMER PRESIDENT TRUMP Please don't forget this important point. Also please refrain from showing a pic of him till you have one in his casket. thank you
8 @andersoncooper Anderson Cooper revealed that he "wanted a change" when reflecting on his break from news as #TheMole arrives on Netflix.
Assuming your data frame is named tweets, just check whether there is more than one match of @ followed by text:
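# "@" followed by one or more letters
pattern <- "@[a-zA-Z]+"
# gregexpr() returns a single -1 when nothing matches, so the length is 1
# for zero or one match and greater than 1 only for multiple matches
multiple_ats <- unlist(lapply(tweets$tweet, function(x) length(gregexpr(pattern, x)[[1]]) > 1))
# keep only the rows without multiple mentions
tweets[!multiple_ats, ]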
Output:
# A tibble: 3 x 2
cnn_handle tweet
<chr> <chr>
1 @DanaBashCNN "@DanaBashCNN He could have made the same calls here, from SC."
2 @DanaBashCNN "@DanaBashCNN GRAMMER ALERT: THAT'S FORMER PRESIDENT TRUMP Please don't forget this important point.,Also please refrain from showing a pic of him till you have one in his casket.,thank you"
3 @andersoncooper "Anderson Cooper revealed that he \"wanted a change\" when reflecting on his break from news as #TheMole arrives on Netflix."
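Edit: If Twitter usernames are allowed to start with digits or special characters, you will have to change the pattern. I don't know what the rules are.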
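The edit above leaves the handle rules open. As a rough sketch, assuming handles consist only of letters, digits, and underscores (my assumption, not something stated in the question), the mentions can be counted in base R with regmatches() and lengths(). The three-row tweets data frame below is a shortened stand-in for the data in the question, not the real data.

```r
# Shortened stand-in for the data in the question (illustrative only).
tweets <- data.frame(
  cnn_handle = c("@DanaBashCNN", "@brianstelter", "@andersoncooper"),
  tweet = c(
    "@DanaBashCNN He could have made the same calls here, from SC.",
    "@eliehonig @brianstelter My apologies to you sir.",
    "Anderson Cooper revealed that he \"wanted a change\"."
  ),
  stringsAsFactors = FALSE
)

# Assumption: a handle is "@" followed by letters, digits, or underscores.
pattern <- "@[A-Za-z0-9_]+"

# regmatches() returns one character vector of matches per tweet
# (length 0 when nothing matches), so lengths() gives the mention count.
n_mentions <- lengths(regmatches(tweets$tweet, gregexpr(pattern, tweets$tweet)))

# Keep the rows with at most one mention.
kept <- tweets[n_mentions <= 1, ]
print(kept)
```

Because regmatches() yields an empty vector for a tweet without any match, this avoids relying on the single -1 that gregexpr() returns in that case.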
A straightforward solution using str_count from stringr, which assumes that @ appears only in Twitter handles:
base R:
library(stringr)
# keep the rows whose tweet contains at most one "@"
df[str_count(df$tweet, "@") <= 1, ]
dplyr:
library(dplyr)
library(stringr)
# keep the rows whose tweet contains at most one "@"
df %>%
  filter(str_count(tweet, "@") <= 1)
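Counting @ keeps any tweet with a single mention, even when that mention is not the handle in cnn_handle. If the goal is strictly "the tweet mentions only the corresponding cnn_handle", one possible variant is to compare every mention with the row's handle. This is only a sketch: the data frame is a made-up stand-in (the @other_user row is hypothetical), and the handle pattern and the case-insensitive comparison are my assumptions.

```r
# Made-up stand-in data; the "@other_user" row is hypothetical.
tweets <- data.frame(
  cnn_handle = c("@DanaBashCNN", "@DanaBashCNN", "@andersoncooper", "@andersoncooper"),
  tweet = c(
    "@JohnKingCNN @DanaBashCNN CNN you are still FAKE NEWS !!!",
    "@DanaBashCNN He could have made the same calls here, from SC.",
    "Anderson Cooper revealed that he \"wanted a change\".",
    "@other_user He was terrible as a host"
  ),
  stringsAsFactors = FALSE
)

# Assumption: a handle is "@" followed by letters, digits, or underscores.
mentions <- regmatches(tweets$tweet, gregexpr("@[A-Za-z0-9_]+", tweets$tweet))

# TRUE when every mention in the tweet equals the row's cnn_handle.
# all() of an empty logical vector is TRUE, so tweets without any
# mention are kept as well.
only_own <- unname(mapply(
  function(m, h) all(tolower(m) == tolower(h)),
  mentions, tweets$cnn_handle
))

kept <- tweets[only_own, ]
print(kept)
```

Unlike the counting approach, this drops the fourth row (a single mention of a different user) and would keep a tweet that repeats the row's own handle several times, so which one fits depends on what "more than one @" is meant to exclude.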