Search dataframe$text for several words and keep only the rows where they appear

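I want to search df$text for several words and, if any of them appear in a tweet, copy the whole row into a new data frame. The problem: I searched for the keywords "pat", "ppp", "jui" and "jip", but the resulting data set also contains rows where a keyword appears only in the user name (screen_name), not in the tweet itself. I want to drop every tweet that does not contain one of the keywords. The data frame looks like this: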

```
  screen_name | text
1 pat_bing    | RT @timkaine: 22 school shootings in 2018. 3 in the last week. How many times must our hearts break hearing news like this - this time in…
2 artguroo    | RT @RabiaBaluch: Khurram Nawaz Gandapur (Dr Tahir Qadir’s right-hand man and PAT/Minhaj-ul-Quran leader) abusing and threatening young girl…
3 ppp_007     | RT @atDavidHoffman: Before today’s shooting in Santa Fe, Texas, no one was talking about the NRA & gun control anymore. Except the Parkland…
4 jip_1       | RT @TravisAllen02: What do Republicans care more about?
5 esha_jip    | I want jip to become the best party ever #jip #ppp #anp #pmln #pti
```

The desired data frame should look like this:

```
  screen_name | text
2 artguroo    | RT @RabiaBaluch: Khurram Nawaz Gandapur (Dr Tahir Qadir’s right-hand man and PAT/Minhaj-ul-Quran leader) abusing and threatening young girl…
5 esha_jip    | I want jip to become the best party ever #jip #ppp #anp #pmln #pti
```

I have already finished extracting the tweets; I just want to clean up this mess. Help!

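You can do this with `grep` and a regular expression applied to `dat$text` only, so keywords in `screen_name` are ignored. Since your desired output includes row 2, where the keyword appears in uppercase as "PAT", I assume you want a case-insensitive match.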

```r
> grep("pat|ppp|jui|jip", dat$text, ignore.case = TRUE)
[1] 2 5
> dat[grep("pat|ppp|jui|jip", dat$text, ignore.case = TRUE), ]
  screen_name
2    artguroo
5    esha_jip
  text
2 RT @RabiaBaluch: Khurram Nawaz Gandapur (Dr Tahir Qadir’s right-hand man and PAT/Minhaj-ul-Quran leader) abusing and threatening young girl…
5 I want jip to become the best party ever #jip #ppp #anp #pmln #pti
```
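One caveat with the plain alternation: "pat" also matches inside longer words such as "patriot" or "path", which may pull in tweets you don't want. A minimal sketch, assuming you only want whole-word matches, wraps the alternatives in word boundaries (`\\b`) and subsets with the logical vector from `grepl()`. The data frame below is a shortened toy copy of the question's data, and the names `keywords`, `pattern`, `keep` and `filtered` are illustrative only.

```r
# Toy copy of the question's data (tweet texts shortened, ASCII apostrophes)
dat <- data.frame(
  screen_name = c("pat_bing", "artguroo", "ppp_007", "jip_1", "esha_jip"),
  text = c(
    "RT @timkaine: 22 school shootings in 2018. 3 in the last week.",
    "RT @RabiaBaluch: Khurram Nawaz Gandapur (Dr Tahir Qadir's right-hand man and PAT/Minhaj-ul-Quran leader) abusing and threatening young girl",
    "RT @atDavidHoffman: Before today's shooting in Santa Fe, Texas, no one was talking about the NRA & gun control anymore. Except the Parkland",
    "RT @TravisAllen02: What do Republicans care more about?",
    "I want jip to become the best party ever #jip #ppp #anp #pmln #pti"
  ),
  stringsAsFactors = FALSE
)

keywords <- c("pat", "ppp", "jui", "jip")

# Build "\b(pat|ppp|jui|jip)\b": each keyword must stand as a whole word,
# so "PAT/Minhaj" and "#jip" match but "patriot" would not
pattern <- paste0("\\b(", paste(keywords, collapse = "|"), ")\\b")

# Search only the tweet text, never screen_name; ignore case so "PAT" counts
keep <- grepl(pattern, dat$text, ignore.case = TRUE, perl = TRUE)
filtered <- dat[keep, ]
print(filtered)
```

If partial matches inside longer words are actually wanted, drop the `\\b` anchors and the result is the same as the `grep` call above.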