Searching for a number of words in dataframe$text and keeping the rows where they are found
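I want to search df$text for certain words and, if any of them appear in a tweet, put the entire row into a new data frame. The problem is that I searched for the keywords "pat", "ppp", "jui", and "jip", but the dataset I got back includes rows where the keyword appears in the username and not in the tweet. I want to remove the tweets that do not contain any of the keywords.
The data frame looks like this: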
screen_name | text
1| pat_bing | RT @timkaine: 22 school shootings in 2018. 3 in the last week. How many times must our hearts break hearing news like this - this time in…
2| artguroo | RT @RabiaBaluch: Khurram Nawaz Gandapur (Dr Tahir Qadir’s right-hand man and PAT/Minhaj-ul-Quran leader) abusing and threatening young girl…
3| ppp_007 | RT @atDavidHoffman: Before today’s shooting in Santa Fe, Texas, no one was talking about the NRA & gun control anymore. Except the Parkland…
4| jip_1 | RT @TravisAllen02: What do Republicans care more about?
5| esha_jip | I want jip to become the best party ever #jip #ppp #anp #pmln #pti
The desired df should look like this:
screen_name | text
2| artguroo | RT @RabiaBaluch: Khurram Nawaz Gandapur (Dr Tahir Qadir’s right-hand man and PAT/Minhaj-ul-Quran leader) abusing and threatening young girl…
5| esha_jip | I want jip to become the best party ever #jip #ppp #anp #pmln #pti
I have already finished extracting the tweets and just want to clean up this mess. Please help!
You can use grep with a regular expression to get it. Since you include row 2, I assume you want to ignore case.
grep("pat|ppp|jui|jip", dat$text, ignore.case = TRUE)
[1] 2 5
dat[grep("pat|ppp|jui|jip", dat$text, ignore.case = TRUE), ]
  screen_name
2    artguroo
5    esha_jip
  text
2 RT @RabiaBaluch: Khurram Nawaz Gandapur (Dr Tahir Qadir’s right-hand man and PAT/Minhaj-ul-Quran leader) abusing and threatening young girl…
5 I want jip to become the best party ever #jip #ppp #anp #pmln #pti
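For anyone who wants to try this end to end, here is a minimal, self-contained sketch. It rebuilds the sample data from the question (with the tweet text shortened) and filters on the text column only, so keywords in screen_name are ignored. The names keywords, pattern, keep, and new_df are my own choices for illustration. The word-boundary markers (\\b) are an optional addition that the question did not ask for; they assume you do not want "pat" to match inside longer words such as "pattern". Drop them to get the same plain substring match as above.

```r
# Sample data rebuilt from the question (tweet text shortened for brevity).
dat <- data.frame(
  screen_name = c("pat_bing", "artguroo", "ppp_007", "jip_1", "esha_jip"),
  text = c(
    "RT @timkaine: 22 school shootings in 2018. 3 in the last week.",
    "RT @RabiaBaluch: Khurram Nawaz Gandapur (PAT/Minhaj-ul-Quran leader)",
    "RT @atDavidHoffman: Before today's shooting in Santa Fe, Texas",
    "RT @TravisAllen02: What do Republicans care more about?",
    "I want jip to become the best party ever #jip #ppp #anp #pmln #pti"
  ),
  stringsAsFactors = FALSE
)

# Build the alternation from a vector so keywords are easy to add or remove.
keywords <- c("pat", "ppp", "jui", "jip")

# Assumption: whole-word matches only. \\b is a word boundary, so "pat"
# matches "PAT/Minhaj" but would not match inside "pattern".
pattern <- paste0("\\b(", paste(keywords, collapse = "|"), ")\\b")

# grepl() returns one TRUE/FALSE per row, which is safer for subsetting
# than grep() indices: if nothing matches, the result is simply zero rows.
keep <- grepl(pattern, dat$text, ignore.case = TRUE)

# Only the text column was searched, so usernames such as "pat_bing"
# do not cause a row to be kept.
new_df <- dat[keep, ]
print(new_df$screen_name)
```

With this sample data, only the rows for artguroo and esha_jip are kept, which matches the desired output in the question.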