I get an error when I use grepl with paste: invalid regular expression

I am using the following code to match the elements of two columns:

test = articles[apply(articles, 1, function(i) any(grepl(paste(dictionary, collapse = "|"), i))),]

and I get the following error:

Error in grepl(paste(dictionary, collapse = "|"), i) : 
  invalid regular expression '3 M SYNDROME|3-M SYNDROME|3-M SYNDROME 1|3M SYNDROME|DOLICHOSPONDYLIC DYSPLASIA|GLOOMY FACE SYNDROME|LE MERRER SYNDROME|THREE M SYNDROME|YAKUT SHORT STATURE SYNDROME|ABDOMINAL AORTIC ANEURYSM|ANEURYSM ABDOMINAL AORTIC|AORTIC ANEURYSM ABDOMINAL|AORTIC ANEURYSM FAMILIAL ABDOMINAL 1|ABSENCE EPILEPSY|ABSENCE SEIZURE|CHILDHOOD ABSENCE EPILEPSY|JUVENILE ABSENCE EPILEPSY|PETIT MAL SEIZURE|PYKNOLEPSY|ACANTHAMOEBA INFECTION|

The dictionary consists of disease names and their synonyms:

   [1] "3 M SYNDROME"                                                                
   [2] "3-M SYNDROME"                                                                
   [3] "3-M SYNDROME 1"                                                              
   [4] "3M SYNDROME"                                                                 
   [5] "DOLICHOSPONDYLIC DYSPLASIA"                                                  
   [6] "GLOOMY FACE SYNDROME"                                                        
   [7] "LE MERRER SYNDROME"                                                          
   [8] "THREE M SYNDROME"                                                            
   [9] "YAKUT SHORT STATURE SYNDROME"                                                
  [10] "ABDOMINAL AORTIC ANEURYSM"                                                   
  [11] "ANEURYSM ABDOMINAL AORTIC"                                                   
  [12] "AORTIC ANEURYSM ABDOMINAL"                                                   
  [13] "AORTIC ANEURYSM FAMILIAL ABDOMINAL 1"                                        
  [14] "ABSENCE EPILEPSY"                                                            
  [15] "ABSENCE SEIZURE"                                                             
  [16] "CHILDHOOD ABSENCE EPILEPSY"                                                  
  [17] "JUVENILE ABSENCE EPILEPSY"                                                   
  [18] "PETIT MAL SEIZURE"                                                           
  [19] "PYKNOLEPSY"                                                                  
  [20] "ACANTHAMOEBA INFECTION"                                                      
  [21] "ACANTHAMOEBA INFECTIONS"                                                     
  [22] "ACANTHAMOEBA KERATITIS"                                                      
  [23] "ACCOMMODATIVE SPASM"

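Here articles is a data frame made up of various articles, and dictionary is the list of phrases I want to match. Can you help me see where I am going wrong?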

You can use %in% instead of grepl, i.e.

test = articles[apply(articles, 1, function(i) any(i %in% dictionary)),]

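This checks whether any entry in articles matches a dictionary entry exactly. If you are looking for substring matches instead, you should use the str_detect function from the stringr package and mark the dictionary entries as fixed, so that they are not misinterpreted as regular expressions: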

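# str_detect pairs strings and patterns element-wise, so each dictionary term is tested separately
test = articles[apply(articles, 1, function(i) any(sapply(dictionary, function(d) any(stringr::str_detect(i, stringr::fixed(d)))))),]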
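
To make the difference between the two approaches concrete, here is a minimal, self-contained sketch. The toy `articles` and `dictionary` objects are invented for illustration and are not the asker's data. The substring variant uses base R's `grepl(..., fixed = TRUE)` one term at a time, which should behave like the stringr call above without needing the package. The dictionary entry with an unbalanced `(` is only an assumption about the kind of term that can break a collapsed pattern; the error message in the question is truncated, so the actual cause there is not known.

```r
# Toy data, invented for illustration only
dictionary <- c("3-M SYNDROME", "ABSENCE SEIZURE", "SYNDROME (TYPE 1")
articles <- data.frame(
  title = c("ABSENCE SEIZURE", "A CASE OF 3-M SYNDROME", "UNRELATED"),
  body  = c("text", "more text", "nothing here"),
  stringsAsFactors = FALSE
)

# Collapsing the terms into one regex fails here because "(" is a metacharacter
collapsed <- tryCatch(
  grepl(paste(dictionary, collapse = "|"), "some text"),
  error = function(e) e
)
regex_failed <- inherits(collapsed, "error")

# Exact match: a cell must be identical to a dictionary entry
exact_rows <- apply(articles, 1, function(i) any(i %in% dictionary))

# Substring match: each term is searched literally (fixed = TRUE),
# so regex metacharacters in the dictionary are harmless
substring_rows <- apply(articles, 1, function(i) {
  any(vapply(dictionary, function(d) any(grepl(d, i, fixed = TRUE)), logical(1)))
})

test_exact <- articles[exact_rows, ]
test_substring <- articles[substring_rows, ]
```

With this toy data, the exact match keeps only the row whose title is identical to a dictionary entry, while the substring match also keeps the row whose title merely contains one. Looping over the terms is slower than a single regex on a large dictionary, but it avoids both escaping problems and any limit on pattern size.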