Error when using grepl with paste: invalid regular expression
I am using the following code to match the elements of two columns:
test = articles[apply(articles, 1, function(i) any(grepl(paste(dictionary, collapse = "|"), i))),]
and I get the following error:
Error in grepl(paste(dictionary, collapse = "|"), i) :
invalid regular expression '3 M SYNDROME|3-M SYNDROME|3-M SYNDROME 1|3M SYNDROME|DOLICHOSPONDYLIC DYSPLASIA|GLOOMY FACE SYNDROME|LE MERRER SYNDROME|THREE M SYNDROME|YAKUT SHORT STATURE SYNDROME|ABDOMINAL AORTIC ANEURYSM|ANEURYSM ABDOMINAL AORTIC|AORTIC ANEURYSM ABDOMINAL|AORTIC ANEURYSM FAMILIAL ABDOMINAL 1|ABSENCE EPILEPSY|ABSENCE SEIZURE|CHILDHOOD ABSENCE EPILEPSY|JUVENILE ABSENCE EPILEPSY|PETIT MAL SEIZURE|PYKNOLEPSY|ACANTHAMOEBA INFECTION|
The dictionary consists of disease names and their synonyms:
[1] "3 M SYNDROME"
[2] "3-M SYNDROME"
[3] "3-M SYNDROME 1"
[4] "3M SYNDROME"
[5] "DOLICHOSPONDYLIC DYSPLASIA"
[6] "GLOOMY FACE SYNDROME"
[7] "LE MERRER SYNDROME"
[8] "THREE M SYNDROME"
[9] "YAKUT SHORT STATURE SYNDROME"
[10] "ABDOMINAL AORTIC ANEURYSM"
[11] "ANEURYSM ABDOMINAL AORTIC"
[12] "AORTIC ANEURYSM ABDOMINAL"
[13] "AORTIC ANEURYSM FAMILIAL ABDOMINAL 1"
[14] "ABSENCE EPILEPSY"
[15] "ABSENCE SEIZURE"
[16] "CHILDHOOD ABSENCE EPILEPSY"
[17] "JUVENILE ABSENCE EPILEPSY"
[18] "PETIT MAL SEIZURE"
[19] "PYKNOLEPSY"
[20] "ACANTHAMOEBA INFECTION"
[21] "ACANTHAMOEBA INFECTIONS"
[22] "ACANTHAMOEBA KERATITIS"
[23] "ACCOMMODATIVE SPASM"
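Here articles is a data frame made up of various articles, and dictionary is the list of phrases I want to match.
Can you help me figure out what is going wrong?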
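Pasting the dictionary together with collapse = "|" builds one large regular expression, which can be rejected as invalid if any entry contains a regex metacharacter (for example an unbalanced "(" or "[") or if the combined pattern is too long for the regex engine. You can avoid this by using %in% instead of grepl, i.e.: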
test = articles[apply(articles, 1, function(i) any(i %in% dictionary)),]
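A small sketch of the %in% approach on hypothetical toy data (the column names title and keyword and all values are made up for illustration, not taken from the asker's data). It shows that %in% keeps only rows where some cell equals a dictionary entry exactly:

```r
# Hypothetical toy data standing in for `articles`: two character columns
articles <- data.frame(
  title   = c("3M SYNDROME", "A study of PYKNOLEPSY", "ABSENCE SEIZURE"),
  keyword = c("GROWTH", "EEG", "CHILDHOOD"),
  stringsAsFactors = FALSE
)
# Hypothetical subset of the dictionary
dictionary <- c("3M SYNDROME", "PYKNOLEPSY", "ABSENCE SEIZURE")

# Keep rows where at least one cell equals a dictionary entry exactly
test <- articles[apply(articles, 1, function(i) any(i %in% dictionary)), ]
print(test)
```

Row 2 is dropped because "A study of PYKNOLEPSY" only contains a dictionary term rather than being equal to one.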
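This checks for exact matches between your dictionary and the entries in articles. If you are looking for substring matches, you should use the str_detect function from the stringr package and mark the dictionary entries as fixed, so that they are not misinterpreted as regular expressions: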
test = articles[apply(articles, 1, function(i) any(sapply(dictionary, function(d) stringr::str_detect(i, stringr::fixed(d))))),]
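If you would rather avoid the stringr dependency, a base-R sketch of the same substring idea loops over the dictionary and calls grepl with fixed = TRUE, so each entry is matched literally. This swaps in base grepl for str_detect; the toy data, the deliberately malformed entry "ABSENCE SEIZURE (ATYPICAL", and the helper name row_has_term are all hypothetical.

```r
# Hypothetical toy data standing in for `articles`
articles <- data.frame(
  title   = c("3M SYNDROME", "A study of PYKNOLEPSY", "Unrelated text"),
  keyword = c("GROWTH", "EEG", "C++ (misc)"),
  stringsAsFactors = FALSE
)
# Hypothetical dictionary; the last entry has an unbalanced "(" that would
# make paste(dictionary, collapse = "|") an invalid regular expression
dictionary <- c("3M SYNDROME", "PYKNOLEPSY", "ABSENCE SEIZURE (ATYPICAL")

# Hypothetical helper: TRUE if any cell of `row` contains any term
# as a literal substring (fixed = TRUE disables regex interpretation)
row_has_term <- function(row, terms) {
  any(vapply(terms, function(term) any(grepl(term, row, fixed = TRUE)),
             logical(1)))
}

# apply() passes each row as a character vector; `terms` goes through `...`
test <- articles[apply(articles, 1, row_has_term, terms = dictionary), ]
print(test)
```

Because each term is tested on its own, no single giant pattern is compiled, which also sidesteps any limit on regex size; the trade-off is one grepl call per term per row, which may be slow for a very large dictionary.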