R command to extract text between two strings containing curly braces
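I am trying to use the str_match function from the stringr library in R to extract the title from a bibliography entry such as the one below. Specifically, I need to extract the text between the strings "title={" and "},".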
a2
[1] "@article{2020, title={Long noncoding RNA MEG3 decreases the growth of head and neck squamous cell carcinoma by regulating the expression of miR‐421 and E‐cadherin}, volume={9}, ISSN={2045-7634}, url={http://dx.doi.org/10.1002/cam4.3002}, DOI={10.1002/cam4.3002}, number={11}, journal={Cancer Medicine}, publisher={Wiley}, author={Ji, Yefeng and Feng, Guanying and Hou, Yunwen and Yu, Yang and Wang, Ruixia and Yuan, Hua}, year={2020}, month={Apr}, pages={3954–3963} }"
I used the following, but got an error message:
str_match(a2, "(?s)title={\\s*(.*?)\\s*},.")
Error in stri_match_first_regex(string, pattern, opts_regex = opts(pattern)) :
Error in {min,max} interval. (U_REGEX_BAD_INTERVAL, context=(?s)title={\s*(.*?)\s*},.
)
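I suspect the problem is with matching the curly braces, but I have made no progress. Any pointers would be greatly appreciated.
Use the following regular expression.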
a2 <- "@article{2020, title={Long noncoding RNA MEG3 decreases the growth of head and neck squamous cell carcinoma by regulating the expression of miR-421 and E-cadherin}, volume={9}, ISSN={2045-7634}, url={http://dx.doi.org/10.1002/cam4.3002}, DOI={10.1002/cam4.3002}, number={11}, journal={Cancer Medicine}, publisher={Wiley}, author={Ji, Yefeng and Feng, Guanying and Hou, Yunwen and Yu, Yang and Wang, Ruixia and Yuan, Hua}, year={2020}, month={Apr}, pages={3954–3963} }"
sub("^.*title=\\{([^{}]+)\\}.*$", "\\1", a2)
#> [1] "Long noncoding RNA MEG3 decreases the growth of head and neck squamous cell carcinoma by regulating the expression of miR-421 and E-cadherin"
Created on 2022-03-19 by the reprex package (v2.0.1)
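As for why the original attempt failed: stringr hands the pattern to the ICU regex engine, where an unescaped `{` is read as the start of a `{min,max}` repetition interval, and `{\s*...` is not a valid interval, hence U_REGEX_BAD_INTERVAL. A minimal sketch of the same pattern with the braces escaped is below. The string `a2_short` is a shortened, hypothetical stand-in for `a2`, used only to keep the example readable.

```r
library(stringr)

# Shortened, hypothetical stand-in for the a2 string in the question.
a2_short <- "@article{2020, title={Long noncoding RNA MEG3}, volume={9}, year={2020} }"

# "\\{" and "\\}" in R source reach the regex engine as \{ and \},
# so the braces are matched literally instead of opening a {min,max} interval.
fixed_pattern <- "(?s)title=\\{\\s*(.*?)\\s*\\},"

# Column 1 of the result is the full match, column 2 is the first capture group.
title <- str_match(a2_short, fixed_pattern)[, 2]
title
```

Every backslash is doubled because R first processes the string literal and only then passes the result to the regex engine.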
Edit
Another way, using stringr.
stringr::str_match(a2, "^.*title=\\{([^{}]+)\\}.*$")[,2]
#> [1] "Long noncoding RNA MEG3 decreases the growth of head and neck squamous cell carcinoma by regulating the expression of miR-421 and E-cadherin"
Created on 2022-03-19 by the reprex package (v2.0.1)
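One practical difference between the two approaches above shows up when an entry has no title field. The sketch below runs both on a small character vector; the two entries are made up for illustration and are not from the question.

```r
library(stringr)

# Hypothetical entries: the second one has no title field.
entries <- c(
  "@article{a, title={First title}, year={2020} }",
  "@article{b, year={2021} }"
)

# str_match() is vectorised and returns NA where the pattern does not match.
titles <- str_match(entries, "title=\\{([^{}]+)\\}")[, 2]

# sub() returns the input unchanged where the pattern does not match.
via_sub <- sub("^.*title=\\{([^{}]+)\\}.*$", "\\1", entries)

titles
via_sub
```

With `str_match()` a missing title is easy to spot as `NA`, whereas `sub()` silently hands back the whole entry.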
Another possible solution, based on stringr::str_extract:
library(tidyverse)
a2 %>%
  str_extract("(?<=title\\=\\{)[^\\}]*(?=\\},)")
#> [1] "Long noncoding RNA MEG3 decreases the growth of head and neck squamous cell carcinoma by regulating the expression of miR‐421 and E‐cadherin"
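One caveat worth knowing: BibTeX titles sometimes contain nested braces (for example to protect capitalisation), and a character class that excludes `}` stops at the first inner closing brace. The sketch below uses a made-up entry to compare the lookaround pattern with a lazy `.*?` pattern. The lazy version is only a heuristic that assumes "}," does not occur inside the title.

```r
library(stringr)

# Hypothetical entry whose title contains a nested pair of braces.
entry <- "@article{x, title={The {RNA} world}, year={2020} }"

# Excluding "}" stops at the inner brace, which is not followed by a comma,
# so the lookahead fails and there is no match.
strict <- str_extract(entry, "(?<=title=\\{)[^\\}]*(?=\\},)")

# A lazy match runs up to the first "}," and keeps the nested braces.
lazy <- str_match(entry, "title=\\{(.*?)\\},")[, 2]

strict
lazy
```

For anything beyond simple entries, a real BibTeX parser is the safer choice.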
Since you want to parse a BibTeX file, you could use bib2df::bib2df, where reference.bib is your BibTeX file.
install.packages("bib2df")
library(bib2df)
bib2df("reference.bib")$TITLE..LONG
# [1] "Long noncoding RNA MEG3 decreases the growth of head and neck squamous cell carcinoma by regulating the expression of miR-421 and E-cadherin"