R command to extract text between two strings containing curly braces

Question

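I am trying to use the str_match function from the stringr library in R to extract the title from a bibliography entry like the one shown below. Specifically, I need to extract the text between the strings "title={" and "},".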

a2
[1] "@article{2020, title={Long noncoding RNA MEG3 decreases the growth of head and neck squamous cell carcinoma by regulating the expression of miR‐421 and E‐cadherin}, volume={9}, ISSN={2045-7634}, url={http://dx.doi.org/10.1002/cam4.3002}, DOI={10.1002/cam4.3002}, number={11}, journal={Cancer Medicine}, publisher={Wiley}, author={Ji, Yefeng and Feng, Guanying and Hou, Yunwen and Yu, Yang and Wang, Ruixia and Yuan, Hua}, year={2020}, month={Apr}, pages={3954–3963} }"

I tried the following, but got an error message:

str_match(a2, "(?s)title={\\s*(.*?)\\s*},.")

Error in stri_match_first_regex(string, pattern, opts_regex = opts(pattern)) :
Error in {min,max} interval. (U_REGEX_BAD_INTERVAL, context=(?s)title={\s*(.*?)\s*},.)

I suspect the problem is in how the curly braces are matched, but I have not made any progress. Any pointers would be appreciated.

Answer 1

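Use the following regular expression. The literal braces are escaped as \\{ and \\} because an unescaped { is read as the start of a {min,max} quantifier, which is what triggers the U_REGEX_BAD_INTERVAL error.

a2 <- "@article{2020, title={Long noncoding RNA MEG3 decreases the growth of head and neck squamous cell carcinoma by regulating the expression of miR-421 and E-cadherin}, volume={9}, ISSN={2045-7634}, url={http://dx.doi.org/10.1002/cam4.3002}, DOI={10.1002/cam4.3002}, number={11}, journal={Cancer Medicine}, publisher={Wiley}, author={Ji, Yefeng and Feng, Guanying and Hou, Yunwen and Yu, Yang and Wang, Ruixia and Yuan, Hua}, year={2020}, month={Apr}, pages={3954–3963} }"

# Match the whole string, capture the text inside title={...}, and keep only the capture
sub("^.*title=\\{([^{}]+)\\}.*$", "\\1", a2)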
#> [1] "Long noncoding RNA MEG3 decreases the growth of head and neck squamous cell carcinoma by regulating the expression of miR-421 and E-cadherin"

Created on 2022-03-19 by the reprex package (v2.0.1)
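
To connect this back to the pattern in the question, here is a minimal sketch of the same fix applied to the original regular expression: only the braces are escaped, and the rest is left as it was. It uses base R (regexec with perl = TRUE) so that it runs without extra packages; the shortened entry string and the variable names are made up for illustration. The same escaped pattern should also work when passed to stringr::str_match.

```r
# Shortened, hypothetical stand-in for the bibliography entry in the question.
entry <- "@article{2020, title={A short example title}, volume={9}, year={2020} }"

# Escape the literal braces; an unescaped "{" would be parsed as the start
# of a {min,max} quantifier.
pattern <- "title=\\{\\s*(.*?)\\s*\\},"

# regexec() returns the full match followed by each capture group.
m <- regmatches(entry, regexec(pattern, entry, perl = TRUE))[[1]]

# Element 1 is the full match, element 2 is the first capture group.
title <- m[2]
print(title)
#> [1] "A short example title"
```

The (?s) flag from the question is left out here because the sample entry sits on one line; it would only matter if the title could span several lines.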

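Edit

Another approach, using stringr. str_match returns a matrix whose first column is the full match and whose second column is the first capture group, hence the [,2].

stringr::str_match(a2, "^.*title=\\{([^{}]+)\\}.*$")[,2]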
#> [1] "Long noncoding RNA MEG3 decreases the growth of head and neck squamous cell carcinoma by regulating the expression of miR-421 and E-cadherin"

Created on 2022-03-19 by the reprex package (v2.0.1)

Answer 2

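Another possible solution, based on stringr::str_extract with a lookbehind and a lookahead, so that only the title itself is returned:

library(tidyverse)

a2 %>% 
  str_extract("(?<=title=\\{)[^\\}]*(?=\\},)")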

#> [1] "Long noncoding RNA MEG3 decreases the growth of head and neck squamous cell carcinoma by regulating the expression of miR‐421 and E‐cadherin"
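
One caveat with character classes such as [^{}] or [^}]: they stop at the first brace, so a title that itself contains braces (BibTeX entries often use them to protect capitalization) would not be matched as intended. The sketch below shows one way to allow a single level of nested braces. It is an illustration under that assumption, not a general BibTeX parser, and the entry string and variable names are hypothetical.

```r
# Hypothetical entry whose title contains one level of nested braces.
entry <- "@article{key, title={The {RNA} world}, volume={9}, year={2020} }"

# Inside the outer braces, accept either a single non-brace character
# or a complete inner {...} group that itself contains no braces.
pattern <- "title=\\{((?:[^{}]|\\{[^{}]*\\})*)\\}"

# Base R with perl = TRUE; element 2 of the result is the capture group.
m <- regmatches(entry, regexec(pattern, entry, perl = TRUE))[[1]]
nested_title <- m[2]
print(nested_title)
#> [1] "The {RNA} world"
```

For deeper nesting or multi-line entries, a dedicated BibTeX parser is likely a safer choice than extending the regular expression.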

Answer 3

Since what you really want is to parse a BibTeX file, you can use bib2df::bib2df, where reference.bib is your BibTeX file.

install.packages("bib2df")
library(bib2df)

bib2df("reference.bib")$TITLE..LONG
# [1] "Long noncoding RNA MEG3 decreases the growth of head and neck squamous cell carcinoma by regulating the expression of miR-421 and E-cadherin"
