Regex: finding all two-, three-, or four-letter capitalized words in a section

I want to identify all the stocks (capital letters) mentioned here and create a list of them in Python.

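The problem: I have a large text document in which many places contain 2, 3, or 4 capital letters, but I only want to grab the ones up to the end of the paragraph that follows the marker "(stocks-to-watch are in the following paragraph):"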

i.e. SE, SAM, PYPL, LAD, GLOB, etc.

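Not sure whether a non-capturing group is the way to go, or whether I can use a lookbehind. I thought something like this would work with a non-capturing group, but it doesn't... Any help is much appreciated.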

(?<=\(stocks\-to\-watch\sare\sin\sthe\sfollowing\paragraph)\:\s+)(\b[A-Z]{2,4}\b)(?:Remember\sstrong\svolume)
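A quick way to see why this attempt fails is to hand it to Python's `re` module directly. The sketch below (the helper `compiles` and the sample string are illustrative, not from the question) checks three things: the pattern as written does not compile, because `\p` in `\sfollowing\paragraph` is not a valid escape and the extra `)` is unbalanced; if `\:\s+` were moved inside the lookbehind, as that extra `)` suggests was intended, `re` would still reject it, because lookbehinds must be fixed-width; and even a fixed-width lookbehind only checks the text immediately before each match, so it yields just the first ticker.

```python
import re

# The pattern from the question, copied verbatim (split over two raw strings).
attempt = (
    r"(?<=\(stocks\-to\-watch\sare\sin\sthe\sfollowing\paragraph)\:\s+)"
    r"(\b[A-Z]{2,4}\b)(?:Remember\sstrong\svolume)"
)


def compiles(pattern):
    """Return True if Python's re module accepts the pattern (illustrative helper)."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False


# "\p" is not a valid escape in re, and the extra ")" is unbalanced.
print(compiles(attempt))  # False

# re only supports fixed-width lookbehinds, so "\s+" inside one is rejected.
print(compiles(r"(?<=paragraph\):\s+)\b[A-Z]{2,4}\b"))  # False

# A fixed-width lookbehind compiles, but it only checks the text right before
# each match, so only the first ticker after the marker qualifies.
sample = "(stocks-to-watch are in the following paragraph):\n\nSE e3/3, SAM e4/22"
print(re.findall(r"(?<=paragraph\):\n\n)\b[A-Z]{2,4}\b", sample))  # ['SE']
```

This is why a single lookbehind-based pattern is awkward here: it cannot "remember" that all later tickers are still inside the section.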

Lorem ipsum dolor sit amet, consectetur adipiscing elit. Ut at sapien fermentum, (stocks-to-watch are in the following paragraph):

SE e3/3, SAM e4/22, PYPL flag bo e5/6, LAD bo e4/22, GLOB RS72 e5/14, CCS bo e4/28, CRL RS 81 ATH e5/7, ENTG rbo pwr
e4/21, CAMT 3wkbo e5/11, DAR TRbo e5/6, +TFII bo e4/21, GNRC ebo e4/30, bo e5/7, GPI bo e5/05, IESC bo e5/5, SNBR starter e4/22, SHYF e5/7, TBK bo>44.70 e4/21, Wflag bo e4/27 ENSG bo e5/11, >48.11 e5/20, ARCB >50 e5/5, +CSWI TTEC RS82 e5/20, ZBRA ebo RS83 e4/28, WD ebo bo>78.13 e5/6, RILY Mr156 e5/11, ETSY e4/30, UCTT >Mr34.59 e4/29,
BKE ATKR and TPX e5/4, COPX Mr40.8, boATH e5/21, CAI >39.35 e5/5, NVMI bo e5/14, BRKS bo e4/30, SI ebo e4/29, 5/5, Fox Walmart RS85 ebo e4/30, BLOK, HZO e4/23, SIVB 2yr bo e4/22, PW bo e3/30, bo e5/6, HIBB >Mr68.31 e3/20, e7/17, TBD, INMD SBNY

20.24 Mr30.49 e4/30, DKL RS78 e2/23, +XRT AVNW Mr99 e5/12, CROX e4/23, e5/6, SQ e5/6, HUBS e5/6, evol KLIC e7/21, TGH >26.5 Mr35.43 e5/4, bo e4/30, JLL >178.55 RS78 e4/29, AMRK Mr65.03 e4/27, CADE Mr28.66 e4/29, ON bo AGCO e5/9, CMBM e5/12, COWN Mr86.12 e4/29, CUBI Mr36.93 e5/4, AM Mr10.27 e4/29, ASO e5/28, MBIN e4/23, e4/29, DE DEN MTZ HVT e5/21, bo e5/18, e4/30, TSE Mr75.84 e5/6, BIG e5/29, CASH e4/22, EVR bo e4/22, e5/20, JOUT ATH e5/5, MVBF XPEL MX bo e4/28, RWJ na, VMI e4/22, WES Mr41.23 e5/5, EDUC e5/21, TVTY e5/6, bo e5/14, RVLV e5/13, + flag SNX ALLY DKS
boe5/6, MYRG e4/29, e6/25, URI e4/29, VAC e5/6, WSM e5/28, e7/17, bo e6/2, RCKY e4/28, LPX e5/5, AN RH SLM FCNCA TX e4/20, bo e6/4, >18.13 e7/22, IMKTA e5/7, ABCB e4/23, AMAT e5/20, e4/28, ICHR e5/4, e4/27, RBNC LGIH CTRN e4/27, BLDR e4/30, e5/4, MHO e4/29, AMKR bo Mr65 e4/27, SKY e5/20, BZH e4/30, + e5/28, SGH Mathr56.69 SYX BECN YETI RM SAIA PAG e7/7, e4/28, Mr66.47 e5/7, boATH e5/13, IAC e4/28, e4/28, e4/28, ACBI e4/22, bo CHEF NWS GMS e4/28, Mr42.06 e4/28, LOB e4/21, e5/7, bo e6/25, GRBK lbbo Mr88 e5/11, LSCC bo e5/4, SBSW bo CBNK KNL OPY SEM SID TIPT e4/28, SF e4/30, bo e4/30, FIX e4/27, Mr27.26 e4/28, e5/1, e5/6, e4/28, >12.27 e5/7, ALGN ERII bo HWM e LOVE SSL STAA . ( e4/28, e5/6, 5/6, e6/9, Mr 39.23 e4/28, e5/6 ) . Strong volume tends to lead price. Ut lorem ipsum, venenatis et aliquet in, suscipit sit amet tellus. Integer vestibulum luctus rhoncus. Proin at arcu mauris. Nam tempor ipsum quis commodo cursus. Aenean faucibus hendrerit aliquam. Curabitur ullamcorper, metus in volutpat pretium, diam purus laoreet diam, non pulvinar massa justo ac leo. Aenean vehicula, orci in rutrum sodales, neque nulla maximus purus, quis suscipit nulla nisi non nibh. Nunc a molestie nunc. Cras velit risus, eleifend ut aliquet rhoncus, ullamcorper non risus. Nam tristique facilisis purus, sed fringilla enim pulvinar vitae. Nunc dignissim consectetur molestie. Mauris id maximus lorem.

Extract the substring between the two strings:

\(stocks-to-watch\s+are\s+in\s+the\s+following\s+paragraph\):([\s\S]*?)Strong\svolume


After that, you can use your original \b[A-Z]{2,4}\b to extract the matches you need from Group 1.
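Put together in Python, the two steps might look like the minimal sketch below. The function name `stocks_to_watch` and the shortened sample document are illustrative assumptions; the two patterns are the ones from this answer and from the question.

```python
import re
from typing import List

# Step 1: capture everything between the marker and "Strong volume" (group 1).
SECTION_RE = re.compile(
    r"\(stocks-to-watch\s+are\s+in\s+the\s+following\s+paragraph\):"
    r"([\s\S]*?)Strong\svolume"
)
# Step 2: the question's original pattern for 2-4 capital letters.
TICKER_RE = re.compile(r"\b[A-Z]{2,4}\b")


def stocks_to_watch(text: str) -> List[str]:
    """Return the 2-4 letter all-caps words found between the two markers."""
    section = SECTION_RE.search(text)
    if section is None:  # marker or "Strong volume" not found
        return []
    return TICKER_RE.findall(section.group(1))


doc = (
    "IBM is not on the list. (stocks-to-watch are in the following paragraph):\n\n"
    "SE e3/3, SAM e4/22, PYPL flag bo e5/6, LAD bo e4/22, GLOB RS72 e5/14 . "
    "Strong volume tends to lead price. NOPE is outside the section."
)
print(stocks_to_watch(doc))  # ['SE', 'SAM', 'PYPL', 'LAD', 'GLOB']
```

IBM and NOPE are skipped because they sit outside group 1, and RS72 is skipped because there is no word boundary between S and 7. Be aware that the ticker pattern still picks up all-caps tokens that are not tickers, such as RS and ATH in "CRL RS 81 ATH", so some filtering may be needed afterwards.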

Explanation

--------------------------------------------------------------------------------
  \(                       '('
--------------------------------------------------------------------------------
  stocks-to-watch          'stocks-to-watch'
--------------------------------------------------------------------------------
  \s+                      whitespace (\n, \r, \t, \f, and " ") (1 or
                           more times (matching the most amount
                           possible))
--------------------------------------------------------------------------------
  are                      'are'
--------------------------------------------------------------------------------
  \s+                      whitespace (\n, \r, \t, \f, and " ") (1 or
                           more times (matching the most amount
                           possible))
--------------------------------------------------------------------------------
  in                       'in'
--------------------------------------------------------------------------------
  \s+                      whitespace (\n, \r, \t, \f, and " ") (1 or
                           more times (matching the most amount
                           possible))
--------------------------------------------------------------------------------
  the                      'the'
--------------------------------------------------------------------------------
  \s+                      whitespace (\n, \r, \t, \f, and " ") (1 or
                           more times (matching the most amount
                           possible))
--------------------------------------------------------------------------------
  following                'following'
--------------------------------------------------------------------------------
  \s+                      whitespace (\n, \r, \t, \f, and " ") (1 or
                           more times (matching the most amount
                           possible))
--------------------------------------------------------------------------------
  paragraph                'paragraph'
--------------------------------------------------------------------------------
  \)                       ')'
--------------------------------------------------------------------------------
  :                        ':'
--------------------------------------------------------------------------------
  (                        group and capture to \1:
--------------------------------------------------------------------------------
    [\s\S]*?                 any character of: whitespace (\n, \r,
                             \t, \f, and " "), non-whitespace (all
                             but \n, \r, \t, \f, and " ") (0 or more
                             times (matching the least amount
                             possible))
--------------------------------------------------------------------------------
  )                        end of \1
--------------------------------------------------------------------------------
  Strong                   'Strong'
--------------------------------------------------------------------------------
  \s                       whitespace (\n, \r, \t, \f, and " ")
--------------------------------------------------------------------------------
  volume                   'volume'
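The explanation above notes that [\s\S]*? matches any character, newlines included, as few times as possible. A small comparison in Python (the sample text is made up for illustration) shows why both parts matter: a plain `.` stops at newlines unless `re.DOTALL` is passed, and a greedy [\s\S]* runs on to the last "Strong volume" in the text.

```python
import re

HEAD = r"\(stocks-to-watch\s+are\s+in\s+the\s+following\s+paragraph\):"
text = (
    "(stocks-to-watch are in the following paragraph):\n"
    "SE e3/3, SAM e4/22\n"
    "PYPL bo e5/6. Strong volume tends to lead price.\n"
    "Later: IBM and Strong volume again."
)

# Lazy [\s\S]*? crosses newlines and stops at the first "Strong volume".
lazy = re.search(HEAD + r"([\s\S]*?)Strong\svolume", text).group(1)
# Greedy [\s\S]* runs on to the last "Strong volume" in the text.
greedy = re.search(HEAD + r"([\s\S]*)Strong\svolume", text).group(1)
# Without DOTALL, "." cannot match "\n", so this finds nothing here.
dot = re.search(HEAD + r"(.*?)Strong\svolume", text)
# With DOTALL, ".*?" behaves like "[\s\S]*?".
dotall = re.search(HEAD + r"(.*?)Strong\svolume", text, re.DOTALL).group(1)

print(re.findall(r"\b[A-Z]{2,4}\b", lazy))    # ['SE', 'SAM', 'PYPL']
print(re.findall(r"\b[A-Z]{2,4}\b", greedy))  # ['SE', 'SAM', 'PYPL', 'IBM']
print(dot)                                    # None
print(dotall == lazy)                         # True
```

In Python, (.*?) with `re.DOTALL` is an equivalent alternative to ([\s\S]*?); the character-class form has the advantage of not depending on a flag.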