捕获所有文本直到下一次出现，包括正则表达式中的换行符 (Python 3.x)

Question

我有一个 .tex 文件，我希望从中为每个问题准备独立的文件。简而言之，最后我需要以下格式的问题和他们的选择。

[
    {'question': 
        {'text' : 'If $\vec A = t^2 \vec i -t \vec j + (2t +1)\vec k$, what is  $\dfrac{dA}{dt}$?',
        'marks': 2
        },
     'choices': [{'text': 't\vec i -  \vec j + 2\vec k$', 'is_correct': True }, {'text': 't\vec i +  \vec j + 2\vec k$', 'is_correct': False, ...}]
    },
...]

我使用 python 3.x 编写正则表达式来搜索和完成工作。以下是一个最小的例子

 \documentclass{exam} 
  
  
  \begin{questions}
  
  \question[2] If $\vec A = t^2 \vec i -t \vec j + (2t +1)\vec k$, what is  $\dfrac{dA}{dt}$?
  \begin{multicols}{2}
    \begin{choices} 
    \correctchoice   t\vec i -  \vec j + 2\vec k$
  
    \choice  t\vec i +  \vec j + 2\vec k$
  
    \choice  $t\vec i -  \vec j + 2\vec k$
  
    \choice  None of these
   
    \end{choices}
  \end{multicols}
      
  \question[2] Is $\nabla \phi $ is perpendicular
   to the surface $\phi(x,y,z) = c$ where $c$ is a constant?
  \begin{multicols}{2}
    \begin{choices} 
    \choice  Not all the times
  
    \correctchoice   Yes, Always
  
    \choice  No, Never
  
    \choice  None of these
   
    \end{choices}
  \end{multicols}
      
  
      
  \end{questions}
  \end{document}

我擅长编写正则表达式并将问题与其选择分开。我尝试了以下方法：

import re

with open('qpaper.tex', 'r') as f:
    content = f.read()
    regex = re.compile(r"\question.*")
    qns = regex.findall(content)
    for q in qns:
        print(q)

这仅给出 \question 之后但直到换行符的文本。就我而言，我需要它直到 \question.

的下一次出现

注意: 就我而言，我所有的试卷都会有完全相同的格式，而且所有的问题都会在里面 \begin{questions}...\end{questions}，如果有任何其他更好的方法来实现这一点，那也会有所帮助。

Answer 1

使用

(?s)\question.*?(?=\question|\Z)

见proof

此表达式提取从 \question 开始到下一个最接近的 \question 或字符串结尾 (\Z) 的子字符串。

解释：

NODE                     EXPLANATION
--------------------------------------------------------------------------------
  (?s)                     set flags for this block (with . matching
                           \n) (case-sensitive) (with ^ and $
                           matching normally) (matching whitespace
                           and # normally)
--------------------------------------------------------------------------------
  \                       '\'
--------------------------------------------------------------------------------
  question                 'question'
--------------------------------------------------------------------------------
  .*?                      any character (0 or more times (matching
                           the least amount possible))
--------------------------------------------------------------------------------
  (?=                      look ahead to see if there is:
--------------------------------------------------------------------------------
    \                       '\'
--------------------------------------------------------------------------------
    question                 'question'
--------------------------------------------------------------------------------
   |                        OR
--------------------------------------------------------------------------------
    \Z                       the end of the string
--------------------------------------------------------------------------------
  )                        end of look-ahead

捕获所有文本直到下一次出现，包括正则表达式中的换行符 (Python 3.x)

Capturing all the text until the next occurrence including the line breaks in regex (Python 3.x)

regex

python-3.x

regex-lookarounds

python-re