Python regex to find all strings that start with ' and end with '.tr, ignoring leading and trailing whitespace

Question

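I am struggling to get the regex right for my script. I want to find all substrings in a file that start with ' and end with '.tr, and save all of these matches in a list.

This is what I have so far: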

import glob
import pathlib
import re
       
libPathString = str(pathlib.Path.cwd().parent.resolve()) 

for path in glob.glob(libPathString + "/**", recursive=True):
    if(".dart" in path):
        with open(path, 'r+', encoding="utf-8") as file:
            data = [line.strip() for line in file.readlines()]
            data = ''.join(data)
            words = re.findall(r'\'.*\'.tr', data)
            print(words)

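The first problem is that words does not contain just the matching substring, but the whole file content up to the substring.

It also produces a match for this file: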

  child: Hero(
    tag: heroTag ?? '',  // <- because of this and the line below starts with `tr`
    transitionOnUserGestures: true,
    child: Material(

But this should not match!

And then it does not find this one:

  AutoSizeText(
      'Das ist ein langer Text, der immer in einer Zeile ist.'
          .tr,
      style: AppTextStyles.montserratH4Regular,

This one should match!

What am I missing here?

Answer 1

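You can use

words = re.findall(r"'[^'\\]*(?:\\.[^'\\]*)*'\s*\.tr\b", data)

See the Python demo. Details:

'[^'\\]*(?:\\.[^'\\]*)*' - a ', zero or more characters other than ' and \, then zero or more sequences of a \ followed by any single character and zero or more characters other than ' and \, then a closing ' (this matches a string between ' characters, including any escaped characters inside it)
\s* - zero or more whitespace characters (this matches any whitespace, including line breaks)
\.tr - the literal string .tr (note that the escaped . now matches only a dot, whereas the unescaped . in the original pattern matched any character)
\b - a word boundary, so that .tr is not matched as the start of a longer identifier.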
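As a minimal sketch of how this pattern behaves, the snippet below runs it against a sample string assembled from the two Dart fragments in the question. The name TR_STRING and the sample text are assumptions made for illustration. The sample keeps its line breaks instead of being stripped and joined, since \s* already handles the newline before .tr.

```python
import re

# Single-quoted string (escapes allowed), optional whitespace, then `.tr`
# as a whole word. TR_STRING is a name chosen for this example only.
TR_STRING = re.compile(r"'[^'\\]*(?:\\.[^'\\]*)*'\s*\.tr\b")

# Hypothetical sample built from the two snippets in the question.
sample = """
  child: Hero(
    tag: heroTag ?? '',
    transitionOnUserGestures: true,
  AutoSizeText(
      'Das ist ein langer Text, der immer in einer Zeile ist.'
          .tr,
      style: AppTextStyles.montserratH4Regular,
"""

# With no capture groups, findall returns the full text of each match.
matches = TR_STRING.findall(sample)
print(matches)
```

On this sample the empty string '' followed by transitionOnUserGestures should not be reported, because the character after the closing quote is a comma rather than a dot. The one match that is returned still contains the line break and indentation between the closing quote and .tr.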

Answer 2

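You can try this

\s*'(.+?)'\s*\.tr

However, your use case appears to be extracting the strings to translate from .dart files. I think it would be more elegant to use a library that can parse the Dart AST for this purpose.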
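A small sketch of this pattern in use, with one caveat worth checking against your own files. The name LAZY and both inputs are hypothetical. The capture group returns only the text between the quotes, but the lazy .+? can still run across a closing quote when an earlier single-quoted string on the same line is not followed by .tr.

```python
import re

# Pattern from this answer; findall returns the contents of the capture group.
LAZY = re.compile(r"\s*'(.+?)'\s*\.tr")

# Hypothetical multi-line input: `.` does not match newlines by default,
# so the lazy group stays within the line that holds the string.
multi_line = """
    tag: heroTag ?? '',
    transitionOnUserGestures: true,
      'Das ist ein langer Text, der immer in einer Zeile ist.'
          .tr,
"""
clean = LAZY.findall(multi_line)
print(clean)

# Hypothetical single-line input with two string literals: the group starts
# at the first quote and keeps expanding until a quote followed by `.tr`.
one_line = "Row(children: [Text('Titel'), Text('Hallo Welt'.tr)])"
overreach = LAZY.findall(one_line)
print(overreach)
```

In the second case the captured text begins at Titel and swallows everything up to Hallo Welt, so a pattern that excludes quotes inside the string, such as [^'\\] instead of ., is the safer choice when several literals can share a line.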

python

regex

python-re