Python re search/match "|" issue

Question

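Back in my Perl days I was a regex geek, and I'm clearly having trouble adapting to re. To simplify a large data set I need to search for the "|" character, and the only combination that works is re.escape('|') with re.search rather than re.match.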

import re

x = re.compile(re.escape('|'))
cohort = ['virus_1', 'virus_2|virus_3']

for isolate in cohort:
#   note that re.escape(isolate) fails
    if x.search(isolate):
        print(isolate)

Output

virus_2|virus_3

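OK, the combination above works, but re.match does not. Also, why do I need re.escape('|'), and why does re.escape(isolate), i.e. escaping the list element, fail? What am I missing about how re is normally used?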

Answer 1

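So, there are two things that probably differ from Perl. First, re.match in Python only matches at the beginning of the string. That is, you have to write a regular expression that matches the string from its very start. To find a pattern anywhere in the string, use re.search or re.findall.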
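As a minimal sketch of that difference, the snippet below runs the three functions on one of the strings from the question. The pattern and the variable names are chosen only for illustration.

```python
import re

text = "virus_2|virus_3"
pattern = re.compile(r"virus_3")

# match only tries at position 0, where the text reads "virus_2", so it fails
print(pattern.match(text))          # None

# search scans forward until the pattern matches somewhere
print(pattern.search(text).span())  # (8, 15)

# findall returns every non-overlapping match as a list of strings
print(pattern.findall(text))        # ['virus_3']
```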

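The other thing does have to do with escaping. The Python parser itself uses the \ character, before the code is compiled, to denote special control characters, and that can cause problems in the plain strings passed to re calls. So Python has a special form of string literal in which the opening quote is prefixed with r, as in r"regexp_here": the parser leaves the \ characters alone and creates a string object that contains them literally. Such strings are well suited to being passed as arguments to the various re functions. Then you simply escape the | with \ as usual inside the r-prefixed string: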

In [164]: cohort = ['virus_1', 'virus_2|virus_3']

In [165]: [string for string in cohort if re.search(r"\|", string)]
Out[165]: ['virus_2|virus_3']

In [166]: [string for string in cohort if re.match(r"^.*?\|", string)]
Out[166]: ['virus_2|virus_3']

In [167]: [string for string in cohort if re.match(r"\|", string)]
Out[167]: []
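To tie this back to re.escape from the question, here is a small sketch of why the bare "|" needs escaping and what re.escape produces. It assumes Python 3 and reuses the cohort list; the other variable names are illustrative. It does not try to reproduce the exact failure the asker saw with re.escape(isolate), only what that call returns.

```python
import re

# Raw and non-raw spellings of the same two-character pattern (backslash, pipe)
assert r"\|" == "\\|"
# re.escape builds that same pattern
assert re.escape("|") == r"\|"

cohort = ['virus_1', 'virus_2|virus_3']

# Unescaped, "|" is an alternation between two empty alternatives,
# so it matches the empty string at position 0 of every input
unescaped_hits = [s for s in cohort if re.search("|", s)]
print(unescaped_hits)  # ['virus_1', 'virus_2|virus_3']

# Escaped, it only matches a literal pipe character
escaped_hits = [s for s in cohort if re.search(re.escape("|"), s)]
print(escaped_hits)    # ['virus_2|virus_3']

# re.escape is meant for the pattern; applied to the data it only adds backslashes
escaped_isolate = re.escape(cohort[1])
print(escaped_isolate)  # virus_2\|virus_3
```

In other words, re.escape('|') and r"\|" are interchangeable here, and the string being searched normally needs no escaping at all.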

python-3.x

python-re