Exclude a given string using regex in Python

Question

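I want to exclude everything from the point where 'un' is encountered in any given string. Below is my code, which outputs only French! Le@ Mans - Quevilly Ro instead of French! Le@ Mans - Quevilly Rouen. Any help on how to fix this would be appreciated.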

import requests, bs4, re

get_reg = re.compile(r'''
    ([\w+\W+]*(\s\w+)*\s-\s+\w+[^'un']*)  #teams
    (\s\w+\s) #tip
    (@\d+.\d+)
    ''', re.VERBOSE)
print(get_reg.findall("French! Le@ Mans - Quevilly Rouen un3.5 @1.23"))

Answer 1

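[] matches any one of the characters listed inside it, so [\w+\W+] matches any single character: a word character, a non-word character, or a literal + (inside brackets, + is not a quantifier). That is clearly not what you want. Likewise, [^'un'] does not mean "not the string un"; it matches any single character that is not ', u, or n, so it cannot get past the u in Rouen.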
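The behaviour of the two character classes can be checked in isolation. The snippet below is a minimal sketch; the variable names are made up for illustration and do not appear in the question's code.

```python
import re

# Inside [...] every character is a separate set member, so 'un' is not
# treated as a word: [^'un'] is the same set as [^nu'].
not_quote_u_n = re.compile(r"[^'un']*")

# The class consumes characters until it meets the first ', u, or n.
stopped_at = not_quote_u_n.match("Rouen").group()
print(stopped_at)

# [\w+\W+] accepts any single character, including a literal '+'.
any_char = re.compile(r"[\w+\W+]")
accepts_everything = all(any_char.fullmatch(ch) for ch in "a +!@\n")
print(accepts_everything)
```

The first print shows where the class gives up on Rouen, which lines up with the truncated team name reported in the question.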

The solution to the problem as you asked it is:

re.findall(r'(.*?)\s*un', "French! Le@ Mans - Quevilly Rouen un3.5 @1.23")

Demo: https://regexr.com/40806

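This non-greedily matches everything up to the first un (skipping any whitespace just before it) and returns that leading part to you as a group.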
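If a single string is more convenient than a list, the same idea can be wrapped in a small helper. This is only a sketch: the function name text_before_un is hypothetical, and the \b word boundary is an added assumption (it is not part of the pattern above) so that a un inside a word such as Munich does not cut the match short.

```python
import re

# (.*?) lazily captures as little as possible; \s* drops the spaces before
# the marker; \bun only matches "un" at the start of a word (an assumption
# added for this sketch).
BEFORE_UN = re.compile(r"(.*?)\s*\bun")


def text_before_un(text):
    """Return the text before the first word that starts with 'un'."""
    match = BEFORE_UN.search(text)
    # Fall back to the whole string when no marker is present.
    return match.group(1) if match else text


print(text_before_un("French! Le@ Mans - Quevilly Rouen un3.5 @1.23"))
print(text_before_un("Bayern Munich un2.5 @1.80"))
```

Note that \bun would still trigger on ordinary words that begin with un, so a stricter marker may be needed if such words can occur in the team names.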

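However, your code suggests that you are also trying to match the other parts of the string, and judging from the discussion in the comments, I think what you want is: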

get_reg = re.compile(r'(.*?)\s*(un\d+\.\d+)\s*(@\d+\.\d+)')
print(get_reg.findall("French! Le@ Mans - Quevilly Rouen un3.5 @1.23"))

Demo: https://regexr.com/4085t

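I have left the whitespace between the parts out of the matched groups. This differs from your example, but I suspect it is more useful to you.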
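To work with the three parts separately, the groups of a match can be unpacked into variables. The sketch below assumes the dots in the numbers are meant as literal decimal points and therefore writes them as \. in the pattern. The names teams and tip come from the comments in the question's code; odds is an illustrative label chosen here, not something the question or answer defines.

```python
import re

# \. matches a literal decimal point; a bare . would match any character.
get_reg = re.compile(r'(.*?)\s*(un\d+\.\d+)\s*(@\d+\.\d+)')

line = "French! Le@ Mans - Quevilly Rouen un3.5 @1.23"
match = get_reg.search(line)
if match:
    # One variable per capture group, in pattern order.
    teams, tip, odds = match.groups()
    print(teams)
    print(tip)
    print(odds)
```

With an unescaped dot, a string such as @1x23 would also be accepted as the last part, which is usually not intended.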
