Detecting a specific string with regex in python
I have a csv file with the following string structure:
Modem Switch (MMA-213-MML-NW-Match-New Year)(32655)(12532)
Modem Switch3 (MMA-1234-431-NW-Match-New Month)(32655)(12532)
Modem Switch3 (MMA-1234-431-NW-Match-New1 Month)(32655)(12532)
Modem Switch3 (MMA-1234-431-NW-Match-New Month 2)(32655)(12532)
....
I want to get whatever string comes after Match-.
For example, the expected results for the lines shown above are:
New Year
New Month
New1 Month
New Month 2
The following code does not give me the strings I need:
matches = re.findall(r'(Match-)(\w+)', inp, flags=re.I)
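To see why this falls short, it helps to run the pattern on a single sample line. This is an illustration added here, not part of the original post: because the pattern has two capture groups, re.findall returns one (group1, group2) tuple per match, and \w+ only matches letters, digits and underscore, so it stops at the first space and captures just the first word.

```python
import re

line = "Modem Switch (MMA-213-MML-NW-Match-New Year)(32655)(12532)"

# Two groups -> findall yields tuples; \w+ stops at the space after "New".
matches = re.findall(r'(Match-)(\w+)', line, flags=re.I)
print(matches)  # [('Match-', 'New')]
```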
This works:
import re
inp = "Modem Switch3 (MMA-1234-431-NW-Match-New Month)(32655)(12532)"
matches = re.findall(r'Match-(.+?)\)', inp, flags=re.I)
gives
['New Month']
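The key detail is the lazy quantifier +?, which stops at the first ")" instead of the last one on the line. A minimal sketch, assuming all four sample lines are in one string, runs the same pattern over every line and contrasts it with a greedy .+ for comparison:

```python
import re

inp = """Modem Switch (MMA-213-MML-NW-Match-New Year)(32655)(12532)
Modem Switch3 (MMA-1234-431-NW-Match-New Month)(32655)(12532)
Modem Switch3 (MMA-1234-431-NW-Match-New1 Month)(32655)(12532)
Modem Switch3 (MMA-1234-431-NW-Match-New Month 2)(32655)(12532)"""

# Lazy: .+? grows one character at a time and stops at the first ')'.
# Without re.DOTALL, '.' never matches '\n', so a match stays on its own line.
matches = re.findall(r'Match-(.+?)\)', inp, flags=re.I)
print(matches)  # ['New Year', 'New Month', 'New1 Month', 'New Month 2']

# Greedy: .+ runs to the end of the line and backtracks only to the LAST ')'.
greedy = re.findall(r'Match-(.+)\)', inp, flags=re.I)
print(greedy[0])  # New Year)(32655)(12532
```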
You can also match all the word characters that follow, allowing spaces between them, and use a single capture group.
\bMatch-(\w+(?:[^\S\r\n]+\w+)*)
import re
regex = r"\bMatch-(\w+(?:[^\S\r\n]+\w+)*)"
s = ("Modem Switch (MMA-213-MML-NW-Match-New Year)(32655)(12532)\n"
"Modem Switch3 (MMA-1234-431-NW-Match-New Month)(32655)(12532)\n"
"Modem Switch3 (MMA-1234-431-NW-Match-New1 Month)(32655)(12532)\n"
"Modem Switch3 (MMA-1234-431-NW-Match-New Month 2)(32655)(12532)")
print(re.findall(regex, s))
Output
['New Year', 'New Month', 'New1 Month', 'New Month 2']
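[^\S\r\n] is a double negation: any whitespace character except carriage return and newline, so it matches spaces and tabs but never crosses a line break. With the sample data this makes no difference because a ")" always follows the value, but the sketch below uses a hypothetical input where a value ends a line, to show how plain \s would run into the next line:

```python
import re

# Hypothetical input: the value is not followed by ')' before the newline.
s = "Match-New Year\nModem Switch3"

# Horizontal whitespace only: the match stops at the line break.
horizontal = re.findall(r"\bMatch-(\w+(?:[^\S\r\n]+\w+)*)", s)

# \s also matches '\n', so the match continues onto the next line.
any_space = re.findall(r"\bMatch-(\w+(?:\s+\w+)*)", s)

print(horizontal)  # ['New Year']
print(any_space)   # ['New Year\nModem Switch3']
```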
Or, to match everything after Match- up to the closing parenthesis, you can use a negated character class that matches any character except ( and ):
\([^()]*\bMatch-([^()]+)\)
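This pattern has a single capture group, so re.findall returns just the captured text. A minimal sketch of using it, assuming the sample lines are held in one string:

```python
import re

inp = """Modem Switch (MMA-213-MML-NW-Match-New Year)(32655)(12532)
Modem Switch3 (MMA-1234-431-NW-Match-New Month)(32655)(12532)
Modem Switch3 (MMA-1234-431-NW-Match-New1 Month)(32655)(12532)
Modem Switch3 (MMA-1234-431-NW-Match-New Month 2)(32655)(12532)"""

# \(         opening parenthesis of the group that contains Match-
# [^()]*     text before Match- inside that group (no nested parentheses)
# \bMatch-   the literal marker, starting at a word boundary
# ([^()]+)   capture everything up to the next parenthesis
# \)         the closing parenthesis
pattern = r'\([^()]*\bMatch-([^()]+)\)'
matches = re.findall(pattern, inp, flags=re.I)
print(matches)  # ['New Year', 'New Month', 'New1 Month', 'New Month 2']
```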
Use
re.findall(r'(?<=Match-)[^()]+', inp, flags=re.I)
Explanation
--------------------------------------------------------------------------------
(?<= look behind to see if there is:
--------------------------------------------------------------------------------
Match- 'Match-'
--------------------------------------------------------------------------------
) end of look-behind
--------------------------------------------------------------------------------
[^()]+ any character except: '(', ')' (1 or more
times (matching the most amount possible))
import re
inp = """Modem Switch (MMA-213-MML-NW-Match-New Year)(32655)(12532)
Modem Switch3 (MMA-1234-431-NW-Match-New Month)(32655)(12532)
Modem Switch3 (MMA-1234-431-NW-Match-New1 Month)(32655)(12532)
Modem Switch3 (MMA-1234-431-NW-Match-New Month 2)(32655)(12532)"""
print(re.findall(r'(?<=Match-)[^()]+', inp, flags=re.I))
Result: ['New Year', 'New Month', 'New1 Month', 'New Month 2']
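Since the data comes from a csv file, the same lookbehind pattern can be applied row by row. This is a sketch under assumptions not stated in the question: the text sits in the first column of each row, and io.StringIO stands in for the real file (the filename modems.csv in the comment is hypothetical).

```python
import csv
import io
import re

# Stand-in for the CSV file; assumes the "Modem Switch (...)" text is column 0.
csv_text = """Modem Switch (MMA-213-MML-NW-Match-New Year)(32655)(12532)
Modem Switch3 (MMA-1234-431-NW-Match-New Month)(32655)(12532)
Modem Switch3 (MMA-1234-431-NW-Match-New1 Month)(32655)(12532)
Modem Switch3 (MMA-1234-431-NW-Match-New Month 2)(32655)(12532)
"""

pattern = re.compile(r'(?<=Match-)[^()]+', flags=re.I)

results = []
# With a real file: open("modems.csv", newline="") instead of io.StringIO(...)
for row in csv.reader(io.StringIO(csv_text)):
    if not row:
        continue  # csv.reader yields [] for blank lines
    m = pattern.search(row[0])
    if m:
        results.append(m.group())

print(results)  # ['New Year', 'New Month', 'New1 Month', 'New Month 2']
```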