Extract specific strings from a single long line

Question

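I'm trying to extract the IDs of certain network interfaces from a single long line that contains several IDs. I've tried using split, but without success. Any help would be appreciated.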

Here is an example of the input. Keep in mind that it is all on a single line of text.

"Authentication success on Interface Gi1/0/20 AuditSessionID 0000000XXXXXXXXXX, Authentication success on Interface Gi1/0/24 AuditSessionID 0000000XXXXXXXXXX, Authentication not succeed on Interface Fi1/0/10 AuditSessionID 0000000XXXXXXXXXX"

I expect the output: Gi1/0/20 Gi1/0/24 Fi1/0/10
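Since split was the first thing tried, here is a minimal sketch of a split-only approach with no regular expressions. It assumes every ID comes immediately after the exact text "Interface " (with one space) and ends at the next whitespace; the variable names chunks and ids are illustrative.

```python
text = ("Authentication success on Interface Gi1/0/20 AuditSessionID 0000000XXXXXXXXXX, "
        "Authentication success on Interface Gi1/0/24 AuditSessionID 0000000XXXXXXXXXX, "
        "Authentication not succeed on Interface Fi1/0/10 AuditSessionID 0000000XXXXXXXXXX")

# Splitting on "Interface " leaves the text before the first occurrence in chunk 0;
# every later chunk starts with an interface ID (assumption: the format is always like this).
chunks = text.split("Interface ")[1:]

# The ID is the first whitespace-delimited token of each chunk.
ids = [chunk.split()[0] for chunk in chunks]

print(" ".join(ids))  # Gi1/0/20 Gi1/0/24 Fi1/0/10
```

This works only as long as the word "Interface " never appears anywhere other than directly before an ID; a regular expression makes that assumption explicit in one pattern.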

Answer 1

A regular expression is a good fit for this task:

import re

text = 'Authentication success on Interface Gi1/0/20 AuditSessionID 0000000XXXXXXXXXX, Authentication success on Interface Gi1/0/24 AuditSessionID 0000000XXXXXXXXXX, Authentication not succeed on Interface Fi1/0/10 AuditSessionID 0000000XXXXXXXXXX'
print(re.findall(r'Interface (.*?) ', text))

re.findall() returns a list containing exactly what you want:

['Gi1/0/20', 'Gi1/0/24', 'Fi1/0/10']

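The pattern 'Interface (.*?) ' works by matching the word "Interface", followed by a space, followed by something or nothing, followed by another space. That "something or nothing" is expressed by (.*?), which captures (i.e. adds to the output of re.findall()) whatever .*? matches: any character except a newline (.), any number of times (*), as few times as possible (?). You can experiment with regular expressions on a site like https://regex101.com/, which lets you run Python regexes and explains them (better than I can).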
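To see why the ? matters, here is a small comparison on the sample line from the question: the greedy pattern, the lazy pattern from this answer, and an alternative using \S+ (one or more non-whitespace characters). The \S+ variant is an addition for illustration, not part of the original answer; its advantage is that it does not require a space after the ID.

```python
import re

text = ("Authentication success on Interface Gi1/0/20 AuditSessionID 0000000XXXXXXXXXX, "
        "Authentication success on Interface Gi1/0/24 AuditSessionID 0000000XXXXXXXXXX, "
        "Authentication not succeed on Interface Fi1/0/10 AuditSessionID 0000000XXXXXXXXXX")

# Greedy: .* runs on to the LAST space in the whole line, so only one huge match is found.
greedy = re.findall(r"Interface (.*) ", text)

# Lazy: .*? stops at the first space after each "Interface ".
lazy = re.findall(r"Interface (.*?) ", text)

# \S+ grabs a run of non-whitespace characters, so no trailing space is needed.
non_space = re.findall(r"Interface (\S+)", text)

print(len(greedy))  # 1
print(lazy)         # ['Gi1/0/20', 'Gi1/0/24', 'Fi1/0/10']
print(non_space)    # ['Gi1/0/20', 'Gi1/0/24', 'Fi1/0/10']
```

The difference between the last two shows up when an ID is at the very end of a line: re.findall(r"Interface (.*?) ", "on Interface Gi1/0/1") returns an empty list, while re.findall(r"Interface (\S+)", "on Interface Gi1/0/1") returns ['Gi1/0/1'].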

Answer 2

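It isn't entirely clear which properties define the pattern you want to extract, but here is a strict regex that matches an uppercase letter followed by a lowercase letter, a digit, a slash, another digit, then a slash and two digits. If the IDs in your input can contain more digits or other characters, you can easily extend it to cover them.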

import re

s = "Authentication success on Interface Gi1/0/20 AuditSessionID 0000000XXXXXXXXXX, Authentication success on Interface Gi1/0/24 AuditSessionID 0000000XXXXXXXXXX, Authentication not succeed on Interface Fi1/0/10 AuditSessionID 0000000XXXXXXXXXX"

print(re.findall(r"[A-Z][a-z]\d/\d/\d\d", s))

Output:

['Gi1/0/20', 'Gi1/0/24', 'Fi1/0/10']
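As a sketch of that kind of extension, the variant below replaces each fixed \d with \d+ so that every number can have any number of digits, and adds \b word boundaries so the match cannot start or end in the middle of a longer word. The sample line with Gi1/0/5 and Te1/1/48 is hypothetical, made up only to show where the strict pattern falls short.

```python
import re

# Hypothetical input (not from the question): a one-digit port and a two-digit port.
s = "Interface Gi1/0/5 AuditSessionID 1, Interface Te1/1/48 AuditSessionID 2"

# Original strict pattern: exactly one digit, one digit, then exactly two digits.
strict = re.findall(r"[A-Z][a-z]\d/\d/\d\d", s)

# Relaxed pattern: one or more digits in each position, anchored at word boundaries.
relaxed = re.findall(r"\b[A-Z][a-z]\d+/\d+/\d+\b", s)

print(strict)   # ['Te1/1/48']
print(relaxed)  # ['Gi1/0/5', 'Te1/1/48']
```

The two-letter prefix is still fixed; if your interface names can have longer prefixes, [A-Z][a-z]+ would be the next thing to relax.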

