Parse a string with tags whose positions are sometimes reversed

I am trying to parse a string from a log of network communication. It looks like this:

2019 Jun 30 15:40:17.561 NETWORK_MESSAGE
Direction = UE_TO_NETWORK
From: <1106994972>
To: <3626301680>

Here is my code:

import re
log = '2019 Jun 30 15:40:17.561 NETWORK_MESSAGE\r\nDirection = UE_TO_NETWORK\r\nFrom: <1106994972>\r\nTo: <3626301680>\r\n'
PATTERN = re.compile(
    r'(?P<time>\d{2}:\d{2}:\d{2}\.\d{3}).*'  # Time (literal dot before the milliseconds)
    r'Direction = (?P<Direction>\S+).*'      # Direction
    r'From: <(?P<From>\S+)>.*'               # from
    r'To: <(?P<To>\S+)>',                    # to
    re.DOTALL)
results = PATTERN.search(log)
print(results.group('From'))

However, I just found out that the positions of "From" and "To" are sometimes reversed, as shown below.

2019 Jun 30 15:40:16.548 NETWORK_MESSAGE
Direction = NETWORK_TO_UE
To: <3626301680>
From: <1106994972>

Can I solve this with a single pattern?

Here is a solution that uses (From|To) to match either From or To, and then explicitly checks which of the two positions matched From:

import re
log1 = '2019 Jun 30 15:40:17.561 NETWORK_MESSAGE\r\nDirection = UE_TO_NETWORK\r\nFrom: <1106994972>\r\nTo: <3626301680>\r\n'
log2 = '2019 Jun 30 15:40:17.561 NETWORK_MESSAGE\r\nDirection = UE_TO_NETWORK\r\nTo: <3626301680>\r\nFrom: <1106994972>\r\n'


PATTERN = re.compile(
    r'(?P<time>\d{2}:\d{2}:\d{2}\.\d{3}).*'  # Time (literal dot before the milliseconds)
    r'Direction = (?P<Direction>\S+).*'      # Direction
    r'(?P<tag1>From|To): <(?P<val1>\S+)>.*'  # from or to
    r'(?P<tag2>From|To): <(?P<val2>\S+)>',   # from or to
    re.DOTALL)
for log in [log1, log2]:
    results = PATTERN.search(log)
    if results.group('tag1') == 'From':
        print(results.group('val1'))
    elif results.group('tag2') == 'From':
        print(results.group('val2'))

This matches your lines, but it does not ensure that From and To each appear exactly once. I also considered this pattern:

PATTERN = re.compile(
    r'(?P<time>\d{2}:\d{2}:\d{2}\.\d{3}).*'  # Time (literal dot before the milliseconds)
    r'Direction = (?P<Direction>\S+).*'      # Direction
    r'(?P<FromTo>(?P<tag1>From|To): <(?P<val1>\S+)>.*){2}',  # from or to, twice
    re.DOTALL)

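But this captures only the last match inside the FromTo group (according to the docs, "If a group is contained in a part of the pattern that matched multiple times, the last match is returned."). The groups therefore hold only whichever field comes second, so if From comes first you cannot get its value.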
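The last-match behaviour can be checked with a small sketch. It uses a simplified pattern that drops the time and Direction parts; the names `REPEATED`, `from_first` and `to_first` are made up for this illustration.

```python
import re

# Simplified repeated-group pattern: the same named groups are matched twice.
REPEATED = re.compile(r'(?:(?P<tag>From|To): <(?P<val>\S+)>.*?){2}', re.DOTALL)

from_first = 'From: <1106994972>\r\nTo: <3626301680>\r\n'
to_first = 'To: <3626301680>\r\nFrom: <1106994972>\r\n'

for text in (from_first, to_first):
    m = REPEATED.search(text)
    # Only the second (last) repetition survives in the named groups.
    print(m.group('tag'), m.group('val'))
```

With From first, the groups end up holding the To field, so the From value is lost.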

If things get more complicated, you can get more readable code by using several patterns.
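A minimal sketch of the multi-pattern idea, assuming each field appears at most once per log entry. The helper name `parse_log` and the `FIELD_PATTERNS` table are hypothetical, not part of the original code.

```python
import re

# One small pattern per field. Each is searched independently,
# so the order of the From/To lines no longer matters.
FIELD_PATTERNS = {
    'time': re.compile(r'(\d{2}:\d{2}:\d{2}\.\d{3})'),
    'Direction': re.compile(r'Direction = (\S+)'),
    'From': re.compile(r'From: <([^>\s]+)>'),
    'To': re.compile(r'To: <([^>\s]+)>'),
}

def parse_log(log):
    """Return a dict of the fields found in log; missing fields map to None."""
    fields = {}
    for name, pattern in FIELD_PATTERNS.items():
        match = pattern.search(log)
        fields[name] = match.group(1) if match else None
    return fields

log1 = '2019 Jun 30 15:40:17.561 NETWORK_MESSAGE\r\nDirection = UE_TO_NETWORK\r\nFrom: <1106994972>\r\nTo: <3626301680>\r\n'
log2 = '2019 Jun 30 15:40:16.548 NETWORK_MESSAGE\r\nDirection = NETWORK_TO_UE\r\nTo: <3626301680>\r\nFrom: <1106994972>\r\n'

for log in (log1, log2):
    print(parse_log(log)['From'])
```

The cost is several passes over the string instead of one, which is unlikely to matter for short log entries like these.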

log1 = "2019 Jun 30 15:40:17.561 NETWORK_MESSAGE\r\nDirection = UE_TO_NETWORK\r\nFrom: <1106994972>\r\nTo: <3626301680>\r\n"
log2 = "2019 Jun 30 15:40:16.548 NETWORK_MESSAGE\r\nDirection = NETWORK_TO_UE\r\nTo: <3626301680>\r\nFrom: <1106994972>\r\n"

PATTERN = re.compile(
    r'(?P<time>\d{2}:\d{2}:\d{2}\.\d{3}).*'  # Time (literal dot before the milliseconds)
    r'Direction = (?P<Direction>\S+).*'      # Direction
    r'(From|To): <(?P<X>\S+)>.*'             # first tag and its value
    r'(To|From): <(?P<Y>\S+)>',              # second tag and its value
    re.DOTALL)

print(re.findall(PATTERN, log1))
print(re.findall(PATTERN, log2))
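With a single pattern, the two tag/value pairs can also be folded into a dict keyed by tag, so the caller never has to care about the order. The sketch below is one possible variant, not the method from the answers above: it names both tag groups and adds a negative lookahead on a backreference, `(?!(?P=tag1))`, so that the second tag must differ from the first. `PAIR_PATTERN` and `parse_pair` are hypothetical names.

```python
import re

PAIR_PATTERN = re.compile(
    r'(?P<time>\d{2}:\d{2}:\d{2}\.\d{3}).*'   # Time
    r'Direction = (?P<Direction>\S+).*'       # Direction
    r'(?P<tag1>From|To): <(?P<val1>\S+)>.*'   # first tag, either one
    # Second tag: the lookahead rejects a repeat of whatever tag1 matched.
    r'(?!(?P=tag1))(?P<tag2>From|To): <(?P<val2>\S+)>',
    re.DOTALL)

def parse_pair(log):
    """Return a dict with time, Direction, From and To, or None if no match."""
    m = PAIR_PATTERN.search(log)
    if m is None:
        return None
    return {
        'time': m.group('time'),
        'Direction': m.group('Direction'),
        m.group('tag1'): m.group('val1'),
        m.group('tag2'): m.group('val2'),
    }

log1 = '2019 Jun 30 15:40:17.561 NETWORK_MESSAGE\r\nDirection = UE_TO_NETWORK\r\nFrom: <1106994972>\r\nTo: <3626301680>\r\n'
log2 = '2019 Jun 30 15:40:16.548 NETWORK_MESSAGE\r\nDirection = NETWORK_TO_UE\r\nTo: <3626301680>\r\nFrom: <1106994972>\r\n'

for log in (log1, log2):
    print(parse_pair(log)['From'])
```

An entry that contains From twice and no To yields None here instead of a misleading match.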