Parse a string with tags whose positions are sometimes reversed
I am trying to parse a string from a log of communication with the network. It looks like this:
2019 Jun 30 15:40:17.561 NETWORK_MESSAGE
Direction = UE_TO_NETWORK
From: <1106994972>
To: <3626301680>
This is my code:
import re
log = '2019 Jun 30 15:40:17.561 NETWORK_MESSAGE\r\nDirection = UE_TO_NETWORK\r\nFrom: <1106994972>\r\nTo: <3626301680>\r\n'
# Raw strings keep the regex escapes intact; the dot before the milliseconds is escaped.
PATTERN = re.compile(
    r'(?P<time>\d{2}:\d{2}:\d{2}\.\d{3}).*'  # Time
    r'Direction = (?P<Direction>\S+).*'      # Direction
    r'From: <(?P<From>\S+)>.*'               # From
    r'To: <(?P<To>\S+)>',                    # To
    re.DOTALL)
results = PATTERN.search(log)
print(results.group('From'))
However, I just found that the positions of "From" and "To" are sometimes reversed, as shown below.
2019 Jun 30 15:40:16.548 NETWORK_MESSAGE
Direction = NETWORK_TO_UE
To: <3626301680>
From: <1106994972>
Can I solve this with a single pattern?
Here is a solution that uses (From|To) to match either From or To, and then explicitly checks which of the two positions matched From:
import re
log1 = '2019 Jun 30 15:40:17.561 NETWORK_MESSAGE\r\nDirection = UE_TO_NETWORK\r\nFrom: <1106994972>\r\nTo: <3626301680>\r\n'
log2 = '2019 Jun 30 15:40:17.561 NETWORK_MESSAGE\r\nDirection = UE_TO_NETWORK\r\nTo: <3626301680>\r\nFrom: <1106994972>\r\n'
PATTERN = re.compile(
    r'(?P<time>\d{2}:\d{2}:\d{2}\.\d{3}).*'   # Time
    r'Direction = (?P<Direction>\S+).*'       # Direction
    r'(?P<tag1>From|To): <(?P<val1>\S+)>.*'   # From or To
    r'(?P<tag2>From|To): <(?P<val2>\S+)>',    # From or To
    re.DOTALL)
for log in [log1, log2]:
    results = PATTERN.search(log)
    # Print the value from whichever position carries the From tag.
    if results.group('tag1') == 'From':
        print(results.group('val1'))
    elif results.group('tag2') == 'From':
        print(results.group('val2'))
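This matches your lines, but it does not guarantee that there is exactly one From and one To; two From lines, for example, would match as well.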
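One way to require both tags in a single pattern is to put each tag in its own lookahead, so the order no longer matters. The following is only a sketch under assumptions: it uses [^>]+ instead of \S+ for the values, it requires at least one From and one To but still does not reject duplicates, and in a log with several records the lookaheads could reach into a later record.

```python
import re

log1 = '2019 Jun 30 15:40:17.561 NETWORK_MESSAGE\r\nDirection = UE_TO_NETWORK\r\nFrom: <1106994972>\r\nTo: <3626301680>\r\n'
log2 = '2019 Jun 30 15:40:16.548 NETWORK_MESSAGE\r\nDirection = NETWORK_TO_UE\r\nTo: <3626301680>\r\nFrom: <1106994972>\r\n'

PATTERN = re.compile(
    r'(?P<time>\d{2}:\d{2}:\d{2}\.\d{3}).*?'   # Time
    r'Direction = (?P<Direction>\S+)'          # Direction
    r'(?=.*?From: <(?P<From>[^>]+)>)'          # From, anywhere after Direction
    r'(?=.*?To: <(?P<To>[^>]+)>)',             # To, anywhere after Direction
    re.DOTALL)

for log in (log1, log2):
    m = PATTERN.search(log)
    # Groups captured inside a lookahead remain available on the match object.
    print(m.group('From'), m.group('To'))
```

Both logs print 1106994972 3626301680, because each lookahead scans forward independently from the same position.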
I also considered this pattern:
PATTERN = re.compile(
    r'(?P<time>\d{2}:\d{2}:\d{2}\.\d{3}).*'                   # Time
    r'Direction = (?P<Direction>\S+).*'                       # Direction
    r'(?P<FromTo>(?P<tag1>From|To): <(?P<val1>\S+)>.*){2}',   # From or To, twice
    re.DOTALL)
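But this only captures the last match of From and To (according to the docs: "If a group is contained in a part of the pattern that matched multiple times, the last match is returned."). So if the two fields appear in the wrong order, you will not be able to get the value of From.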
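The last-match behaviour can be seen in a small experiment. This is an illustrative sketch on a shortened input (the variable names are made up for the example); it also shows re.finditer as one way to keep every tag, since each field then gets its own match object.

```python
import re

text = 'From: <1106994972>\r\nTo: <3626301680>\r\n'

# A group inside a repeated part keeps only its last iteration.
repeated = re.search(r'(?:(?P<tag>From|To): <(?P<val>[^>]+)>\s*){2}', text)
print(repeated.group('tag'), repeated.group('val'))  # To 3626301680

# finditer yields one match per field, so no value is overwritten.
fields = {m.group('tag'): m.group('val')
          for m in re.finditer(r'(?P<tag>From|To): <(?P<val>[^>]+)>', text)}
print(fields['From'])  # 1106994972
```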
If things get more complicated, you may get more readable code by using multiple patterns.
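A minimal sketch of the multiple-pattern idea, assuming each log entry is handled as one string. FIELD_PATTERNS and parse_log are hypothetical names introduced for this illustration, not part of any library.

```python
import re

# One small pattern per field; the order of the fields in the log is irrelevant.
FIELD_PATTERNS = {
    'time': re.compile(r'\d{2}:\d{2}:\d{2}\.\d{3}'),
    'Direction': re.compile(r'Direction = (\S+)'),
    'From': re.compile(r'From: <([^>]+)>'),
    'To': re.compile(r'To: <([^>]+)>'),
}

def parse_log(log):
    """Return a dict of field values; a missing field maps to None."""
    result = {}
    for name, pattern in FIELD_PATTERNS.items():
        m = pattern.search(log)
        if m is None:
            result[name] = None
        else:
            # Use the capture group if the pattern has one, else the whole match.
            result[name] = m.group(m.lastindex or 0)
    return result

log2 = '2019 Jun 30 15:40:16.548 NETWORK_MESSAGE\r\nDirection = NETWORK_TO_UE\r\nTo: <3626301680>\r\nFrom: <1106994972>\r\n'
print(parse_log(log2)['From'])  # 1106994972
```

Each pattern stays short enough to read at a glance, and a missing field shows up as None instead of making the whole match fail.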
import re
log1 = "2019 Jun 30 15:40:17.561 NETWORK_MESSAGE\r\nDirection = UE_TO_NETWORK\r\nFrom: <1106994972>\r\nTo: <3626301680>\r\n"
log2 = "2019 Jun 30 15:40:16.548 NETWORK_MESSAGE\r\nDirection = NETWORK_TO_UE\r\nTo: <3626301680>\r\nFrom: <1106994972>\r\n"
PATTERN = re.compile(
    r'(?P<time>\d{2}:\d{2}:\d{2}\.\d{3}).*'  # Time
    r'Direction = (?P<Direction>\S+).*'      # Direction
    r'(From|To): <(?P<X>\S+)>.*'             # first tag and its value
    r'(To|From): <(?P<Y>\S+)>',              # second tag and its value
    re.DOTALL)
# findall returns one tuple per match with every group in pattern order:
# (time, Direction, first tag, X, second tag, Y)
print(PATTERN.findall(log1))
print(PATTERN.findall(log2))