How to write a regex in Python with named groups to match this?
I have a file containing lines like the following.
comm=adbd pid=11108 prio=120 success=1 target_cpu=001
I wrote the regex below to match them.
_sched_wakeup_pattern = re.compile(r"""
comm=(?P<next_comm>.+?)
\spid=(?P<next_pid>\d+)
\sprio=(?P<next_prio>\d+)
\ssuccess=(?P<success>\d)
\starget_cpu=(?P<target_cpu>\d+)
""", re.VERBOSE)
But now I also have lines like the one below, where the success component is missing.
comm=rcu_preempt pid=7 prio=120 target_cpu=007
How can I modify my regex so that it matches both cases? I tried putting a * at various places in the line containing "success", but it throws errors.
To match zero or one occurrence of a group, use (your_string)?:
_sched_wakeup_pattern = re.compile(r"""
comm=(?P<next_comm>.+?)
\spid=(?P<next_pid>\d+)
\sprio=(?P<next_prio>\d+)
\s?(success=(?P<success>\d))?
\starget_cpu=(?P<target_cpu>\d+)
""", re.VERBOSE)
Here the outer group captures the whole substring, so the output also contains success=1 as a separate element:
output =>
('rcu_preempt', '7', '120', '', '', '007')
('kworker/u16:2', '73', '120', '', '', '006')
('kworker/u16:4', '364', '120', '', '', '005')
('adbd', '11108', '120', 'success=1', '1', '001')
('kworker/1:1', '16625', '120', 'success=1', '1', '001')
('rcu_preempt', '7', '120', 'success=1', '1', '002')
Now we need a way to get rid of the "success=" element. That doesn't look hard.
[Edit]
(?:\ssuccess=)?(?P<success>\d)?
works well: the non-capturing group (?:...) matches the prefix without adding it to the results.
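To make the difference concrete, here is a minimal sketch that runs both variants with the standard re module on the two sample lines from the question. The names SAMPLE, CAPTURING, and NON_CAPTURING are made up for this illustration; the patterns themselves are the ones shown above.

```python
import re

# Two sample lines from the question: one with success=, one without.
SAMPLE = (
    "comm=adbd pid=11108 prio=120 success=1 target_cpu=001\n"
    "comm=rcu_preempt pid=7 prio=120 target_cpu=007\n"
)

# Optional *capturing* group: findall reports it as an extra tuple element.
CAPTURING = re.compile(r"""
    comm=(?P<next_comm>.+?)
    \spid=(?P<next_pid>\d+)
    \sprio=(?P<next_prio>\d+)
    \s?(success=(?P<success>\d))?
    \starget_cpu=(?P<target_cpu>\d+)
""", re.VERBOSE)

# Optional *non-capturing* prefix: only the digit is reported.
NON_CAPTURING = re.compile(r"""
    comm=(?P<next_comm>.+?)
    \spid=(?P<next_pid>\d+)
    \sprio=(?P<next_prio>\d+)
    (?:\ssuccess=)?(?P<success>\d)?
    \starget_cpu=(?P<target_cpu>\d+)
""", re.VERBOSE)

with_extra = CAPTURING.findall(SAMPLE)         # 6 elements per tuple
without_extra = NON_CAPTURING.findall(SAMPLE)  # 5 elements per tuple

for row in without_extra:
    print(row)
```

With findall, a group that did not take part in the match shows up as an empty string, which is why the line without success= still yields a full tuple.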
Answer by RomanPerekhrest
A solution using a non-capturing group and the findall function (shown here with the third-party regex module):
import regex
...
with open('lines.txt', 'r') as fh:  # assuming 'lines.txt' is your input file
    commlines = fh.read()
_sched_wakeup_pattern = regex.compile(r"""
comm=(?P<next_comm>[\S]+?)
\spid=(?P<next_pid>\d+)
\sprio=(?P<next_prio>\d+)
(?:\ssuccess=)?(?P<success>\d)?
\starget_cpu=(?P<target_cpu>\d+)
""", regex.VERBOSE)
result = regex.findall(_sched_wakeup_pattern, commlines)
template = "{0:15}|{1:10}|{2:9}|{3:7}|{4:10}" # column widths
print(template.format("next_comm", "next_pid", "next_prio", "success", "target_cpu")) # header
for t in result:
    print(template.format(*t))
Prettified output:
next_comm      |next_pid  |next_prio|success|target_cpu
rcu_preempt    |7         |120      |       |007
kworker/u16:2  |73        |120      |       |006
kworker/u16:4  |364       |120      |       |005
adbd           |11108     |120      |1      |001
kworker/1:1    |16625     |120      |1      |001
rcu_preempt    |7         |120      |1      |002
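Since the question is about named groups, it may also help to read the fields by name instead of by tuple position. The following is a rough sketch using the standard re module with finditer and groupdict; the helper parse_wakeups and the variable names are hypothetical and not part of the answers above.

```python
import re

# Same pattern as in the answer, compiled with the standard re module.
PATTERN = re.compile(r"""
    comm=(?P<next_comm>\S+?)
    \spid=(?P<next_pid>\d+)
    \sprio=(?P<next_prio>\d+)
    (?:\ssuccess=)?(?P<success>\d)?
    \starget_cpu=(?P<target_cpu>\d+)
""", re.VERBOSE)


def parse_wakeups(text):
    """Return one dict per matching line, keyed by group name (hypothetical helper)."""
    return [m.groupdict() for m in PATTERN.finditer(text)]


sample = (
    "comm=adbd pid=11108 prio=120 success=1 target_cpu=001\n"
    "comm=rcu_preempt pid=7 prio=120 target_cpu=007\n"
)

records = parse_wakeups(sample)
for rec in records:
    # success is None when the line has no success= component
    print(rec["next_comm"], rec["next_pid"], rec["success"])
```

Unlike findall, groupdict reports a group that did not participate as None rather than an empty string, so a missing success field can be told apart from an empty one.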