Python RegEx - How can I handle optional parts in a string
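Here is my current source code, which uses regular expressions to parse messages from a fire department pager. Everything works except the pAddress line.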
import re
sInput = '(CUPE123, CUPE124, MTVW211, MTVW215, SUNV5326) ALARM-STRUC (Alarm Type THERMAL SMOKE) (Box 12345) APPLE INC - 1 INFINITE LOOP CUPERTINO. (XStr DE ANZA BLVD/MARIANI AVE) .BUILDING FIRE - SMOKE SHOWING - PERSONS REPORTED. #F987654321'
# Matches truck names using the consistent four uppercase letters followed by three - four numbers.
pTrucks = ','.join(re.findall(r'\w[A-Z]{3}\d[0-9]{2,3}', sInput))
# Matches source and job type using the - as a guide. This section is always preceded by the trucks on the job,
# so it is always preceded by a ) and a space. Allows 2-8 characters on either side of the - to
# allow such variations as 911-RESC, FAA-AIRCRAFT etc.
pJobSource = ''.join(re.findall(r'\) ([A-Za-z1-9]{2,8}-[A-Za-z1-9]{2,8})', sInput))
# Gets the address by starting at (but ignoring) the job source, e.g. -RESC, and capturing everything until the next
# . period; the end of the address section always has a period. Intended to use ?: to ignore up to two sets of
# brackets that may appear in the string for things such as box numbers or alarm types.
pAddress = ''.join(re.findall(r'-[A-Z1-9]{2,8} (.*?)\. \(', sInput))
# Finds the specified cross streets, as they are always within () brackets. Each bracket has a space immediately
# before or after it, and the word XStr is always present.
pCrossStreet = ''.join(re.findall(r' \((XStr.*?)\) ', sInput))
# The job details / description is always contained between two . periods e.g. .42YOM CARDIAC ARREST. each period
# has a space either immediately before or after.
pJobDetails = ''.join(re.findall(r' \.(.*?)\. ', sInput))
# Job number is always in the format #F followed by seven digits. The # is always preceded by a space.
# The \d{0,7} quantifier accepts up to seven digits.
pJobNumber = ''.join(re.findall(r' (#F\d{0,7})', sInput))
# Gets the optional Alarm type, which always appears as a space followed by (Alarm
pAlarmDetails = ''.join(re.findall(r' \((Alarm .*?)\) ', sInput))
# Gets the optional Box type, which always appears as a space followed by (Box
pBoxDetails = ''.join(re.findall(r' (\(Box .*?\))', sInput))
print("Responding Trucks: " + pTrucks)
print("Job Source / Type: " + pJobSource)
print("Address: " + pAddress)
print("Cross Streets: " + pCrossStreet)
print("Job Details: " + pJobDetails)
print("Additional Info: " + pAlarmDetails + ", " + pBoxDetails)
print("\n\nJob Number: " + pJobNumber)
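The problem is that the pager input has two optional fields, (Alarm Type *) and (Box *). Depending on the job, both, neither, or just one of them may be present. The current code returns: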
Responding Trucks: CUPE123,CUPE124,MTVW211,MTVW215,SUNV5326
Job Source / Type: ALARM-STRUC
Address: (Alarm Type THERMAL SMOKE) (Box 12345) APPLE INC - 1 INFINITE LOOP CUPERTINO
Cross Streets: XStr DE ANZA BLVD/MARIANI AVE
Job Details: BUILDING FIRE - SMOKE SHOWING - PERSONS REPORTED
Additional Info: Alarm Type THERMAL SMOKE, (Box 12345)
Job Number: #F9876543
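Everything is perfect except the address line, which also pulls in the Alarm Type and Box#.
How can I modify the RegEx so that the (Alarm Type) and (Box) fields are treated as optional? I tried this, taken from another SO thread, and it works perfectly with the current sInput string: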
pAddress = ''.join(re.findall(r'-[A-Z1-9]{2,8}(?: \(Alarm .*?\))(?: \(Box .*\)) (.*?)\. \(', sInput))
which returns:
Responding Trucks: CUPE123,CUPE124,MTVW211,MTVW215,SUNV5326
Job Source / Type: ALARM-STRUC
Address: APPLE INC - 1 INFINITE LOOP CUPERTINO
Cross Streets: XStr DE ANZA BLVD/MARIANI AVE
Job Details: BUILDING FIRE - SMOKE SHOWING - PERSONS REPORTED
Additional Info: Alarm Type THERMAL SMOKE, (Box 12345)
Job Number: #F9876543
This is perfect and exactly the result I want. However, when I change the sInput string so that it contains neither (Alarm Type *) nor (Box *):
sInput = '(CUPE123, CUPE124, MTVW211, MTVW215, SUNV5326) ALARM-STRUC APPLE INC - 1 INFINITE LOOP CUPERTINO. (XStr DE ANZA BLVD/MARIANI AVE) .BUILDING FIRE - SMOKE SHOWING - PERSONS REPORTED. #F987654321'
the output then has nothing in the address field:
Responding Trucks: CUPE123,CUPE124,MTVW211,MTVW215,SUNV5326
Job Source / Type: ALARM-STRUC
Address:
Cross Streets: XStr DE ANZA BLVD/MARIANI AVE
Job Details: BUILDING FIRE - SMOKE SHOWING - PERSONS REPORTED
Additional Info: ,
Job Number: #F9876543
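I feel like I'm close and just missing something... Sorry for the long post, it may be a bit TMI.
TL;DR: How can I modify the RegEx for the pAddress variable so that it ignores the (Alarm Type *) and (Box *) fields whether or not they are present?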
You just need to add the ? (zero or one match) quantifier to both non-capturing groups:
-[A-Z1-9]{2,8}(?: \(Alarm .*?\))?(?: \(Box .*\))? (.*?)\. \(
Now it should work whether or not Alarm Type and Box are present.
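As a quick check, here is a minimal sketch that runs the pattern above against four variants of the sample message: both fields, Alarm only, Box only, and neither. The names ADDRESS_RE, STRICT_RE, extract_address and variants are made up for this illustration, and the "Alarm only" and "Box only" messages are assumed variations of the sample, not real pager output.

```python
import re

# Pattern from the answer: the trailing "?" makes each bracketed group optional.
ADDRESS_RE = re.compile(
    r'-[A-Z1-9]{2,8}(?: \(Alarm .*?\))?(?: \(Box .*\))? (.*?)\. \('
)

# The attempt from the question, where both groups are mandatory (for contrast).
STRICT_RE = re.compile(
    r'-[A-Z1-9]{2,8}(?: \(Alarm .*?\))(?: \(Box .*\)) (.*?)\. \('
)


def extract_address(message, pattern=ADDRESS_RE):
    """Return the address part of a pager message, or '' if nothing matches."""
    return ''.join(pattern.findall(message))


# Hypothetical message variants assembled from the sample in the question.
PREFIX = '(CUPE123, CUPE124, MTVW211, MTVW215, SUNV5326) ALARM-STRUC'
SUFFIX = (' APPLE INC - 1 INFINITE LOOP CUPERTINO. (XStr DE ANZA BLVD/MARIANI AVE)'
          ' .BUILDING FIRE - SMOKE SHOWING - PERSONS REPORTED. #F987654321')

variants = {
    'both': PREFIX + ' (Alarm Type THERMAL SMOKE) (Box 12345)' + SUFFIX,
    'alarm only': PREFIX + ' (Alarm Type THERMAL SMOKE)' + SUFFIX,
    'box only': PREFIX + ' (Box 12345)' + SUFFIX,
    'neither': PREFIX + SUFFIX,
}

for name, message in variants.items():
    print(name + ': ' + repr(extract_address(message)))

# With mandatory groups, the message without either field yields an empty string.
print('strict, neither: ' + repr(extract_address(variants['neither'], STRICT_RE)))
```

Each of the four variants should yield APPLE INC - 1 INFINITE LOOP CUPERTINO, while the strict pattern returns an empty string when neither field is present. Note that the .* inside the Box group is greedy. It works here because the regex engine backtracks to the first closing bracket that lets the rest of the pattern match, but .*? or [^)]* would probably be a safer choice if the message format ever changes.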