My regex is not catching the required pattern in text?
I am trying to extract durations from text using a regex.
Sample text:
text = "Google, Inc 09/19 - 09/20 CA, USA"
Here is my regex:
pattern = fr"""
(?:
(
\d\d(?:\.|\/)\d\d\d\d|
(?:{months_abr})?
(?:{months_exp})?
(?:
(?:[\s\.\/\-]?\d{{2,4}})
)
)\s*(?:\-|to|\s)\s*
(
\d\d(?:\.|\/)\d\d\d\d|
(?:{months_abr})?
(?:{months_exp})?
(?:
(?:[\s\.\/\-]?\d{{2,4}})
)|
current|present|till\s?\-?date|till\s?\-?now|till\s?\-?date|to\s\-?present|until\s?\-?now|till\s?\-?now
)
)"""
find_all = re.findall(
pattern, text, flags=re.MULTILINE | re.VERBOSE | re.IGNORECASE
)
The output I get:
[('/19', '09')]
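To see where ('/19', '09') comes from, the sketch below runs the question's pattern with hypothetical month lists (the question does not show how months_abr and months_exp are defined, so those two strings are assumptions). Almost every piece of each group is optional: the month names and the leading [\s./-]. The only full "dd/yyyy" alternative needs a four-digit year, so "09/19" never matches as a whole. The "09" alone cannot start a match either, because the separator \s*(?:\-|to|\s)\s* must come right after it and "/" is not allowed there. The first position where the whole pattern succeeds is therefore the slash: group 1 becomes "/19" (optional "/" plus two digits), and group 2 stops at "09" for the same reason.

```python
import re

# Assumption: the question does not show months_abr / months_exp,
# so these hypothetical alternations stand in for them.
months_abr = "jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec"
months_exp = ("january|february|march|april|may|june|july|"
              "august|september|october|november|december")

# The question's pattern (only the duplicated "till ..." alternatives dropped).
question_pattern = fr"""
(?:
(
\d\d(?:\.|\/)\d\d\d\d|
(?:{months_abr})?
(?:{months_exp})?
(?:
(?:[\s\.\/\-]?\d{{2,4}})
)
)\s*(?:\-|to|\s)\s*
(
\d\d(?:\.|\/)\d\d\d\d|
(?:{months_abr})?
(?:{months_exp})?
(?:
(?:[\s\.\/\-]?\d{{2,4}})
)|
current|present|till\s?\-?date|till\s?\-?now|to\s\-?present|until\s?\-?now
)
)"""

text = "Google, Inc 09/19 - 09/20 CA, USA"
result = re.findall(question_pattern, text,
                    flags=re.MULTILINE | re.VERBOSE | re.IGNORECASE)
print(result)  # [('/19', '09')]
```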
You can use:
pattern = fr"""
(?<!\d) # A position not immediately preceded by a digit
( # Group 1
(?:\d?\d[./])?\d\d(?:\d\d)? # one or two digits and . or / (optionally), two or four digits
| # or
(?:{months_abr}|{months_exp}) [\s./-]? \d\d(?:\d\d)? # month, space/dot/slash/hyphen and then two/four digits
) # end of Group 1
\s*(?:-|to)\s* # - or "to", optionally surrounded by whitespace
( # Group 2
(?:\d?\d[./])?\d\d(?:\d\d)?
|
(?:{months_abr}|{months_exp}) [\s./-]?\d\d(?:\d\d)?
|
current|present|(?:until|till)\s?-?(?:date|now)|to\s-?present # some alternatives denoting time
)
"""
See the Python demo. Output: [('09/19', '09/20')].
See the regex demo.
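Since the demo links are not reproduced here, below is a self-contained sketch of the answer's pattern, again with hypothetical months_abr and months_exp lists. It differs in one spot: (?:un)?till matches "till" and "untill" but not "until", so this sketch uses (?:until|till) and drops the repeated "date" alternative. The second and third sample strings are made up for illustration.

```python
import re

# Assumption: hypothetical month alternations (the answer does not show them).
months_abr = "jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec"
months_exp = ("january|february|march|april|may|june|july|"
              "august|september|october|november|december")

pattern = fr"""
(?<!\d)                                               # not preceded by a digit
(                                                     # Group 1: start of the range
  (?:\d?\d[./])?\d\d(?:\d\d)?                         # e.g. 09/19, 09/2019, 2019
  |
  (?:{months_abr}|{months_exp})[\s./-]?\d\d(?:\d\d)?  # e.g. Jan 2019, September-19
)
\s*(?:-|to)\s*                                        # separator: - or to
(                                                     # Group 2: end of the range
  (?:\d?\d[./])?\d\d(?:\d\d)?
  |
  (?:{months_abr}|{months_exp})[\s./-]?\d\d(?:\d\d)?
  |
  current|present|(?:until|till)\s?-?(?:date|now)|to\s-?present  # open-ended end
)
"""

def find_durations(text):
    """Return (start, end) tuples for every date range found in text."""
    return re.findall(pattern, text, flags=re.VERBOSE | re.IGNORECASE)

samples = [
    "Google, Inc 09/19 - 09/20 CA, USA",
    "Acme Corp Jan 2019 - present, NY",
    "Freelance 2016 - until now",
]
for s in samples:
    print(find_durations(s))
# [('09/19', '09/20')]
# [('Jan 2019', 'present')]
# [('2016', 'until now')]
```

With the original (?:un)?till, the third sample returns an empty list.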
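Note: I decided to use \d\d instead of \d{2} to keep the code short, because in f-strings you have to write {{ and }} to get literal curly braces, and they make the string look ugly here.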
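A quick check of what the note means: inside an f-string, {{2}} yields a literal {2}, while a single {2} is treated as a replacement field and silently turns the quantifier into a literal digit. The variable names below are only for illustration.

```python
import re

# {{ and }} are escapes for literal braces in an f-string,
# so this produces the regex text \d{2}.
literal = fr"\d{{2}}"

# A single {2} is a replacement field: the expression 2 is evaluated
# and inserted, so the regex becomes \d2 (a digit followed by "2").
field = fr"\d{2}"

print(literal)  # \d{2}
print(field)    # \d2

# \d\d needs no braces and accepts exactly the same strings as \d{2}.
for s in ["09", "9", "123", "ab"]:
    assert bool(re.fullmatch(r"\d\d", s)) == bool(re.fullmatch(literal, s))
```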