Regex not recognizing comma

Question

I'm new to Python regular expressions and have run into a problem with commas.

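I have a working phone number regex, but I want it to extract only phone numbers that are preceded by a comma (,) or a comma and a space (, ). The regex below recognizes the phone number in this text: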

Joe Schmoe, CEO, 888.888.8888
Voicemail

phoneRegex = re.compile(r'''(
    
    (\(?(\d{3})\)?)?
    (\s|-|\.)?
    (\d{3})
    (\s|-|\.)
    (\d{4})
    (\s*(ext|x|ext.)\s*(\d{2,5}))?
    )''',re.VERBOSE)

I tried adding (, ) at the beginning, since that is what appears in the string I'm searching, but this doesn't work.

phoneRegex = re.compile(r'''(
    (, )
    (\(?(\d{3})\)?)?
    (\s|-|\.)?
    (\d{3})
    (\s|-|\.)
    (\d{4})
    (\s*(ext|x|ext.)\s*(\d{2,5}))?
    )''',re.VERBOSE)

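If anyone can point me in the right direction, I'd appreciate it. I feel like I'm missing something basic, but I don't know what it is.

Here is the full code I'm working with. It should simply replace whatever is on the clipboard with the phone numbers and emails it finds.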

import sys, re, pyperclip

phoneRegex = re.compile(r'''(
    
    (\(?(\d{3})\)?)?
    (\s|-|\.)?
    (\d{3})
    (\s|-|\.)
    (\d{4})
    (\s*(ext|x|ext.)\s*(\d{2,5}))?
    )''',re.VERBOSE)

emailRegex = re.compile(r'''(
    [a-zA-Z0-9._%+-]+
    @
    [a-zA-Z0-9.-]+
    (\.[a-zA-Z]{2,4})
    )''',re.VERBOSE)

#nameREgex = re.compile(r'''(
    #[a-zA-Z \(\)]+
    #(,)

TEXT = str(pyperclip.paste())

matches = []
for groups in phoneRegex.findall(TEXT):
    phoneNum = '-'.join([groups[2],groups[4],groups[6]])
    print(phoneNum)
    print(groups[1])
    #print(groups[2])
    toReturn = '\t'.join([groups[1],phoneNum])
    if groups[9] != '':
        phoneNum += ' X ' + groups[9]
    matches.append(phoneNum)
for groups in emailRegex.findall(TEXT):
    matches.append(groups[0])
    
if len(matches) > 0:
    pyperclip.copy('\n'.join(matches))
    print('copied to clipboard')
    print(pyperclip.paste())
else:
    print('No phone numbers or emails found.')

Answer 1

Not sure if I understood you correctly. This seems to work for me:

import re

phoneRegex = re.compile(r'''
    ,[\s]?
    ((\(?(\d{3})\)?)?
    (\s|-|\.)?
    (\d{3})
    (\s|-|\.)
    (\d{4})
    (\s*(ext|x|ext.)\s*(\d{2,5}))?
    )''',re.VERBOSE)

s = '''Joe Schmoe, CEO, 888.888.8888 Voicemail'''

result = re.search(phoneRegex,s)

if result: print(result[1]) # output -----> 888.888.8888

s = '''Joe Schmoe, CEO 888.888.8888 Voicemail'''

result = re.search(phoneRegex,s)

if result: print(result[1]) # output -----> nothing
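As a rough sketch of how this pattern could slot into the question's findall loop: because the outer capturing group opens after the comma, the tuple indices used in the question (2, 4, 6) still point at the three digit groups. The sample text and the name `numbers` are assumptions made up for illustration.

```python
import re

phone_regex = re.compile(r'''
    ,[\s]?                        # leading comma and optional whitespace
    ((\(?(\d{3})\)?)?             # optional area code
    (\s|-|\.)?
    (\d{3})
    (\s|-|\.)
    (\d{4})
    (\s*(ext|x|ext.)\s*(\d{2,5}))?
    )''', re.VERBOSE)

# Hypothetical sample: only the first number follows a comma.
text = "Joe Schmoe, CEO, 888.888.8888\nJane Doe, CFO 777.777.7777"

numbers = []
for groups in phone_regex.findall(text):
    # Same indices as the question's loop: 2 = area code, 4 and 6 = the rest.
    numbers.append('-'.join([groups[2], groups[4], groups[6]]))

print(numbers)
```

The second number is skipped because nothing in the pattern can match between the comma after "Doe" and the digits.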

Answer 2

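The main issue is that you want to detect phone numbers that come immediately after a comma, as in ', 123456789'. You can use https://regex101.com/ to check your regular expressions more easily. In your case, I suggest using a lookbehind, for example: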

import re

regex = r"(?<=\,\s)\d{3}(?P<conj>[\.\-\,])\d{3}(?P=conj)\d{4}"

test_str = ("Joe Schmoe, CEO, 888.888.8888\n"
    "Voicemail")

matches = re.finditer(regex, test_str, re.MULTILINE)

for matchNum, match in enumerate(matches, start=1):
    
    print ("Match {matchNum} was found at {start}-{end}: {match}".format(matchNum = matchNum, start = match.start(), end = match.end(), match = match.group()))
    
    for groupNum in range(0, len(match.groups())):
        groupNum = groupNum + 1
        
        print ("Group {groupNum} found at {start}-{end}: {group}".format(groupNum = groupNum, start = match.start(groupNum), end = match.end(groupNum), group = match.group(groupNum)))
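The pattern above also uses a named backreference, (?P=conj), so the second separator has to be the same character as the first. A minimal sketch of that effect, with two made-up input strings (`consistent` and `mixed` are illustrative names only):

```python
import re

# Same pattern as in the answer; (?<=,\s) is a fixed-width (two character) lookbehind.
regex = re.compile(r"(?<=\,\s)\d{3}(?P<conj>[\.\-\,])\d{3}(?P=conj)\d{4}")

# Both separators are dots, so the backreference is satisfied.
consistent = regex.search("Joe Schmoe, CEO, 888.888.8888")

# A dot followed by a dash does not satisfy (?P=conj).
mixed = regex.search("Joe Schmoe, CEO, 888.888-8888")

print(consistent.group() if consistent else None)
print(mixed.group() if mixed else None)
```

Note that this lookbehind requires exactly one whitespace character after the comma, so a number written directly after the comma would not be matched by it.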

Answer 3

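The problem is that the space in (, ) is ignored because you are using the re.VERBOSE flag.

If you don't actually need to capture the comma, you can use a non-capturing group, (?:,\ ), instead. Note that verbose mode ignores an unescaped space inside (?:...) just as it does inside (...), so the space still has to be made explicit.

To do that, escape the space (\ ) to make it a literal, use \s to match any kind of whitespace, or use a character class, [ ].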
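A minimal sketch of this behaviour, assuming the sample line from the question. The pattern names (`broken`, `escaped`, `with_class`, `with_s`) are made up for illustration, and the phone pattern is cut down to three digits to keep the example short.

```python
import re

text = "Joe Schmoe, CEO, 888.888.8888"

# Under re.VERBOSE the bare space in "(, )" is dropped, so this behaves
# like "(,)" followed directly by three digits and cannot match ", 888".
broken = re.compile(r"(, )\d{3}", re.VERBOSE)

# Three ways to keep the space significant in verbose mode.
escaped = re.compile(r"(,\ )\d{3}", re.VERBOSE)      # backslash-escaped space
with_class = re.compile(r"(,[ ])\d{3}", re.VERBOSE)  # space inside a character class
with_s = re.compile(r"(,\s)\d{3}", re.VERBOSE)       # any whitespace character

results = {
    "broken": broken.search(text),
    "escaped": escaped.search(text),
    "with_class": with_class.search(text),
    "with_s": with_s.search(text),
}

for name, match in results.items():
    print(name, repr(match.group(0)) if match else None)
```

Only the first pattern fails to match; the other three all find the comma, the space, and the first three digits.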

Answer 4

Since re doesn't support variable-width lookbehind and you are using the VERBOSE flag, you should replace:

(, )

with

(?:(?<=,)|(?<=,\ ))

In plain English:

Check if behind me is a comma or comma space. (?<=,\ ?) would throw an error.
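A short sketch of the lookbehind approach, assuming a simplified phone pattern without area-code parentheses or extensions. The helper `find_numbers` and the flag `variable_width_ok` are hypothetical names used only for this illustration.

```python
import re

# Either a comma, or a comma plus one space, must sit immediately before
# the number. The space is escaped because re.VERBOSE would discard it.
phone_regex = re.compile(r'''
    (?:(?<=,)|(?<=,\ ))   # two fixed-width lookbehinds, one per alternative
    \d{3} [.-] \d{3} [.-] \d{4}
    ''', re.VERBOSE)

def find_numbers(text):
    """Return every phone number that directly follows ',' or ', '."""
    return phone_regex.findall(text)

print(find_numbers("Joe Schmoe, CEO, 888.888.8888"))  # comma and space
print(find_numbers("Joe Schmoe, CEO,888.888.8888"))   # comma only
print(find_numbers("Joe Schmoe, CEO 888.888.8888"))   # no comma

# A single lookbehind with an optional space is variable-width,
# which the re module rejects at compile time.
try:
    re.compile(r"(?<=,\ ?)\d{3}", re.VERBOSE)
    variable_width_ok = True
except re.error:
    variable_width_ok = False
print(variable_width_ok)
```

Because the lookbehinds consume nothing, the match contains only the number itself, so no extra group is needed to strip the comma.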
