Multiple regex patterns for input data: TypeError: can only concatenate str (not "NoneType") to str

Python 3.

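I am trying to include all the possible regex patterns for recognizing a phone number in a single variable. I separate them with pipes.

I get a TypeError while iterating over my input data structure, which in this case is a dictionary of name: phone number pairs. Code: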

import re

text = {'Forest': '123-456-7890', 'Johanna': '(987) 654-4321', 'Mom': '555.555.5555', 'Camille':'9988776655'}

regexPat = r'(\d{3})-(\d{3}-\d{4})|(\(\d{3}\)) (\d{3}-\d{4})|(\d{3})\.(\d{3}\.\d{4}|(\d{3})(\d{7}))'

print("Using 'pipes' to separate possible regex patterns")

phNum = re.compile(regexPat)

for k in text:
    mo = phNum.search(text[k])
    print(k+'\'s area code: '+ mo.group(1))
    print('Suffix: ' + mo.group(2), end=' Whole Number: ')
    print(mo.groups())

Result/error:

Using 'pipes' to separate possible regex patterns
Forest's area code: 123
Suffix: 456-7890 Whole Number: ('123', '456-7890', None, None, None, None, None, None)
Traceback (most recent call last):
  File "z:\documents\programming\mypythonscripts\isphonenumber.py", line 16, in <module>
    print(k+'\'s area code: '+ mo.group(1))
TypeError: can only concatenate str (not "NoneType") to str

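Based on the print statements before the failure, I think what is happening is that the alternatives in the pattern that find no match come back from their groups as NoneType values.

Is there a way to work around this kind of problem? Should I be looking at optional matching?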

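I think you already understand why it doesn't work. You have 8 capturing groups. For 'Forest', the pattern matches with groups 1 and 2, which is why your code works in the first iteration. In the second iteration, for 'Johanna', groups 1 and 2 return None because groups 3 and 4 are the ones that match the pattern. That is where the code fails.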
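To make the group numbering concrete, here is a minimal sketch that runs the question's pattern unchanged and keeps only the groups that actually took part in the match. `matched_groups` is a hypothetical helper name introduced for illustration. It also shows that '9988776655' produces no match at all with this pattern, apparently because the last alternative ended up nested inside group 6, so it still requires a leading `\d{3}\.`.

```python
import re

# The pattern from the question, unchanged (8 capturing groups in total).
regexPat = r'(\d{3})-(\d{3}-\d{4})|(\(\d{3}\)) (\d{3}-\d{4})|(\d{3})\.(\d{3}\.\d{4}|(\d{3})(\d{7}))'
phNum = re.compile(regexPat)


def matched_groups(value):
    """Hypothetical helper: the non-None groups of the match, or None if nothing matched."""
    mo = phNum.search(value)
    if mo is None:
        return None
    # Groups belonging to alternatives that did not match are None; drop them.
    return [g for g in mo.groups() if g is not None]


johanna_all = phNum.search('(987) 654-4321').groups()
print(johanna_all)
# (None, None, '(987)', '654-4321', None, None, None, None)
print(matched_groups('(987) 654-4321'))  # ['(987)', '654-4321']
print(matched_groups('9988776655'))      # None: search() itself returns None here
```

Filtering out the None entries hides which alternative matched, so it only helps when every alternative captures the same kind of parts in the same order.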

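As @Wiktor suggested, you can use the linked solution with a small change while keeping the same approach. I have a somewhat different solution in which you search for only 3 groups (1 for the prefix; 2 and 3 for the suffix), like so: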

import re

text = {'Forest': '123-456-7890', 'Johanna': '(987) 654-4321', 'Mom': '555.555.5555', 'Camille':'9988776655'}
# Group 1 = area code, group 2 = suffix (may include a separator), group 3 = that separator
pattern = r"^\(?(\d{3})(?:\-|\)\s|\.|)?(\d{3}(\-|\.|)?\d{4})$"
num = re.compile(pattern)
for key, value in text.items():
    mo = num.search(value)
    prefix = mo.group(1)
    # Keep only the digits of the suffix
    suffix = ''.join(x for x in mo.group(2) if x.isdigit())
    # suffix = ''.join(x for x in mo.group(2) if x not in mo.group(3))  # works as well
    print(key + '\'s area code: ' + prefix)
    print('Suffix: ', suffix, end=' Whole Number: ')
    print(prefix + suffix)

# Output:
Forest's area code: 123
Suffix:  4567890 Whole Number: 1234567890
Johanna's area code: 987
Suffix:  6544321 Whole Number: 9876544321
Mom's area code: 555
Suffix:  5555555 Whole Number: 5555555555
Camille's area code: 998
Suffix:  8776655 Whole Number: 9988776655
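
The loop above calls `mo.group(1)` without checking `mo`, so a value the pattern rejects would raise an AttributeError. A minimal sketch of a guarded version, assuming the same pattern, could look like this; `split_number` is a hypothetical helper name, not part of the answer. Note that this pattern is more permissive than the four formats in the question: it also accepts mixed separators such as '(123-456.7890'.

```python
import re

# Same three-group pattern as in the answer above.
pattern = r"^\(?(\d{3})(?:\-|\)\s|\.|)?(\d{3}(\-|\.|)?\d{4})$"
num = re.compile(pattern)


def split_number(value):
    """Hypothetical helper: (prefix, suffix) as digit strings, or None if there is no match."""
    mo = num.search(value)
    if mo is None:
        return None
    # Strip the optional separator from the suffix by keeping digits only.
    suffix = ''.join(ch for ch in mo.group(2) if ch.isdigit())
    return mo.group(1), suffix


print(split_number('(987) 654-4321'))  # ('987', '6544321')
print(split_number('(123-456.7890'))   # ('123', '4567890'): mixed separators are accepted
print(split_number('not a number'))    # None
```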

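I suggest using a pattern without any capturing groups to keep things simpler. Once there is a match, remove the non-digit characters and get the parts you want by slicing: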

import re
 
text = {'Forest': '123-456-7890', 'Johanna': '(987) 654-4321', 'Mom': '555.555.5555', 'Camille':'9988776655'}
 
regexPat = r'^(?:\d{3}-\d{3}-\d{4}|\(\d{3}\) \d{3}-\d{4}|\d{3}\.\d{3}\.\d{4}|\d{10})$'
 
print("Using 'pipes' to separate possible regex patterns")
 
phNum = re.compile(regexPat)
 
for k in text:
    mo = phNum.search(text[k])
    if mo:
        phone_num_text = "".join(c for c in mo.group() if c.isdigit())
        print(f"{k}'s area code: {phone_num_text[:3]}")
        print(f'Suffix: {phone_num_text[3:]}')
        print(f'Whole Number: {phone_num_text}')

See the Python demo. Output:

Using 'pipes' to separate possible regex patterns
Forest's area code: 123
Suffix: 4567890
Whole Number: 1234567890
Johanna's area code: 987
Suffix: 6544321
Whole Number: 9876544321
Mom's area code: 555
Suffix: 5555555
Whole Number: 5555555555
Camille's area code: 998
Suffix: 8776655
Whole Number: 9988776655
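
The same idea can be wrapped in a small function so that rejected input is visible to the caller instead of being skipped silently. This is only a sketch under assumptions: `normalize` and `PHONE_RE` are hypothetical names, `re.fullmatch` stands in for the `^...$` anchors, and `re.sub(r'\D', '', ...)` replaces the `isdigit` join.

```python
import re

# The four accepted formats, with no capturing groups; fullmatch() supplies the anchoring.
PHONE_RE = re.compile(r'(?:\d{3}-\d{3}-\d{4}|\(\d{3}\) \d{3}-\d{4}|\d{3}\.\d{3}\.\d{4}|\d{10})')


def normalize(value):
    """Hypothetical helper: the 10 digits of a recognized number, or None otherwise."""
    if PHONE_RE.fullmatch(value) is None:
        return None
    # Drop every non-digit character, leaving area code + suffix.
    return re.sub(r'\D', '', value)


digits = normalize('555.555.5555')
print(digits[:3], digits[3:])      # 555 5555555
print(normalize('555-555.5555'))   # None: mixed separators are not one of the four formats
```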