Identifying genuine phone numbers

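I have a dataset with a column dedicated to capturing phone numbers. My task is to validate that column, because it contains bogus entries such as "9999999999", "0123456789", and many others of a similar nature. I thought of solving this by identifying the carrier name, so that entries like the ones above can easily be filtered out because they will not have a carrier name. I came across a package called phonenumbers and used the code below: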

import phonenumbers
from phonenumbers import carrier
ro_number = phonenumbers.parse("+91xxxxxxxxxx") # number is redacted purposely
carrier.name_for_number(ro_number, "en")

The output is 'BSNL MOBILE'. I want to run this over an entire DataFrame column, creating a new column that records the carrier name for each number.

I tried using a for loop:

for i in df['phone_number']:
    ro_number = phonenumbers.parse(i)
    carrier.name_for_number(ro_number, "en")

but I get the following error:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-80-af01b9d8c9ef> in <module>
      1 for i in merged_Data['SELLER_NUMBER']:
----> 2     ro_number = phonenumbers.parse(i)
      3     carrier.name_for_number(ro_number, "en")

~\anaconda3\lib\site-packages\phonenumbers\phonenumberutil.py in parse(number, region, keep_raw_input, numobj, _check_region)
   2834         raise NumberParseException(NumberParseException.NOT_A_NUMBER,
   2835                                    "The phone number supplied was None.")
-> 2836     elif len(number) > _MAX_INPUT_STRING_LENGTH:
   2837         raise NumberParseException(NumberParseException.TOO_LONG,
   2838                                    "The string supplied was too long to parse.")

TypeError: object of type 'int' has no len()

I'm not sure whether this is the right way to iterate over the whole column. Any help would be appreciated.

TypeError: object of type 'int' has no len()

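The error indicates that len() is being called on an int: phonenumbers.parse expects a string, but the column holds integers. You should convert each value to a string first: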

len(str(x))

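I made two modifications to the code:

  1. Use the is_valid_number method to check whether the number is in an assigned exchange.
  2. Specify a region (e.g. "US"), because using None does not work for the test case "18004444444", which is an MCI phone test number.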

Code

import phonenumbers
from phonenumbers import carrier

def valid_number(number, region = "US"):
    ''' check validity of phone numbers (default to US region)
        
        Used default region as US since some numbers did not work using None
    '''
    # Parsing String to Phone number
    phone_number = phonenumbers.parse(number, region)
  
    # Validating a phone number (i.e. it's in an assigned exchange)
    return phonenumbers.is_valid_number(phone_number)

Testing with a list

data = ["+442083661177", "+123456789", "18004444444"]

for i in data:
    print(i, valid_number(i))

# Output
+442083661177 True
+123456789 False
18004444444 True    # note: this number doesn't work with default region = None

Testing with a DataFrame

import pandas as pd

df = pd.DataFrame({"phone_number": data})
df['valid'] = df['phone_number'].apply(valid_number)

# Resulting df
    phone_number  valid
0  +442083661177   True
1     +123456789  False
2    18004444444   True