Identifying genuine phone numbers
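I have a dataset with a column dedicated to capturing phone numbers. My task is to validate that column, because it contains bogus entries such as "9999999999", "0123456789", and many others of a similar nature.
I want to tackle this by identifying the carrier name, so that entries like the ones above can easily be discarded, since they will not have a carrier name.
I came across a package called phonenumbers and used the code below: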
import phonenumbers
from phonenumbers import carrier
ro_number = phonenumbers.parse("+91xxxxxxxxxx") # number is redacted purposely
carrier.name_for_number(ro_number, "en")
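The output is 'BSNL MOBILE'.
I want to run this over an entire column of a DataFrame, creating a new column that records the carrier name for each number.
I tried using a for loop: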
for i in df['phone_number']:
    ro_number = phonenumbers.parse(i)
    carrier.name_for_number(ro_number, "en")
but I get the following error:
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-80-af01b9d8c9ef> in <module>
1 for i in merged_Data['SELLER_NUMBER']:
----> 2 ro_number = phonenumbers.parse(i)
3 carrier.name_for_number(ro_number, "en")
~\anaconda3\lib\site-packages\phonenumbers\phonenumberutil.py in parse(number, region, keep_raw_input, numobj, _check_region)
2834 raise NumberParseException(NumberParseException.NOT_A_NUMBER,
2835 "The phone number supplied was None.")
-> 2836 elif len(number) > _MAX_INPUT_STRING_LENGTH:
2837 raise NumberParseException(NumberParseException.TOO_LONG,
2838 "The string supplied was too long to parse.")
TypeError: object of type 'int' has no len()
I am not sure whether this is the right way to iterate over the whole column.
Any help would be appreciated.
TypeError: object of type 'int' has no len()
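The error says that len() is being called on an int: the column holds integers, and phonenumbers.parse expects a string. Convert each value to a string first: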
len(str(x))
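As a minimal sketch of that conversion for a whole column, the helper below turns a cell value into a string that is safe to hand to a parser. The name to_phone_string is hypothetical, and the handling of floats and missing values is an assumption about how such a column is often loaded (a numeric column with gaps tends to arrive as float with NaN); it is not something the question confirms.

```python
import math


def to_phone_string(value):
    """Return a cleaned string for parsing, or None if the cell is empty.

    Hypothetical helper: the name and the empty-cell policy are assumptions.
    """
    if value is None:
        return None
    if isinstance(value, float):
        # Missing cells in a numeric column usually show up as NaN.
        if math.isnan(value):
            return None
        # 9876543210.0 should become "9876543210", not "9876543210.0".
        if value.is_integer():
            value = int(value)
    text = str(value).strip()
    # Treat blank strings the same way as missing cells.
    return text or None


print(to_phone_string(9876543210))          # an int cell
print(to_phone_string(" +442083661177 "))   # a padded string cell
```

The cleaned value can then be passed to phonenumbers.parse, for example through df['phone_number'].map(to_phone_string). Note that a number stored as an int has already lost any leading "+" or "0", so it will probably need an explicit region when parsed.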
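I made two modifications to the code:
- Use the is_valid_number method to check whether the number is in an assigned exchange.
- Specify a region (e.g. "US"), because using None does not work for the test case "18004444444", which is an MCI phone test number.
Code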
import phonenumbers
from phonenumbers import carrier

def valid_number(number, region="US"):
    ''' Check validity of phone numbers (default to US region).
        Used default region as US since some numbers did not work using None.
    '''
    # Parse the value into a phone number object; str() guards against
    # integer cells, which caused the TypeError in the question
    phone_number = phonenumbers.parse(str(number), region)
    # Validate the phone number (i.e. it's in an assigned exchange)
    return phonenumbers.is_valid_number(phone_number)
Test using a list
data = ["+442083661177", "+123456789", "18004444444"]
for i in data:
    print(i, valid_number(i))
# Output
+442083661177 True
+123456789 False
18004444444 True # note: this number doesn't work with default region = None
Test using a DataFrame
import pandas as pd

df = pd.DataFrame({"phone_number": data})
df['valid'] = df['phone_number'].apply(valid_number)
# Resulting df
phone_number valid
0 +442083661177 True
1 +123456789 False
2 18004444444 True
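The question singles out entries such as "9999999999" and "0123456789". A pattern-based validity check may still accept some entries of that kind, depending on the region (I have not verified this for any particular region), so a cheap pre-filter can be run alongside it. The sketch below is only a heuristic under my own assumptions: the function name looks_like_placeholder and the minimum run length of 6 are hypothetical choices, and a genuine number that happens to be sequential would be flagged too.

```python
def looks_like_placeholder(number):
    """Heuristically flag filler entries such as 9999999999 or 0123456789.

    Hypothetical helper; the rules below are assumptions, not a standard.
    """
    # Keep ASCII digits only, so "+", spaces and dashes are ignored.
    digits = "".join(ch for ch in str(number) if ch in "0123456789")
    if not digits:
        # Nothing to validate: treat as a placeholder.
        return True
    # Every digit identical, e.g. 9999999999.
    if len(set(digits)) == 1:
        return True
    # One consecutive run, wrapping past 9 or 0, e.g. 0123456789 or 9876543210.
    ascending = "01234567890123456789"
    descending = ascending[::-1]
    return len(digits) >= 6 and (digits in ascending or digits in descending)


for sample in ["9999999999", "0123456789", "+442083661177", "18004444444"]:
    print(sample, looks_like_placeholder(sample))
```

With a DataFrame, this could be stored in its own column, e.g. df['placeholder'] = df['phone_number'].apply(looks_like_placeholder), and combined with the 'valid' column when filtering.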