How to obtain all the phone numbers in each row of a df using the phonenumbers Python library?
I want to use Python's phonenumbers library to create a column containing all the valid phone numbers found in each row of the text column of a dataframe.
import pandas as pd
import phonenumbers

complains = ['If you validate your data, your confirmation number is 1-23-456-789, for a teacher you will be debited on the 3rd of each month 41.99, you will pay for the remaining 3 services offered:n/a',
             'EMAIL VERIFYED, 12345 1st STUDENT 400 88888 2nd STUDENT 166.93 Your request has been submitted and your confirmation number is 1-234-567-777 speed is increased to 250MB .99 BILLING CYCLE 18',
             'ADJUSTMENT FROM NOVEMBER TO MAY .99 Appointment for equipment change 7878940142']
complainsdf = pd.DataFrame(complains, index=['1', '2', '3'], columns=['text'])
I tried the code below, but I did not get the result I expected.
complainsdf['tel'] = complainsdf.apply(
    lambda row: phonenumbers.PhoneNumberMatcher(row['text'], "US"), axis=1)
# The index labels are strings, so use .iloc for positional access
complainsdf['tel'].iloc[0]
This gives me the following output:
<phonenumbers.phonenumbermatcher.PhoneNumberMatcher at 0x2623ebfddf0>
instead of the expected phone numbers.
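Each row of the tel column can contain more than one phone number, so each cell holds an object of type phonenumbers.PhoneNumberMatcher rather than the numbers themselves. The matcher is an iterator that yields one match per number it finds in the text.
To extract the actual phone numbers, you have to loop over that object. For example, you can do this: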
def get_phone_numbers(x):
    # Extract the phone numbers from the text
    nums = phonenumbers.PhoneNumberMatcher(x, "US")
    # Convert the phone number format
    return [phonenumbers.format_number(num.number, phonenumbers.PhoneNumberFormat.E164) for num in nums]

complainsdf['tel'] = complainsdf['text'].apply(get_phone_numbers)
complainsdf
                                                text             tel
1  If you validate your data, your confirmation n...              []
2  EMAIL VERIFYED, 12345 1st STUDENT 400 88888 2n...              []
3  ADJUSTMENT FROM NOVEMBER TO MAY .99 Appoint...      [+17878940142]
I found the way to convert the format with PhoneNumberFormat.E164 in the documentation. You may need to adjust it to your case.