How to obtain all the phone numbers in each row of a DataFrame using the phonenumbers Python library?

I want to use the Python phonenumbers library to create a column that contains all the valid phone numbers found in each row of the text column of a DataFrame.

import pandas as pd
import phonenumbers

complains = ['If you validate your data, your confirmation number is 1-23-456-789, for a teacher you will be debited on the 3rd of each month 41.99, you will pay for the remaining 3 services offered:n/a',
             'EMAIL VERIFYED, 12345 1st STUDENT 400 88888 2nd STUDENT 166.93 Your request has been submitted and your confirmation number is 1-234-567-777 speed is increased to 250MB .99 BILLING CYCLE 18',
             'ADJUSTMENT FROM NOVEMBER TO MAY .99 Appointment for equipment change 7878940142']

complainsdf = pd.DataFrame(complains, index=['1', '2', '3'], columns=['text'])

I tried the code below, but I did not get the result I expected.

complainsdf['tel'] = complainsdf.apply(lambda row: 
    phonenumbers.PhoneNumberMatcher(row['text'], "US"), axis=1)

complainsdf['tel'][0] gives me the following output: <phonenumbers.phonenumbermatcher.PhoneNumberMatcher at 0x2623ebfddf0> instead of the expected phone numbers.

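Each row of tel can contain several phone numbers. They are stored as an object of type phonenumbers.PhoneNumberMatcher, which is an iterator over the matches rather than a list of numbers.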

To extract the actual phone numbers, you have to loop over that object. For example, you can do this:

def get_phone_numbers(x):
    # Extract the phone numbers from the text
    nums = phonenumbers.PhoneNumberMatcher(x, "US")
    # Convert the phone number format
    return [phonenumbers.format_number(num.number, phonenumbers.PhoneNumberFormat.E164) for num in nums]

complainsdf['tel'] = complainsdf['text'].apply(get_phone_numbers)
complainsdf

                                                 text   tel
1   If you validate your data, your confirmation n...   []
2   EMAIL VERIFYED, 12345 1st STUDENT 400 88888 2n...   []
3   ADJUSTMENT FROM NOVEMBER TO MAY .99 Appoint...   [+17878940142]

I found the way to convert the format with PhoneNumberFormat.E164 in the documentation. You may need to adjust it to your case.