Unique list without splitting entries based on spaces in Python

Question

Here is the HTML I want to scrape:

<a id="phone-lead" class="callseller-description-link" rel="050 395 7996" href="#">Show Phone Number</a>

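Goal:

Get the phone number wherever it appears in the HTML. (Note that there are multiple instances of this kind of phone number, so I need to extract all of them and save them in a list.)

This is what I am using: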

phone_result=[]
try:
    phone_result = soup.find('a', {'id': 'phone-lead', 'rel':True}).get('rel')
    for a in soup.find_all('a', {'id':'phone-lead', 'rel': True}):
        phone_result+=(a['rel'])
    phone_result=str(phone_result)
    print phone_result

except StandardError as e:
    phone_result="Error was {0}".format(e)
    print phone_result

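Problems:

1) It does not give unique output. I tried converting the string to a set, but that messed it up.
2) It splits on the spaces and treats each piece as a separate list entry.

Sample output: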

['050', '395', '7996', '050', '395', '7996', '04', '551', '9485', '050', '395', '7996', '050', '395', '7996', '04', '551', '9485', '04', '551', '9485', '050', '395', '7996', '050', '395', '7996', '04']

How can I fix it to get something like this?

[0503957996, 045519485]

Solution, with the help of the answers here:

phone_result=[]
try:
    # phone_result=  soup.find('a', {'id': 'phone-lead', 'rel': True}).get('rel') (REMOVED)
    for a in soup.find_all('a', {'id':'phone-lead', 'rel': True}):
        phone_result.append(','.join(a['rel']))
    phone_result=str(phone_result)

    print phone_result



except StandardError as e:
    phone_result="Error was {0}".format(e)
    print phone_result

Problem: my output now looks like this:

['055,442,4433','055,334,3342']

I think I need to trim the numbers?

Answer 1

It seems that a['rel'] returns a list like ['050', '395', '7996']. So in your for loop you can do something like:

phone_result.append(''.join(a['rel']))

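Note that list.append adds a single element to the end of the list (and does not return anything), whereas + merges two lists.

Also, remove the first soup.find('a', ...) call before the loop, otherwise you will get the first number twice.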
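The difference between += and append can be seen without BeautifulSoup at all. The following is a minimal sketch in which rel_tokens is a hand-written stand-in for what a['rel'] returns (an assumption based on the sample output in the question); the variable names are illustrative only.

```python
# Stand-in for a['rel']: the space-separated attribute value
# arrives as a list of tokens.
rel_tokens = ['050', '395', '7996']

# += merges the two lists, so every token becomes its own entry.
extended = []
extended += rel_tokens

# append adds exactly one element; joining the tokens first
# gives one entry per phone number.
appended = []
appended.append(''.join(rel_tokens))

print(extended)
print(appended)
```

Joining with '' rather than ',' is what removes the separators inside each number.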

Answer 2

I don't know this library, but it looks like you create the phone_result list several times.

phone_result = []  # creates the phone_result list
try:
    # not sure whether this returns a list, but phone_result is assigned again here
    phone_result = soup.find('a', {'id': 'phone-lead', 'rel': True}).get('rel')
    # doesn't look right considering the line above
    for a in soup.find_all('a', {'id': 'phone-lead', 'rel': True}):
        phone_result += (a['rel'])  # takes the existing list and adds the items of a['rel'] to it
    phone_result = str(phone_result)
    print phone_result
except StandardError as e:
    phone_result = "Error was {0}".format(e)
    print phone_result

Once you have a correct list of phone numbers, you can call set on it to get the unique values.
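One caveat with set is that it does not keep the original order. As a rough sketch, assuming the numbers have already been joined into one string each, the hypothetical helper unique_in_order below removes duplicates while keeping the first occurrence of each number; the sample data is made up from the numbers in the question.

```python
def unique_in_order(items):
    """Return items without duplicates, keeping first-seen order."""
    seen = set()
    result = []
    for item in items:
        if item not in seen:
            seen.add(item)
            result.append(item)
    return result

phones = ['0503957996', '0503957996', '045519485', '0503957996']

# set() drops duplicates but has no defined order, so sort it
# if a stable result is needed.
as_sorted_set = sorted(set(phones))

# The helper keeps the order in which numbers first appeared.
in_order = unique_in_order(phones)

print(as_sorted_set)
print(in_order)
```

Note that the deduplication has to happen on the list itself, before any str(phone_result) call; calling set on the string would split it into individual characters.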

Answer 3

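You misunderstood what I meant. You don't need to use both find and find_all. If you want to retrieve all descendants that match your filter, just use find_all. And, as I said, you need to use str.join to join the results.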

from bs4 import BeautifulSoup

html = """<a id="phone-lead" class="callseller-description-link" 
     rel="050 395 7996" href="#">Show Phone Number</a>"""
soup = BeautifulSoup(html, "html.parser")
phone_result = [','.join(map(str.strip, a.get('rel'))) for a in soup.find_all('a', {'id':'phone-lead', 'rel': True})]
print phone_result
#  ['050,395,7996']
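To get the format the question asks for (digits only, no duplicates), the same find_all call can be combined with an empty-string join and a membership check. This is only a sketch: the three-link sample markup and the function name extract_phones are made up for illustration, and it relies on BeautifulSoup returning rel as a list of tokens, as the output in the question suggests. It is written to run on both Python 2 and Python 3.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup with one repeated number.
html = """
<a id="phone-lead" rel="050 395 7996" href="#">Show Phone Number</a>
<a id="phone-lead" rel="04 551 9485" href="#">Show Phone Number</a>
<a id="phone-lead" rel="050 395 7996" href="#">Show Phone Number</a>
"""

def extract_phones(markup):
    """Collect unique phone numbers from the rel attribute, in page order."""
    soup = BeautifulSoup(markup, "html.parser")
    phones = []
    for a in soup.find_all('a', {'id': 'phone-lead', 'rel': True}):
        number = ''.join(a['rel'])   # glue the tokens into one string
        if number not in phones:     # skip numbers already collected
            phones.append(number)
    return phones

print(extract_phones(html))
```

For the sample markup this yields two entries, 0503957996 and 045519485. The linear membership check is fine for a handful of links; for very long pages a set of seen numbers would be the usual alternative.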

python

beautifulsoup

python-2.x

bs4