Python BeautifulSoup scraping: how to combine two different fields, or pair them based on location in the site?

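OK guys, I'm a beginner here. What I'm trying to do is scrape company names and their corresponding phone numbers from a website. The end goal is to write these to a CSV file that can be opened in Excel.

Currently I can retrieve the company names and the phone numbers separately. I was thinking I could merge the two lists somehow, but I'm worried that a single missing or odd entry would throw off the whole merge and leave the numbers mismatched with the names.

What is the best way to accomplish this?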

from urllib import request
from bs4 import BeautifulSoup

url = 'https://www.iqsdirectory.com/bolts/bolts-2/'
html = request.urlopen(url)
soup = BeautifulSoup(html, 'html.parser')

data1 = soup.findAll('span', {'itemprop':'name'})
data2 = soup.findAll('a', {'itemprop':'telephone'})

datalist1 = []
datalist2 = []

for i in data1:
    datalist1.append(i.string)

for i in data2:
    datalist2.append(i.string)

x = zip(datalist1, datalist2)

print(list(x))
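
To make the concern about merging two separate lists concrete, here is a small sketch with made-up data (the names and numbers are hypothetical, not taken from the site). It shows that `zip` pairs strictly by position, so one missing phone number shifts every later pairing, and that padding with `itertools.zip_longest` does not repair the alignment either.

```python
from itertools import zip_longest

# Hypothetical data: "Beta Fasteners" has no phone number on the page,
# so the phone list is one item shorter than the name list.
names = ["Alpha Bolts", "Beta Fasteners", "Gamma Screws"]
phones = ["800-111-1111", "800-333-3333"]

# zip pairs items by position and stops at the shorter list.
paired = list(zip(names, phones))
print(paired)
# Beta Fasteners is paired with Gamma's number, and Gamma Screws is dropped.

# zip_longest keeps every name, but the pairing is still positional,
# so the None lands on the wrong company.
padded = list(zip_longest(names, phones))
print(padded)
```

This is why pairing the fields inside their shared parent element tends to be safer than merging two independently collected lists.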

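Is it possible to extract the name and the phone number in the same soup call, so that the link between them is preserved?

Any help would be greatly appreciated!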

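Here is a solution that does what you need. If a name or a number is missing, it simply won't appear in that company's list. There is probably a more specific exception to catch (most likely AttributeError, since find() returns None when nothing matches), but I was not sure of the right name.

The idea is the one I explained in the comments. I get the list of headers. For each header, I try to find the name and the number. If I can't find one, I catch the exception. If I can find it, I append it to company. Then I append each company to companies. The result is a list of companies, where each company is a list containing a name and a number.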

from urllib import request
from bs4 import BeautifulSoup

url = 'https://www.iqsdirectory.com/bolts/bolts-2/'
html = request.urlopen(url)
soup = BeautifulSoup(html, 'html.parser')

# Each company sits in its own <h3 class="cname"> header
headers = soup.find_all('h3', {'class': 'cname'})
companies = []
for header in headers:
    company = []
    try:
        company.append(header.find('span', {'itemprop': 'name'}).text)
    except AttributeError as e:
        # find() returned None, so this header has no name
        print(e)
    try:
        company.append(header.find('a', {'itemprop': 'telephone'}).text)
    except AttributeError as e:
        # find() returned None, so this header has no phone number
        print(e)
    companies.append(company)
print(companies)

The result is:

[['A & J Fastener Corp.', '877-563-2658'], ['AA Anchor Bolt, Inc.', '800-929-3845'], ['Abbott-Interfast Corporation', '800-877-0789'], ['Accurate Manufactured Products Group, Inc.', '317-472-9000'], ['ACF Components & Fasteners, Inc.', '800-824-5449'], ['Aerospace Manufacturing Corporation', '973-472-2300'], ['Aetna Screw Products Co.', '847-647-9555'], ['AFT Fasteners', '877-844-8595'], ['AJ Fasteners Inc.', '714-630-1556'], ['All-Ways Fasteners, Inc.', '800-870-0372'], ['Amco Enterprises', '866-651-2626'], ['American Bolt Corp.', '262-786-6530'], ['Anchor Bolt & Screw Company', '847-841-7000'], ['Anchor Bolt Source', '888-812-6587'], ['Ancrabec', '888-649-7203'], ['Armour Screw Company', '800-726-4563'], ['Aspen Fasteners', '800-479-0056'], ['Assembly Products, Inc.', '608-296-1666'], ['Associated Fastening Products, Inc.', '888-696-0709'], ['Atwood Industries', '800-362-2059'], ['B&G Manufacturing', '800-366-3067'], ['Baco Enterprises, Inc.', '800-622-2226'], ['Barnhill Bolt Co., Inc.', '800-472-3900'], ['Birmingham Fastener Manufacturing', '800-695-3511'], ['Blue Ribbon Fastener Co.', '847-673-1248'], ['BMB Fasteners, Inc.', '973-256-4010'], ['Bolt Products, Inc.', '800-423-6503'], ['Bossard North America, Inc.', '800-772-2738'], ['Bowie Bolt & Supply, Inc.', '800-337-9650'], ['British Metrics', '800-762-5134'], ['Brunner Manufacturing Co., Inc.', '608-847-6667'], ['Buckeye Fasteners, Inc.', '800-437-1689'], ['C&L Rivet Company, Inc.', '215-672-1113'], ['Cal-Fasteners, Inc.', '714-854-1715'], ['California Bolt Co.', '714-957-6000'], ['Champion Bolt & Supply', '425-339-2632'], ['Chicago Hardware & Fixture Company', '847-455-6609'], ['Chicago Nut & Bolt', '888-529-8600'], ['Circle Bolt & Nut Co., Inc.', '800-548-2658'], ['Coburn-Myers Fastening Systems Incorporated', '800-662-7459'], ['Connor Fastener', '478-742-7261'], ['Cordova Bolt, Inc.', '800-421-3435'], ['DAN-LOC Bolt & Gasket', '800-231-6355'], ['Dayton Nut & Bolt Co., Inc.', '888-711-2658'], ['Deco Manufacturing Company', '800-637-5861'], ['Delta Fastener Corp.', '800-670-5938'], ['Diamond Fasteners', '877-729-6283'], ['Dyson Corporation', '800-680-3600'], ['E & T Fasteners', '800-650-4707'], ['East Coast Metals, Inc.', '800-355-2060'], ['Eastwood Manufacturing', '281-447-0081'], ['EBC Industries', '814-456-4287'], ['Elgin Equipment Group', '630-434-7200'], ['Elgin Fastener Group', '812-689-8990'], ['Engineered Components Company', '847-841-7000'], ['EPS Engineered Parts Sourcing Inc.', '877-889-1017'], ['Falcon Fastening Solutions', '502-266-6292'], ['FASCO, Inc.', '708-371-0747'], ['Fast-Rite International, Inc.', '888-327-8077'], ['Fastenal Company', '507-454-5374'], ['Fastener Dimensions, Inc.', '800-969-2188'], ['Fastener Solutions, Inc.', '866-463-2910'], ['Fastener SuperStore, Inc.', '866-688-2500'], ['Fastener Tool & Supply, Inc.', '800-662-9232'], ['Fasteners Plus International', '708-479-5558'], ['Fasteners Unlimited, Inc.', '724-776-7273'], ['Fastening Products of Lancaster, Inc.', '717-299-5771'], ['FM Stainless Fasteners', '800-749-1115'], ['Genesis Bolt & Supply', '866-276-1399'], ['Global Certified Fastener', '708-450-9301'], ['Global Fastener & Supply, Inc.', '800-785-2664'], ['Guidon Corporation', '856-866-8808'], ['Haydon Bolts, Inc.', '215-537-8700'], ['Hayes Bolt & Supply', '619-231-5966'], ['HC Pacific', '909-598-0509'], ['Hercules Fasteners', '800-332-7320'], ['Hudson Fasteners, Inc.', '877-427-2739'], ['Hydra-Dynamics, Inc.', '936-273-2882'], ['Infinity Fasteners', '913-438-2252'], ['IntegraTECH Distribution', '603-880-3760'], ['J.P. Ruklic Screw Company', '708-339-3600'], ['K-T Bolt Manufacturing, Inc.', '800-553-4521'], ['KelKo Products Company', '800-346-7883'], ['Kinter', '800-323-2389'], ['Lamons Fastener Division', '713-673-5376'], ['Lamons Gasket Company', '800-231-6906'], ['Larson Hardware Manufacturing Company', '815-625-0503'], ['Lincoln Structural Solutions', '402-952-4400'], ['Master Bolt Manufacturing, Inc.', '888-905-2658'], ['Melfast, Inc', '973-227-0045'], ['Micro Plastics, Inc', '(870)453-2261'], ['Mid-States Bolt & Screw Co.', '800-482-0867'], ['Mutual Screw & Supply', '800-222-0324'], ['National Bolt & Nut Corporation', '630-307-8800'], ['Nickel Systems, Inc.', '215-855-5633'], ['Nord-Lock / Superbolt®, Inc.', '412-279-1149'], ['Norwood Screw Machine Parts', '800-437-6644'], ['Nova Fasteners Co. Inc.', '877-541-7222'], ['O.E.M. Fastening Systems', '800-928-7439'], ['O.E.M. Hardware', '800-663-6554'], ['Ocean State Stainless, Inc.', '800-394-6396'], ['Palmer Bolt & Supply Co.', '(937)778-9606'], ['Parker Fasteners', '623-925-5998'], ['PennEngineering®', '800-342-5736'], ['Pohl Spring Works, Inc.', '800-777-1284'], ['Product Components Corporation', '800-336-0406'], ['Production Materials Inc.', '224-434-2290'], ['R&R Engineering Company Inc.', '800-979-1921'], ['Reco Industries', '636-639-6010'], ['Remco Bolt', '800-460-3327'], ['ROBNET', '410-247-7273'], ['SASCO Fasteners', '800-779-2024'], ['SC Fastening Systems, LLC.', '330-468-3300'], ['Screw Products International', '800-876-5153'], ['Secure Fastener & Tool Company', '201-939-4422'], ['Specialty Bolt & Screw, Inc.', '413-789-6700'], ['Specialty Screw Corporation', '815-969-4100'], ['St. Louis Screw & Bolt', '800-237-7059'], ['Stalcop', '765-436-7926'], ['Stanley Industries Inc.', '800-253-2658'], ['Stelfast® Inc.', '800-729-9779'], ['Suncor Stainless, Inc.', '800-394-2222'], ['Sunny Screw Industry Co. Ltd.', '770-351-2858'], ['Tanner Bolt & Nut Corp.', '800-456-2658'], ['Tengco', '714-676-8200'], ['The Federal Group', '800-759-2658'], ['Tripac', '951-280-4488'], ['TSA Manufacturing', '800-228-2948'], ['United Titanium, Inc.', '844-321-4684'], ['USP Aerospace Solutions, Inc.', '631-287-6321'], ['Valtra, Inc.', '800-989-5244'], ['Wayne Bolt & Nut Company', '800-521-2207'], ['WINK Fasteners, Inc.', '804-966-8111'], ['Wodin, Inc.', '440-439-4222'], ['Wurth Industry', '800-428-4686'], ['Yangtze Railroad Materials', '855-889-2648']]
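
The per-header approach can also be tried offline. The sketch below is only an illustration: `SAMPLE_HTML` is made-up markup that assumes the same `h3.cname` structure the answers rely on, and `text_or_none` and `extract_pairs` are hypothetical helper names. Instead of catching an exception, it checks for `None` and keeps a placeholder, so every company keeps exactly two fields even when the phone link is missing.

```python
from bs4 import BeautifulSoup

# Made-up markup mimicking the assumed h3.cname structure;
# the second entry deliberately has no telephone link.
SAMPLE_HTML = """
<h3 class="cname"><span itemprop="name">Alpha Bolts</span>
  <a itemprop="telephone">800-111-1111</a></h3>
<h3 class="cname"><span itemprop="name">Beta Fasteners</span></h3>
<h3 class="cname"><span itemprop="name">Gamma Screws</span>
  <a itemprop="telephone">800-333-3333</a></h3>
"""


def text_or_none(tag):
    """Return the stripped text of a tag, or None when find() found nothing."""
    return tag.get_text(strip=True) if tag is not None else None


def extract_pairs(html):
    """Return one (name, phone) tuple per h3.cname header."""
    soup = BeautifulSoup(html, "html.parser")
    pairs = []
    for header in soup.find_all("h3", class_="cname"):
        # Both lookups are scoped to the same header, so they stay paired.
        name = text_or_none(header.find("span", itemprop="name"))
        phone = text_or_none(header.find("a", itemprop="telephone"))
        pairs.append((name, phone))
    return pairs


pairs = extract_pairs(SAMPLE_HTML)
print(pairs)
```

Keeping a `None` placeholder rather than skipping the field means each row has a fixed number of columns, which is convenient when the rows are later written to CSV.
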

import requests
from bs4 import BeautifulSoup
import csv


def main(url):
    r = requests.get(url)
    soup = BeautifulSoup(r.content, 'html.parser')
    # One <h3 class="cname"> per company; name and phone are looked up inside it
    target = soup.select("h3.cname")
    # newline="" stops the csv module from writing blank rows on Windows
    with open("data.csv", 'w', newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["Name", "Phone"])
        for tar in target:
            # Note: .text raises AttributeError if a header lacks either field
            name = tar.find("span", itemprop="name").text
            phone = tar.find("a", itemprop="telephone").text
            writer.writerow([name, phone])


main("https://www.iqsdirectory.com/bolts/bolts-2/")

Output: view-online
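
Since the stated goal is a CSV that opens in Excel, here is a minimal sketch of just the writing step, assuming the rows have already been collected as (name, phone) tuples. The helper name `write_rows`, the file name, and the "Example Co." row are hypothetical. It uses the `utf-8-sig` encoding, which writes a byte order mark that Excel uses to detect UTF-8, so names containing characters such as ® should display correctly, and it writes an empty cell when a phone number is missing.

```python
import csv
import os
import tempfile


def write_rows(path, rows):
    """Write (name, phone) tuples to a CSV file with a header row."""
    # utf-8-sig adds a BOM; newline="" avoids blank rows on Windows.
    with open(path, "w", newline="", encoding="utf-8-sig") as f:
        writer = csv.writer(f)
        writer.writerow(["Name", "Phone"])
        for name, phone in rows:
            # A missing phone becomes an empty cell instead of shifting columns.
            writer.writerow([name, phone or ""])


# One row taken from the output above, one hypothetical row without a phone.
rows = [("Stelfast® Inc.", "800-729-9779"), ("Example Co.", None)]

with tempfile.TemporaryDirectory() as tmp_dir:
    csv_path = os.path.join(tmp_dir, "data.csv")
    write_rows(csv_path, rows)
    # Read the file back to confirm what was written.
    with open(csv_path, newline="", encoding="utf-8-sig") as f:
        read_back = list(csv.reader(f))

print(read_back)
```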