Scraping attributes and text within anchor tags with BeautifulSoup

I'm trying to scrape the data inside the anchor elements of a table.

I tried the following, but it didn't work:

import requests
from bs4 import BeautifulSoup as bs

url = 'https://example.com'

response = requests.get(url)
soup = bs(response.content, 'html.parser')
itemstr = soup.find('table', {'id': 'listtable'})
for anc in itemstr:
    f = anc.find_all('a')
    print(f)

Thanks
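Separately from the JavaScript issue, the loop in the question is not the usual way to collect links. Iterating over a Tag yields its direct children (text nodes as well as tags), and find() returns None when the table is absent, in which case the for loop raises TypeError: 'NoneType' object is not iterable. When the table is present in the raw HTML, calling find_all() on it once is enough. Below is a minimal sketch on hypothetical inline markup (the hrefs, titles and link texts are made up), reading each anchor's text and attributes:

```python
from bs4 import BeautifulSoup

# Hypothetical static markup standing in for a page whose table IS in the raw HTML.
html = """
<table id="listtable">
  <tr><td><a href="/item/1" title="first">Item A</a></td></tr>
  <tr><td><a href="/item/2" title="second">Item B</a></td></tr>
</table>
"""

soup = BeautifulSoup(html, 'html.parser')
table = soup.find('table', {'id': 'listtable'})

links = []
if table is not None:  # find() returns None when the element is absent
    # find_all() searches every descendant, so there is no need to loop over
    # the table's direct children (which also include whitespace text nodes).
    for a in table.find_all('a'):
        # get_text() returns the anchor's text; .get() reads an attribute or returns None.
        links.append((a.get_text(strip=True), a.get('href'), a.get('title')))

for text, href, title in links:
    print(text, href, title)
```

The same pattern works on any Tag, so you can also call find_all('a') on each row if you need the links grouped per row.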

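The data is loaded dynamically by JavaScript, so the table is not in the HTML that requests downloads. Instead, you can use the requests module to fetch the JSON that the page loads its data from.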
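One way to confirm this is to look for the table in the raw HTML that requests receives. If it isn't there, the page builds it in the browser, and the request that supplies the data can typically be found in the browser developer tools (Network tab, XHR/Fetch filter). A minimal sketch of the check; the function name has_table and both sample markup strings are hypothetical:

```python
from bs4 import BeautifulSoup


def has_table(html: str, table_id: str = 'listtable') -> bool:
    """Return True if a <table> with the given id exists in the raw HTML."""
    soup = BeautifulSoup(html, 'html.parser')
    # Keyword arguments to find() filter on attributes, here id="listtable".
    return soup.find('table', id=table_id) is not None


# Hypothetical markup: a JavaScript-driven page often ships only a placeholder
# and fills it in later from a separate request.
js_page = '<div id="app"></div><script src="app.js"></script>'
static_page = '<table id="listtable"><tr><td><a href="#">x</a></td></tr></table>'

print(has_table(js_page))      # False
print(has_table(static_page))  # True
```

Against the real site you would pass it requests.get(url).text; a False result means BeautifulSoup alone cannot see the table.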

For example, requesting the JSON endpoint directly:

import json
import requests


page = 1
search_link = 'https://www.*********/GetDrugs.php?page={page}'
headers = {'User-Agent': 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:77.0) Gecko/20100101 Firefox/77.0'}

data = requests.get(search_link.format(page=page), headers=headers).json()

# uncomment this to print all data:
# print(json.dumps(data, indent=4))

# print some data to screen:
print('Page {}/{}'.format(data['currentPage'], data['pageCount']))
for r in data['results']:
    print('{:<10} {:<10} {:<40} {:<40} {}'.format(r['id'], r['registerNumber'], r['tradeName'], r['scientificName'], r['agent']))

This prints:

Page 1/714
6912       3-5286-19  ATOXIA 120 mg Film-coated Tablet         ETORICOXIB                               SAUDI INTERNATIONAL TRADING COMPANY LTD (SITCO)
7162       27-271-17  EPIVAL 200MGML SYRUP                   VALPROATE SODIUM                         Dallah Health Care Company
5688       43-271-19  SENERGY 10 MG/160 MG F.C. TABLET         AMLODIPINE ,   VALSARTAN                 SAJA-SAUDI ARABIAN JAPANESE PHARMACEUTICAL CO
8341       33-271-18  LEROXO 8 MG FILM COATED TABLET           LORNOXICAM                               Alkamal Import Office
8812       1-939-18   FEFOL SPANSULES                          FERROUS SULFATE, FOLIC ACID              TABUK PHARMACEUTICAL MANUFACTURING CO.
2531       4-271-98   CLODEARM 0.05% OINTMENT                  CLOBETASOL PROPIONATE                    ALNAGHI COMPANY
2532       5-271-98   CLODEARM 0.05% CREAM                     CLOBETASOL PROPIONATE                    ALNAGHI COMPANY
4531       1-271-96   DICLOFEN 1% CREMOGEL                     DICLOFENAC SODIUM                        SALEHIYA TRADING EST.
321        18-271-03  PROFILAR 1MG/5ML SYRUP                   KETOTIFEN                                SALEHIYA TRADING EST.
1268       13-271-01  UNIFED SYRUP                             TRIPROLIDINE, PSEUDOEPHEDRINE            SALEHIYA TRADING EST.
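The fixed-width format specs in the print call only work when a field has a value: format(None, '<40') raises TypeError, whereas the plain {} used for agent just prints None. A small helper that substitutes '-' for any missing field is sketched below. It is exercised on a hypothetical response body whose keys match the code above and whose values are copied from the printed output; the JSON types (for example, id as an integer) are an assumption.

```python
import json

FIELDS = ('id', 'registerNumber', 'tradeName', 'scientificName', 'agent')


def format_row(r: dict) -> str:
    # Replace missing/null values with '-', since format(None, '<40') raises TypeError.
    # Using "is None" (rather than "or") keeps legitimate falsy values such as 0.
    values = ['-' if r.get(k) is None else r[k] for k in FIELDS]
    return '{:<10} {:<10} {:<40} {:<40} {}'.format(*values)


# Hypothetical response body; values taken from the output above, types assumed.
sample = '''
{
  "currentPage": 1,
  "pageCount": 714,
  "results": [
    {"id": 6912, "registerNumber": "3-5286-19",
     "tradeName": "ATOXIA 120 mg Film-coated Tablet",
     "scientificName": "ETORICOXIB",
     "agent": "SAUDI INTERNATIONAL TRADING COMPANY LTD (SITCO)"},
    {"id": 321, "registerNumber": "18-271-03",
     "tradeName": "PROFILAR 1MG/5ML SYRUP",
     "scientificName": "KETOTIFEN",
     "agent": "SALEHIYA TRADING EST."}
  ]
}
'''

data = json.loads(sample)
print('Page {}/{}'.format(data['currentPage'], data['pageCount']))
for r in data['results']:
    print(format_row(r))
```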

EDIT: To print pages 1 to 99:

import json
import requests

search_link = 'https://**********/GetDrugs.php?page={page}'
headers = {'User-Agent': 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:77.0) Gecko/20100101 Firefox/77.0'}

# range(1, 100) covers pages 1 through 99
for page in range(1, 100):
    data = requests.get(search_link.format(page=page), headers=headers).json()

    # uncomment this to print all data:
    # print(json.dumps(data, indent=4))

    # print some data to screen ('-' replaces a missing agent):
    print('Page {}/{}'.format(data['currentPage'], data['pageCount']))
    for r in data['results']:
        print('{:<10} {:<10} {:<40} {:<40} {}'.format(r['id'], r['registerNumber'], r['tradeName'], r['scientificName'], r['agent'] or '-'))
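Rather than hard-coding 99 pages, the loop can stop at the pageCount value that each response already reports. A minimal sketch, assuming pageCount is the total number of pages; the generator iter_results and the fake two-page endpoint used for the demo are hypothetical:

```python
from typing import Callable, Iterator


def iter_results(fetch_page: Callable[[int], dict]) -> Iterator[dict]:
    """Yield every result row, following pageCount instead of a fixed range."""
    page = 1
    while True:
        data = fetch_page(page)
        yield from data['results']
        # pageCount may arrive as a string or an int; int() handles both.
        if page >= int(data['pageCount']):
            break
        page += 1


# Offline demo: a fake two-page endpoint stands in for the real one.
fake_pages = {
    1: {'currentPage': 1, 'pageCount': '2', 'results': [{'id': 1}, {'id': 2}]},
    2: {'currentPage': 2, 'pageCount': '2', 'results': [{'id': 3}]},
}
ids = [r['id'] for r in iter_results(fake_pages.__getitem__)]
print(ids)  # [1, 2, 3]
```

Against the real site, pass a function such as lambda p: requests.get(search_link.format(page=p), headers=headers, timeout=30).json(). A requests.Session reuses the connection across pages, and a short time.sleep() between requests keeps the load on the server reasonable.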