Scrape attrs and text within anchor elements with BeautifulSoup
I am trying to scrape the data inside the anchor elements.
I tried the following, but it did not work.
import requests
from bs4 import BeautifulSoup as bs

url = 'https://example.com'
response = requests.get(url)
soup = bs(response.content, 'html.parser')
itemstr = soup.find('table', {'id': 'listtable'})
for anc in itemstr:
    f = anc.find_all('a')
    print(f)
Thanks.
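The data is loaded dynamically via JavaScript, so it is not in the HTML that requests.get(url) returns. You can still use the requests module to get the information, by calling the JSON endpoint directly.
For example: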
import json
import requests
page = 1
search_link = 'https://www.*********/GetDrugs.php?page={page}'
headers = {'User-Agent': 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:77.0) Gecko/20100101 Firefox/77.0'}
data = requests.get(search_link.format(page=page), headers=headers).json()
# uncomment this to print all data:
# print(json.dumps(data, indent=4))
# print some data to screen:
print('Page {}/{}'.format(data['currentPage'], data['pageCount']))
for r in data['results']:
    print('{:<10} {:<10} {:<40} {:<40} {}'.format(r['id'], r['registerNumber'], r['tradeName'], r['scientificName'], r['agent']))
This prints:
Page 1/714
6912       3-5286-19  ATOXIA 120 mg Film-coated Tablet         ETORICOXIB                               SAUDI INTERNATIONAL TRADING COMPANY LTD (SITCO)
7162       27-271-17  EPIVAL 200MGML SYRUP                     VALPROATE SODIUM                         Dallah Health Care Company
5688       43-271-19  SENERGY 10 MG/160 MG F.C. TABLET         AMLODIPINE , VALSARTAN                   SAJA-SAUDI ARABIAN JAPANESE PHARMACEUTICAL CO
8341       33-271-18  LEROXO 8 MG FILM COATED TABLET           LORNOXICAM                               Alkamal Import Office
8812       1-939-18   FEFOL SPANSULES                          FERROUS SULFATE, FOLIC ACID              TABUK PHARMACEUTICAL MANUFACTURING CO.
2531       4-271-98   CLODEARM 0.05% OINTMENT                  CLOBETASOL PROPIONATE                    ALNAGHI COMPANY
2532       5-271-98   CLODEARM 0.05% CREAM                     CLOBETASOL PROPIONATE                    ALNAGHI COMPANY
4531       1-271-96   DICLOFEN 1% CREMOGEL                     DICLOFENAC SODIUM                        SALEHIYA TRADING EST.
321        18-271-03  PROFILAR 1MG/5ML SYRUP                   KETOTIFEN                                SALEHIYA TRADING EST.
1268       13-271-01  UNIFED SYRUP                             TRIPROLIDINE, PSEUDOEPHEDRINE            SALEHIYA TRADING EST.
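As a side note on the loop in the question: even when a table is present in the fetched HTML, iterating over a Tag yields its direct children, including whitespace strings, and those generally do not support find_all. Below is a minimal sketch that calls find_all on the table itself and reads each anchor's attributes and text. The markup, hrefs, and the extract_anchors helper are made up for illustration and are not taken from the real site.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for a table that is already in the HTML.
SAMPLE_HTML = """
<table id="listtable">
  <tr><td><a href="/drug/6912" title="ATOXIA">ATOXIA 120 mg</a></td></tr>
  <tr><td><a href="/drug/7162">EPIVAL SYRUP</a></td></tr>
</table>
"""


def extract_anchors(html, table_id="listtable"):
    """Return (href, text, attrs) for every <a> inside the table with this id."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table", {"id": table_id})
    if table is None:
        # No such table in this HTML, e.g. because JavaScript fills it in later.
        return []
    return [
        (a.get("href"), a.get_text(strip=True), dict(a.attrs))
        for a in table.find_all("a")  # searches all descendants, not only children
    ]


anchors = extract_anchors(SAMPLE_HTML)
for href, text, attrs in anchors:
    print(href, text, attrs)
```

Returning an empty list when the table is missing avoids the error you would otherwise get from using the None that soup.find returns.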
EDIT: To print pages 1 to 99:
for page in range(1, 100):
    print('Page', page)
    search_link = 'https://**********/GetDrugs.php?page={page}'
    headers = {'User-Agent': 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:77.0) Gecko/20100101 Firefox/77.0'}
    data = requests.get(search_link.format(page=page), headers=headers).json()
    # uncomment this to print all data:
    # print(json.dumps(data, indent=4))
    # print some data to screen:
    print('Page {}/{}'.format(data['currentPage'], data['pageCount']))
    for r in data['results']:
        print('{:<10} {:<10} {:<40} {:<40} {}'.format(r['id'], r['registerNumber'], r['tradeName'], r['scientificName'], r['agent'] or '-'))
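Instead of hard-coding the page range, one option is to let the pageCount field from each response decide when to stop. The sketch below assumes that pageCount is reliable and that every response has the pageCount and results keys seen in the output above. iter_results and fetch_with_requests are hypothetical helper names, the timeout and delay are arbitrary choices, and the demo at the end runs against made-up data rather than the real endpoint.

```python
import time


def iter_results(fetch_page, delay=0.0):
    """Yield every record from page 1 up to the reported last page.

    fetch_page(page) must return the decoded JSON dict for that page.
    """
    page = 1
    while True:
        data = fetch_page(page)
        yield from data["results"]
        if page >= int(data["pageCount"]):  # int() in case the API sends a string
            break
        page += 1
        time.sleep(delay)  # optional pause between requests


def fetch_with_requests(page):
    """Fetch one page from the endpoint (the host is elided in the original)."""
    import requests  # imported here so iter_results has no third-party dependency

    search_link = 'https://www.*********/GetDrugs.php?page={page}'
    headers = {'User-Agent': 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:77.0) Gecko/20100101 Firefox/77.0'}
    response = requests.get(search_link.format(page=page), headers=headers, timeout=30)
    response.raise_for_status()  # fail early on HTTP errors
    return response.json()


# Offline demonstration with a fake two-page API (made-up data).
fake_pages = {
    1: {"currentPage": 1, "pageCount": 2, "results": [{"id": 1}, {"id": 2}]},
    2: {"currentPage": 2, "pageCount": 2, "results": [{"id": 3}]},
}
ids = [r["id"] for r in iter_results(fake_pages.__getitem__)]
print(ids)
```

Against the real site you would call iter_results(fetch_with_requests, delay=1.0) and format each record as in the loop above.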