Webscrape - Getting link/href
I am trying to go into a webpage and grab the href/link of every row.
Currently, the code only prints blanks.
The expected output is to print the href/link of every row on the webpage.
import requests
from bs4 import BeautifulSoup
url = 'https://meetings.asco.org/meetings/2022-gastrointestinal-cancers-symposium/286/program-guide/search?q=&pageNumber=1&size=20'
baseurl='https://ash.confex.com/ash/2021/webprogram/'
res = requests.get(url)
soup = BeautifulSoup(res.content,'html.parser')
productlist = soup.find_all('div',class_='session-card')
for b in productlist:
    links = b["href"]
    print(links)
What happens?
First, take a close look at your soup: you will not find the information you are searching for, because you are being blocked.
Also, the elements in your selection find_all('div', class_='session-card') have no direct attribute href.
How to fix?
Add some headers to your request:
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/62.0.3202.94 Safari/537.36'}
res = requests.get(url, headers=headers)
Also select the <a> within your iteration to get the href:
b.a["href"]
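If some cards turn out not to contain a link, `b.a["href"]` would raise an error on the card whose `a` is `None`. A more defensive variant (a sketch using hypothetical markup, not the actual ASCO page) is a CSS selector that only matches `<a>` tags carrying an href:

```python
from bs4 import BeautifulSoup

# tiny illustrative snippet (hypothetical markup, for demonstration only)
html = '''
<div class="session-card"><a href="/session/1">One</a></div>
<div class="session-card">No link here</div>
<div class="session-card"><a href="/session/2">Two</a></div>
'''
soup = BeautifulSoup(html, 'html.parser')

# match only <a> tags that actually carry an href, so cards
# without a link are skipped instead of raising an exception
links = [a['href'] for a in soup.select('div.session-card a[href]')]
print(links)  # prints ['/session/1', '/session/2']
```
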
Example
import requests
from bs4 import BeautifulSoup
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/62.0.3202.94 Safari/537.36'}
url = 'https://meetings.asco.org/meetings/2022-gastrointestinal-cancers-symposium/286/program-guide/search?q=&pageNumber=1&size=20'
baseurl='https://ash.confex.com/ash/2021/webprogram/'
res = requests.get(url, headers=headers)
soup = BeautifulSoup(res.content,'html.parser')
for b in soup.find_all('div', class_='session-card'):
    links = b.a["href"]
    print(links)
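Note that the scraped hrefs may be relative paths. If you need absolute URLs, you can resolve them against the page URL with `urllib.parse.urljoin` (the relative path below is an illustrative value, not taken from the actual page):

```python
from urllib.parse import urljoin

base = 'https://meetings.asco.org/meetings/2022-gastrointestinal-cancers-symposium/286/program-guide/'
# a relative href (hypothetical value) resolved against the page's base URL
print(urljoin(base, 'session/12345'))
# prints https://meetings.asco.org/meetings/2022-gastrointestinal-cancers-symposium/286/program-guide/session/12345
```
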