Python scraping URLs by category
Hi, I'm just getting started with programming and I've run into a problem with scraping. I'm trying to get all the links in a category list that has several names, but I can't isolate the cells because many of them share the same name. Can anyone help? Here is my code and a picture of the URLs I want to get.
```python
import requests
from bs4 import BeautifulSoup

url = 'http://books.toscrape.com/index.html'
reqs = requests.get(url)
if reqs.ok:
    soup = BeautifulSoup(reqs.text, 'html.parser')
    ul = soup.find('ul', {'class': 'nav nav-list'})
    for cells in ul:
        a = cells.find('a')
        link = a['href']
        # print(link)
        [print(str(lis) + '\n\n') for lis in link]
```
[image: the list of category links]
I need to retrieve all the URLs inside the (li) elements.
I think this is what you want. I've commented the code to explain what I did and why. You could obviously write this in fewer lines, but this way it's a bit easier to explain.
```python
from bs4 import BeautifulSoup
import requests

url = 'http://books.toscrape.com/index.html'
reqs = requests.get(url)
if reqs.ok:
    soup = BeautifulSoup(reqs.text, 'html.parser')
    # use a multiple-class selector list
    sidebar = soup.find('ul', {"class": ["nav", "nav-list"]})
    # find all the <li> tags within the ul
    li = sidebar.find_all('li')
    for item in li:
        # look for a link inside each list item
        link = item.find('a', href=True)
        # if there is a link, print its href
        if link is not None:
            print(link['href'])
```
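As noted above, this can be written in fewer lines: BeautifulSoup's `select()` takes a CSS selector, so the find/loop/check collapses into one list comprehension. A minimal sketch below; the inline HTML is a stand-in for the real sidebar structure (an assumption, not fetched from the site):

```python
from bs4 import BeautifulSoup

# Stand-in snippet mimicking the books.toscrape.com sidebar (assumption:
# category links sit inside <li> tags under ul.nav-list).
html = """
<ul class="nav nav-list">
  <li><a href="catalogue/category/books_1/index.html">Books</a>
    <ul>
      <li><a href="catalogue/category/books/travel_2/index.html">Travel</a></li>
      <li><a href="catalogue/category/books/mystery_3/index.html">Mystery</a></li>
    </ul>
  </li>
</ul>
"""

soup = BeautifulSoup(html, 'html.parser')
# 'ul.nav-list li a[href]' matches every <a> with an href inside the list
links = [a['href'] for a in soup.select('ul.nav-list li a[href]')]
print(links)
```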
This can also be done by going straight to the links (note that `find_all('a')` returns every link on the page, not just the sidebar categories):
```python
from bs4 import BeautifulSoup
import requests

url = 'http://books.toscrape.com/index.html'
reqs = requests.get(url)
if reqs.ok:
    soup = BeautifulSoup(reqs.text, 'html.parser')
    # grab every anchor tag on the page
    links = soup.find_all('a')
    for cells in links:
        print(cells['href'])
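The hrefs on this page are relative (e.g. `catalogue/...`). If you want full URLs you can resolve each one against the page URL with the standard library's `urljoin`; a minimal sketch, again on a stand-in snippet rather than a live request:

```python
from urllib.parse import urljoin
from bs4 import BeautifulSoup

base = 'http://books.toscrape.com/index.html'
# Stand-in snippet (assumption): relative hrefs as they appear on the page.
html = '<a href="catalogue/page-2.html">next</a><a href="index.html">Home</a>'

soup = BeautifulSoup(html, 'html.parser')
# urljoin resolves each relative href against the page URL
absolute = [urljoin(base, a['href']) for a in soup.find_all('a', href=True)]
print(absolute)
```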