How can I scrape URLs from a tag without a class or id inside another tag in bs4 (Python 3)?
I want to get all the URLs from links structured like this: h2 class="" > a href=""
This is my code:
import requests
from bs4 import BeautifulSoup

header = {"User-Agent": "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:77.0) Gecko/20190101 Firefox/77.0"}
Purl = 'https://www.tunisianet.com.tn/301-pc-portable-tunisie'
req = requests.get(Purl, headers=header)
soup = BeautifulSoup(req.content, 'lxml')
ProductUrl = []
#find title of product
showName = soup.select('h2', {'class': 'h3 product-title'})
#find link of product
for i in showName:
    ProductUrl.append(str(i.find('a')))
print(ProductUrl)
for i in ProductUrl:
    print(i[i.find("href"):])
How can I fix this?
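Why the code above prints the wrong thing: in current Beautiful Soup versions (4.7+), `select()` takes a single CSS selector string, and its second positional parameter is `namespaces`. That means the dict is not applied as a class filter, and `'h2'` matches every `<h2>` on the page. Also, `str(tag)` returns the tag's HTML, so slicing from `"href"` keeps everything after it. The sketch below reproduces both effects. It runs on a small hypothetical HTML snippet (the markup and example.com URLs are made up, not taken from tunisianet.com.tn) and uses the built-in `html.parser`, so lxml is not required.

```python
from bs4 import BeautifulSoup

# Hypothetical markup mimicking the page: the <a> tags have no class or id,
# only their parent <h2> elements do.
HTML = """
<h2 class="h3 product-title"><a href="https://example.com/p/1.html">Laptop 1</a></h2>
<h2 class="h3 product-title"><a href="https://example.com/p/2.html">Laptop 2</a></h2>
<h2 class="section-heading">Promotions</h2>
"""

soup = BeautifulSoup(HTML, "html.parser")

# Problem 1: the dict lands in select()'s `namespaces` parameter, so it is
# not a class filter and every <h2> matches, including the section heading.
headings = soup.select("h2", {"class": "h3 product-title"})
print(len(headings))  # 3

# Problem 2: str(tag) is the tag's HTML, so slicing from "href" keeps the
# quotes, the link text and the closing </a>, not just the URL.
link_html = str(headings[0].find("a"))
print(link_html[link_html.find("href"):])  # href="https://example.com/p/1.html">Laptop 1</a>

# For an <h2> without a link, find("a") returns None and str(None) is "None";
# "None".find("href") is -1, so the slice yields a stray "e" instead of a URL.
no_link = str(headings[2].find("a"))
print(no_link[no_link.find("href"):])  # e
```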
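For example, combine the tag and class filters into one CSS selector that targets the a tags directly. Then read each link's href attribute with .get('href') instead of slicing the tag's string. This produces the desired output: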
Code:
import requests
from bs4 import BeautifulSoup

header = {"User-Agent": "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:77.0) Gecko/20190101 Firefox/77.0"}
Purl = 'https://www.tunisianet.com.tn/301-pc-portable-tunisie'
req = requests.get(Purl, headers=header)
soup = BeautifulSoup(req.content, 'lxml')
ProductUrl = []
# find the product links: <a> tags inside <h2 class="h3 product-title">
showName = soup.select('h2.h3.product-title a')
# read each link's href attribute directly
for i in showName:
    ProductUrl.append(i.get('href'))
#print(ProductUrl)
for i in ProductUrl:
    print(i)
Output:
https://www.tunisianet.com.tn/pc-portable-tunisie/48873-pc-portable-vegabook-plus-14-quad-core-4-go-silver-50-dt-bon-d-achat.html
https://www.tunisianet.com.tn/pc-portable-tunisie/51363-pc-portable-lenovo-v15-iil-i5-10e-gen-4-go-82C500TAFE.html
https://www.tunisianet.com.tn/pc-portable-tunisie/52830-pc-portable-lenovo-ideapad-3-15igl05-dual-core-4-go-noir.html
https://www.tunisianet.com.tn/pc-portable-tunisie/50111-pc-portable-asus-x543ma-gq1012t-dual-core-4-go-gris-antivirus-bitdefender.html
https://www.tunisianet.com.tn/pc-portable-tunisie/51234-pc-portable-asus-x543ma-dual-core-4-go-silver-antivirus-bitdefender.html
https://www.tunisianet.com.tn/pc-portable-tunisie/53434-pc-portable-lenovo-v15-igl-dual-core-4-go.html
https://www.tunisianet.com.tn/pc-portable-tunisie/53435-pc-portable-lenovo-ideapad-3-15igl05-dual-core-4-go-noir.html
https://www.tunisianet.com.tn/pc-portable-tunisie/47112-pc-portable-hp-15-dw1001nk-dual-core-4-go.html
https://www.tunisianet.com.tn/pc-portable-tunisie/47114-pc-portable-hp-15-dw1000nk-dual-core-4-go.html
https://www.tunisianet.com.tn/pc-portable-tunisie/47115-pc-portable-hp-15-dw1000nk-dual-core-8-go.html
https://www.tunisianet.com.tn/pc-portable-tunisie/47113-pc-portable-hp-15-dw1001nk-dual-core-4-go.html
https://www.tunisianet.com.tn/pc-portable-tunisie/51643-pc-portable-hp-15-dw1000nk-dual-core-16-go.html
https://www.tunisianet.com.tn/pc-portable-tunisie/51644-pc-portable-hp-15-dw1001nk-dual-core-16-go.html
https://www.tunisianet.com.tn/pc-portable-tunisie/53033-pc-portable-asus-vivobook-e410ma-quad-core-4-go-noir.html
https://www.tunisianet.com.tn/pc-portable-tunisie/52815-pc-portable-lenovo-ideapad-3-15iil05-i3-10e-gen-4-go-noir.html
https://www.tunisianet.com.tn/pc-portable-tunisie/52905-pc-portable-asus-d415da-bv873t-amd-ryzen-3-3250u-4-go-windows-10-gris.html
https://www.tunisianet.com.tn/pc-portable-tunisie/52353-pc-portable-asus-vivobook-max-x543ua-i3-7e-gen-4-go-gris.html
https://www.tunisianet.com.tn/pc-portable-tunisie/52819-pc-portable-lenovo-ideapad-3-15iil05-i3-10e-gen-8-go-noir.html
https://www.tunisianet.com.tn/pc-portable-tunisie/51255-pc-portable-asus-m509da-amd-ryzen-3-4-go-silver-antivirus-bitdefender.html
https://www.tunisianet.com.tn/pc-portable-tunisie/52354-pc-portable-asus-vivobook-max-x543ua-i3-7e-gen-8-go-gris.html
https://www.tunisianet.com.tn/pc-portable-tunisie/52820-pc-portable-lenovo-ideapad-3-15iil05-i3-10e-gen-12-go-noir.html
https://www.tunisianet.com.tn/pc-portable-tunisie/52355-pc-portable-asus-vivobook-max-x543ua-i3-7e-gen-12-go-gris.html
https://www.tunisianet.com.tn/pc-portable-tunisie/52447-pc-portable-asus-vivobook-x509fa-i3-10e-gen-4-go-silver.html
https://www.tunisianet.com.tn/pc-portable-tunisie/52906-pc-portable-asus-vivobook-x409fa-i3-10e-gen-4-go-silver.html
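If you prefer `find_all` to CSS selectors, you can filter the `<h2>` parents by class and then step down to their `<a>` child. The sketch below compares three ways of writing the filter, again on hypothetical markup with made-up example.com URLs. Beautiful Soup treats `class` as a multi-valued attribute. With `class_="product-title"`, a tag matches if its class list contains `product-title`. With the full string `"h3 product-title"`, a tag matches only if its attribute value is exactly that string, in that order. The CSS selector `h2.h3.product-title` requires both classes in any order, which is what the answer above relies on.

```python
from bs4 import BeautifulSoup

# Hypothetical markup; only the structure mirrors the question.
HTML = """
<h2 class="h3 product-title"><a href="https://example.com/p/1.html">Laptop 1</a></h2>
<h2 class="product-title h3"><a href="https://example.com/p/2.html">Laptop 2</a></h2>
<h2 class="section-heading"><a href="https://example.com/promo.html">Promo</a></h2>
"""

soup = BeautifulSoup(HTML, "html.parser")

# A single class name matches any <h2> whose class list contains it;
# h2.a is the first <a> inside that <h2> (None if there is none).
by_class = [h2.a["href"] for h2 in soup.find_all("h2", class_="product-title") if h2.a]

# The full string matches only that exact attribute value, so the second
# <h2> (same classes, different order) is skipped.
exact = [h2.a["href"] for h2 in soup.find_all("h2", class_="h3 product-title") if h2.a]

# The CSS selector requires both classes regardless of their order.
css = [a["href"] for a in soup.select("h2.h3.product-title a")]

print(by_class)  # ['https://example.com/p/1.html', 'https://example.com/p/2.html']
print(exact)     # ['https://example.com/p/1.html']
print(css)       # ['https://example.com/p/1.html', 'https://example.com/p/2.html']
```

On a page where every product heading uses the same class order, all three give the same list. The CSS selector is simply the least sensitive to how the class attribute is written.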