How to select all app links from the App Store search results and extract their href?
import requests
from bs4 import BeautifulSoup

url = 'https://www.apple.com/kr/search/youtube?src=globalnav'
response = requests.get(url)
html = response.text
soup = BeautifulSoup(html, 'html.parser')
links = soup.select(".rf-serp-productname-list")
print(links)
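I want to scrape the links of all the apps that are shown when I search for a keyword. I thought links = soup.select(".rf-serp-productname-list") would work, but the list of links is empty.
What should I do?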
Just check this code; I think it is what you want:
import re
import requests
from bs4 import BeautifulSoup

pages = set()  # hrefs that have already been visited

def get_links(page_url):
    global pages
    pattern = re.compile("^(/)")  # match only site-relative links
    # Replace your_URL with the base URL of the site to crawl
    html = requests.get(f"your_URL{page_url}").text  # f-strings require Python 3.6+
    soup = BeautifulSoup(html, "html.parser")
    for link in soup.find_all("a", href=pattern):
        if "href" in link.attrs:
            if link.attrs["href"] not in pages:
                new_page = link.attrs["href"]
                print(new_page)
                pages.add(new_page)
                get_links(new_page)  # recurse into the newly found page

get_links("")
Source: https://gist.github.com/AO8/f721b6736c8a4805e99e377e72d3edbf
You can change this part:
for link in soup.find_all("a", href=pattern):
    # do something
to check for the keyword, I think.
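A minimal sketch of what such a keyword check could look like. The helper name links_containing and the sample HTML are hypothetical, made up for illustration; only find_all with a compiled regex on href comes from the code above.

```python
import re
from bs4 import BeautifulSoup

def links_containing(html, keyword):
    """Return the href of every <a> whose href contains the keyword (case-insensitive)."""
    soup = BeautifulSoup(html, "html.parser")
    # re.escape keeps characters such as "." or "?" in the keyword literal
    pattern = re.compile(re.escape(keyword), re.IGNORECASE)
    # find_all skips <a> tags that have no href attribute at all
    return [a["href"] for a in soup.find_all("a", href=pattern)]

# Hypothetical markup standing in for a downloaded page
SAMPLE_HTML = """
<a href="/kr/app/youtube/id544007664">YouTube</a>
<a href="/kr/app/instagram/id389801252">Instagram</a>
<a>no href here</a>
"""

print(links_containing(SAMPLE_HTML, "youtube"))
```

This filters on the link target only; matching on the visible link text instead would need a different check.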
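You are cooking soup, so first taste it and check whether it contains everything you expect.
The ResultSet of your selection is empty because the structure of the response differs slightly from the one you see in the developer tools.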
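One rough way to taste the soup is to check whether the class names you plan to select on appear in the raw response text at all. The sketch below is only a crude substring check, not a real selector test; the helper classes_in_response and the sample response body are hypothetical.

```python
def classes_in_response(html, class_names):
    """Map each class name to whether it appears anywhere in the raw HTML text."""
    return {name: name in html for name in class_names}

# Hypothetical response body, standing in for response.text
SAMPLE_RESPONSE = (
    '<ul><li><a class="icon" '
    'href="https://apps.apple.com/kr/app/youtube/id544007664">YouTube</a></li></ul>'
)

report = classes_in_response(SAMPLE_RESPONSE, ["rf-serp-productname-list", "icon"])
for name, found in report.items():
    print(f"{name}: {'present' if found else 'missing'}")
```

If a class is missing here, no selector built on it can match, whatever the browser's developer tools show.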
To get the list of links, select more specifically:
links = [a.get('href') for a in soup.select('a.icon')]
Output:
['https://apps.apple.com/kr/app/youtube/id544007664', 'https://apps.apple.com/kr/app/%EC%BF%A0%ED%8C%A1%ED%94%8C%EB%A0%88%EC%9D%B4/id1536885649', 'https://apps.apple.com/kr/app/youtube-music/id1017492454', 'https://apps.apple.com/kr/app/instagram/id389801252', 'https://apps.apple.com/kr/app/youtube-kids/id936971630', 'https://apps.apple.com/kr/app/youtube-studio/id888530356', 'https://apps.apple.com/kr/app/google-chrome/id535886823', 'https://apps.apple.com/kr/app/tiktok-%ED%8B%B1%ED%86%A1/id1235601864', 'https://apps.apple.com/kr/app/google/id284815942']
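Some of the extracted hrefs are percent-encoded, so a small post-processing step can make them easier to read. The sketch below assumes the path layout visible in the output above (/kr/app/<name>/id<digits>); the helper describe_app_link is hypothetical and not part of any library.

```python
import re
from urllib.parse import unquote, urlsplit

def describe_app_link(href):
    """Split an app link into its decoded name segment and its numeric id."""
    # Decode percent-escapes such as %EC%BF%A0 back into readable characters
    path = unquote(urlsplit(href).path)
    parts = [p for p in path.split("/") if p]
    # Assumed layout: [..., <name>, "id<digits>"]
    match = re.fullmatch(r"id(\d+)", parts[-1]) if parts else None
    return {
        "name": parts[-2] if len(parts) >= 2 else None,
        "app_id": match.group(1) if match else None,
    }

links = [
    "https://apps.apple.com/kr/app/youtube/id544007664",
    "https://apps.apple.com/kr/app/%EC%BF%A0%ED%8C%A1%ED%94%8C%EB%A0%88%EC%9D%B4/id1536885649",
]
for link in links:
    print(describe_app_link(link))
```

If the path does not end in an id segment, app_id is None rather than raising an error.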