How do I select all app links on the App Store search page and extract their href values?

from bs4 import BeautifulSoup
import requests

url = 'https://www.apple.com/kr/search/youtube?src=globalnav'
response = requests.get(url)
html = response.text
soup = BeautifulSoup(html, 'html.parser')
links = soup.select(".rf-serp-productname-list")
print(links)

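I want to scrape the links of all the apps that are shown when I search for a keyword. I thought links = soup.select(".rf-serp-productname-list") would work, but the resulting list is empty.

What should I do?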

Take a look at this code; I think it is what you are looking for:

import re
import requests
from bs4 import BeautifulSoup

pages = set()

def get_links(page_url):
  # Recursively collect site-relative links (hrefs starting with "/"),
  # following every link that has not been seen before.
  pattern = re.compile("^(/)")
  # Replace your_URL with the site's base URL; f-strings require Python 3.6+
  html = requests.get(f"your_URL{page_url}").text
  soup = BeautifulSoup(html, "html.parser")
  # href=pattern already guarantees that each matched tag has an href
  for link in soup.find_all("a", href=pattern):
    new_page = link.attrs["href"]
    if new_page not in pages:
      print(new_page)
      pages.add(new_page)
      get_links(new_page)
        
get_links("")

Source: https://gist.github.com/AO8/f721b6736c8a4805e99e377e72d3edbf

You can change this part:

for link in soup.find_all("a", href=pattern):
    # do something

to check for your keyword, I think.
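A minimal sketch of what that keyword check could look like. It runs on an inline HTML snippet rather than a live request, and both SAMPLE_HTML and the helper name links_containing are made up for illustration; the real page markup may differ.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the downloaded page; real markup may differ.
SAMPLE_HTML = """
<a href="https://apps.apple.com/kr/app/youtube/id544007664">YouTube</a>
<a href="https://apps.apple.com/kr/app/instagram/id389801252">Instagram</a>
<a href="https://apps.apple.com/kr/app/youtube-music/id1017492454">YouTube Music</a>
<a>anchor without href</a>
"""

def links_containing(html, keyword):
    """Return the href of every <a> tag whose href contains keyword (case-insensitive)."""
    soup = BeautifulSoup(html, "html.parser")
    keyword = keyword.lower()
    # href=True skips anchors that have no href attribute at all
    return [a["href"] for a in soup.find_all("a", href=True)
            if keyword in a["href"].lower()]

matches = links_containing(SAMPLE_HTML, "youtube")
print(matches)
```

With a live page you would pass response.text instead of SAMPLE_HTML.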

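You are cooking soup, so taste it first to check that it contains everything you expect.

The ResultSet of your selection is empty because the structure of the response differs slightly from the one you see in the developer tools.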
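One way to "taste" the soup is to count how many elements each candidate selector matches in the HTML you actually received. The sketch below does this on an inline snippet; SAMPLE_HTML and count_matches are hypothetical names used only for illustration, and the snippet is not the real apple.com markup.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for response.text; the real markup may differ.
SAMPLE_HTML = """
<div class="results">
  <a class="icon" href="https://apps.apple.com/kr/app/youtube/id544007664">YouTube</a>
  <a class="icon" href="https://apps.apple.com/kr/app/youtube-music/id1017492454">YouTube Music</a>
  <a class="nav" href="/kr/mac/">Mac</a>
</div>
"""

def count_matches(html, selectors):
    """Return {selector: number of elements it matches} for the given HTML."""
    soup = BeautifulSoup(html, "html.parser")
    return {sel: len(soup.select(sel)) for sel in selectors}

counts = count_matches(SAMPLE_HTML, [".rf-serp-productname-list", "a.icon"])
print(counts)  # {'.rf-serp-productname-list': 0, 'a.icon': 2}
```

A count of 0 for a selector that works in the developer tools suggests that the element is missing from the raw response or carries a different class there, so printing soup.prettify() and searching it for the expected class is a reasonable next step.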

To get the list of links, make your selection more specific:

links = [a.get('href') for a in soup.select('a.icon')]

Output:

['https://apps.apple.com/kr/app/youtube/id544007664', 'https://apps.apple.com/kr/app/%EC%BF%A0%ED%8C%A1%ED%94%8C%EB%A0%88%EC%9D%B4/id1536885649', 'https://apps.apple.com/kr/app/youtube-music/id1017492454', 'https://apps.apple.com/kr/app/instagram/id389801252', 'https://apps.apple.com/kr/app/youtube-kids/id936971630', 'https://apps.apple.com/kr/app/youtube-studio/id888530356', 'https://apps.apple.com/kr/app/google-chrome/id535886823', 'https://apps.apple.com/kr/app/tiktok-%ED%8B%B1%ED%86%A1/id1235601864', 'https://apps.apple.com/kr/app/google/id284815942']