Scraping website with xpath returns nothing

Question

I am trying to scrape the job titles from the following website: https://supersolid.com/careers.

The data in question is: [Server Developer, Game Designer/Senior Game Designer, Marketing Artist (2D), Game Designer (New Concepts), Senior Server Developer].

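I have tried the usual process of opening the developer tools and checking the Network tab for an XHR request that returns all the roles, so that I could use that instead.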

Then I tried scraping it with XPath:

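    import requests
    from lxml import html

    data = []
    url = "https://supersolid.com/careers"
    # Fetch the page and parse the response body with lxml
    page = requests.get(url)
    tree = html.fromstring(page.content)
    # Absolute, position-based path to the job title headings
    xpath = '/html/body/main/section[2]/div/div/div[5]/div/h4'
    jobs = tree.xpath(xpath)
    print(len(jobs))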

I use print(len(jobs)) and it returns 0.

Not really sure what else I can do.

Answer 1

Try BeautifulSoup:

from bs4 import BeautifulSoup
import requests

data = []
url = "https://supersolid.com/careers"
page = requests.get(url)

soup = BeautifulSoup(page.text, 'html.parser')
jobs = soup.find_all('h4')
print(len(jobs))
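
If the count is still 0, it may help to look at what the server actually sent back before blaming the parser. The helper below is only a sketch: `summarize_response` is a hypothetical name, not part of requests or BeautifulSoup, and the tag count is a rough substring match rather than real HTML parsing.

```python
def summarize_response(status_code, body, tag="h4"):
    """Return a small report on an HTTP response body.

    tag_count is a case-insensitive count of the substring "<tag",
    which is only a rough hint, not a parse of the document.
    """
    opening = "<" + tag.lower()
    return {
        "status": status_code,
        "length": len(body),
        "tag_count": body.lower().count(opening),
    }


# Example with a hand-written body, so it runs without network access.
sample_body = "<html><body><h4>A</h4><H4 class='x'>B</H4></body></html>"
print(summarize_response(200, sample_body))
```

With the `page` object from the snippet above, the call would be `summarize_response(page.status_code, page.text)`. A status other than 200, or a tag count of 0, would suggest that the server returned something different from what the browser shows.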

Answer 2

Specify a User-Agent in the HTTP request:

import requests
from lxml import html

headers = {
    "User-Agent": "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:90.0) Gecko/20100101 Firefox/90.0"
}
url = "https://supersolid.com/careers"
page = requests.get(url, headers=headers)
tree = html.fromstring(page.content)
xpath = ".//h4"
jobs = tree.xpath(xpath)
print([j.text for j in jobs])

Prints:

['Server Developer', 'Game Designer/Senior Game Designer', 'Marketing Artist (2D)', 'Game Designer (New Concepts)', 'Senior Server Developer']
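
Note that this code changes two things compared with the question: it sends a User-Agent header, and it replaces the absolute path with `.//h4`. The second change can matter on its own, because a position-based path matches nothing as soon as the served markup differs from the layout it was written for. The sketch below illustrates this on a made-up, simplified snippet (the real page's structure is an assumption here). It uses the standard library's `xml.etree.ElementTree` as a stand-in for lxml so that it runs without extra packages; ElementTree supports only a subset of XPath and needs well-formed XML.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified markup; not the real page.
SAMPLE = """
<html>
  <body>
    <main>
      <section>
        <div><h4>Server Developer</h4></div>
        <div><h4>Senior Server Developer</h4></div>
      </section>
    </main>
  </body>
</html>
"""

root = ET.fromstring(SAMPLE.strip())

# Position-based path written for a layout with a second <section>;
# this sample has only one, so nothing matches.
positional = root.findall("./body/main/section[2]/div/h4")

# Descendant search: matches every <h4> below the root, wherever it sits.
relative = root.findall(".//h4")

positional_titles = [h.text for h in positional]
relative_titles = [h.text for h in relative]
print(positional_titles)
print(relative_titles)
```

The trade-off is that `.//h4` also picks up any other `h4` headings on the page, so a narrower relative expression may be needed if the page has more than the job titles.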

xpath

screen-scraping

web-scraping

python-3.x