I need to get the value of a meta tag using lxml in Python
[Image: Inspect of Website]
I want to extract the postalCode value from content="------".
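import requests
import lxml.html

html = requests.get("https://www.craispesaonline.it/provincia/lucca")
doc = lxml.html.fromstring(html.content)
# An XPath ending in /@content returns a list of strings, not elements,
# so there is no .text_content() to call on the result.
x = doc.xpath('//meta[@itemprop="postalCode"]/@content')
print(x[0] if x else None)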
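If you run print("itemprop" in html.text), you'll see that the tag is not in the HTML source at all, which means it is added by some JavaScript on the page. lxml alone (or BeautifulSoup, for that matter) won't execute JavaScript. You'd need a headless browser engine for that.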
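As a rough illustration of the headless-browser route, here is a minimal sketch. It assumes Selenium and a Chrome driver are installed, and that the page's script injects the meta tag within a few seconds; neither is confirmed by the site. The helper names `render_page` and `extract_postal_codes` are made up for this example.

```python
import lxml.html

POSTAL_CODE_XPATH = '//meta[@itemprop="postalCode"]/@content'


def render_page(url, wait_seconds=10):
    """Return the page HTML after JavaScript has run.

    Assumes Selenium and a matching Chrome driver are available.
    """
    # Imported here because Selenium is only needed for live rendering.
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    options = Options()
    options.add_argument("--headless")
    driver = webdriver.Chrome(options=options)
    try:
        driver.get(url)
        # Assumption: the script adds the meta tag within wait_seconds.
        WebDriverWait(driver, wait_seconds).until(
            EC.presence_of_element_located(
                (By.CSS_SELECTOR, 'meta[itemprop="postalCode"]')
            )
        )
        return driver.page_source
    finally:
        driver.quit()


def extract_postal_codes(page_source):
    """Return the content attribute of every postalCode meta tag."""
    doc = lxml.html.fromstring(page_source)
    # The /@content step yields strings; an absent tag gives an empty list.
    return [str(value) for value in doc.xpath(POSTAL_CODE_XPATH)]


# Usage (needs a real browser, so it is left as a comment):
# html = render_page("https://www.craispesaonline.it/provincia/lucca")
# print(extract_postal_codes(html))
```

Splitting rendering from parsing means the XPath can be checked against a saved HTML snippet without launching a browser.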
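On the other hand, for this particular site you don't need to scrape the postal code from the page source: if you watch the browser inspector while the page loads, you can see that the address information is loaded from https://www.craispesaonline.it/showcase/rest/api/public/province/lucca: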
[
  {
    id: 256,
    name: "CRAI Lucca",
    alias: "lucca-via-prov-salessio-1609",
    address: "Via di Sant'Alessio, 1609",
    city: "Lucca",
    zipCode: "55100",
    servedZipCodes: [],
    latitude: 43.8611631,
    longitude: 10.487961799999994,
    groceryCode: "005",
    email: "tetsrllucca@gmail.com",
    telephone: "0583/341251",
    media: [
      { url: "694/694_1", altText: "Crai Lucca", title: "Crai Lucca" },
      { url: "694/694_2", altText: "Crai Lucca", title: "Crai Lucca" },
    ],
    fullCity: { id: 4484, name: "Lucca", latitude: 43.84432282, longitude: 10.50151366 },
    province: { id: 49, name: "Lucca", code: "LU", istatCode: "046", alias: null, region: "Toscana", regionIstat: "09", temp_alias: "lucca" },
    shippingEnabled: true,
    disabled: false,
    indexable: true,
  },
  ...
]
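Reading the postal codes from that endpoint could look roughly like the sketch below. It assumes the endpoint returns a JSON array shaped like the listing above and needs no authentication. The `{alias}` URL pattern is only a guess generalised from the single `lucca` URL, and `fetch_stores` and `zip_codes` are hypothetical helper names.

```python
# Assumption: other provinces follow the same pattern as the "lucca" URL.
API_URL = "https://www.craispesaonline.it/showcase/rest/api/public/province/{alias}"


def fetch_stores(alias, timeout=10):
    """Download the store list for a province alias such as "lucca"."""
    # Imported here so zip_codes() can be used on already-downloaded data
    # without requiring the third-party requests package.
    import requests

    response = requests.get(API_URL.format(alias=alias), timeout=timeout)
    response.raise_for_status()
    return response.json()


def zip_codes(stores):
    """Map each store name to its zipCode, skipping stores without one."""
    return {
        store["name"]: store["zipCode"]
        for store in stores
        if store.get("zipCode")
    }


# Usage (performs a network request, so it is left as a comment):
# print(zip_codes(fetch_stores("lucca")))
```

This skips HTML parsing entirely, so it keeps working even if the page markup changes, as long as the endpoint keeps its current shape.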