XPath returns an empty list
I am trying to get this number (circled in red) from the site https://www.banxico.org.mx/:
I have this code to get it, but I get an empty list:
import requests
from lxml import html

linktc = 'https://www.banxico.org.mx/'
pagetc = requests.get(linktc)
tree = html.fromstring(pagetc.content)
tipocambio = tree.xpath('//div[@id="vFIX"]//span[@class="valor"]//text()')
print("TC: ", tipocambio)
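Does anyone know what the problem is?
The problem is that the value you want is generated by JavaScript, so it is not in the HTML that requests downloads. You need a library that can execute JavaScript, for example Puppeteer: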
const puppeteer = require('puppeteer');

(async () => {
  const browser = await puppeteer.launch({
    headless: true,
  });
  const page = await browser.newPage();
  // set a desktop browser user agent
  await page.setUserAgent('Mozilla/5.0 (X11; Linux x86_64; rv:57.0) Gecko/20100101 Firefox/57.0');
  // open the main URL and wait until network activity settles
  await page.goto('https://www.banxico.org.mx/', { waitUntil: 'networkidle2' });
  // wait for the wanted element to appear
  await page.waitForXPath('//div[@id="vFIX"]//span[@class="valor"]');
  // retrieve the text node and read its content
  const nodes = await page.$x('//div[@id="vFIX"]//span[@class="valor"]/text()');
  const text = await page.evaluate(node => node.textContent, nodes[0]);
  console.log(text);
  await browser.close();
})();
Output:
22.6662
Alternatively, see also "Web-scraping JavaScript page with Python".
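To illustrate why the original XPath comes back empty, here is a minimal offline sketch. Both markup strings are hypothetical stand-ins (the real Banxico page is far larger and may be structured differently): one imitates what the server might send before any script runs, the other imitates the page after a browser has filled in the value. The helper `valor_texts` is also just an illustrative name, and the standard library's `xml.etree.ElementTree` is used in place of lxml so the snippet has no dependencies.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: the span exists, but it is empty until JavaScript runs.
STATIC_HTML = (
    '<html><body><div id="vFIX">'
    '<span class="valor"></span>'
    '</div></body></html>'
)

# Hypothetical markup: the same element after a browser executed the scripts.
RENDERED_HTML = (
    '<html><body><div id="vFIX">'
    '<span class="valor">22.6662</span>'
    '</div></body></html>'
)


def valor_texts(markup):
    """Return the non-empty text of every span.valor inside div#vFIX."""
    root = ET.fromstring(markup)
    spans = root.findall(".//div[@id='vFIX']//span[@class='valor']")
    return [span.text for span in spans if span.text]


print(valor_texts(STATIC_HTML))    # []
print(valor_texts(RENDERED_HTML))  # ['22.6662']
```

The selector matches the element in both cases; only the text is missing in the first one, which is what an HTTP client without a JavaScript engine would see under these assumptions.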
JavaScript is required to display the value. You can use Selenium to get it, or read the data directly from the JSON that the page loads in the background:
import json
import urllib.request

# the page loads the FIX value from this JSON endpoint
with urllib.request.urlopen("https://www.banxico.org.mx/canales/singleFix.json") as url:
    data = json.loads(url.read().decode())
    print(data['valor'])
Output: 22.6662
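If the JSON route is used in a script, it may help to separate parsing from fetching so the parsing can be checked without network access. The following is only a sketch: `parse_fix` and `fetch_fix` are hypothetical helper names, the sample payload is invented to match the output shown above, and whether the endpoint returns `valor` as a string or a number is an assumption (the code accepts both).

```python
import json
import urllib.request

# URL taken from the answer above; its response format is assumed, not verified.
FIX_URL = "https://www.banxico.org.mx/canales/singleFix.json"


def parse_fix(raw):
    """Extract the 'valor' field from a JSON payload and return it as a float."""
    if isinstance(raw, bytes):
        raw = raw.decode("utf-8")
    data = json.loads(raw)
    # str() lets this work whether 'valor' arrives as "22.6662" or 22.6662
    return float(str(data["valor"]))


def fetch_fix(url=FIX_URL, timeout=10):
    """Download the JSON payload and return the parsed value."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return parse_fix(resp.read())


# Offline check with an invented sample payload.
sample = b'{"valor": "22.6662"}'
print(parse_fix(sample))  # 22.6662
```

Calling `fetch_fix()` would perform the actual request; a missing `valor` key raises `KeyError`, which is arguably preferable to silently returning nothing if the endpoint changes.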
Alternative: get the value from a different page.
from lxml import html
import requests
url = 'https://www.banxico.org.mx/SieInternet/consultarDirectorioInternetAction.do?sector=6&accion=consultarCuadro&idCuadro=CF102&locale=es'
r = requests.get(url)
tree = html.fromstring(r.content)
value = tree.xpath('//tr[@id="nodo_0_0_0"]/td[7]//td[last()]')[0].text
print(value.strip())
Output: 22.6662