XPath getting empty list

I am trying to get this number (circled in red) from the website https://www.banxico.org.mx/:

I have this code to get it, but I get an empty list:

import requests
from lxml import html

linktc = 'https://www.banxico.org.mx/'
pagetc = requests.get(linktc)
tree = html.fromstring(pagetc.content)
tipocambio = tree.xpath('//div[@id="vFIX"]//span[@class="valor"]//text()')
print("TC: ", tipocambio)

Does anyone know what the problem is?

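The problem here is that you need a JavaScript-capable library. The value you want is generated by JS, so it is not part of the HTML that requests downloads.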
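To see why the query comes back empty, here is a minimal offline sketch. Both markup fragments are hypothetical (the real page may differ, or may omit the element entirely until the script runs), and it uses the standard library's ElementTree on well-formed fragments instead of lxml so that it runs without a network connection:

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: what a plain HTTP client might receive (span still empty)
# versus what a browser holds after the page's scripts have run.
AS_SERVED = '<html><body><div id="vFIX"><span class="valor"></span></div></body></html>'
AFTER_JS = '<html><body><div id="vFIX"><span class="valor">22.6662</span></div></body></html>'


def valor_texts(markup: str) -> list[str]:
    """Return the non-empty text of every span.valor inside div#vFIX."""
    root = ET.fromstring(markup)
    spans = root.findall(".//div[@id='vFIX']//span[@class='valor']")
    return [span.text for span in spans if span.text]


print(valor_texts(AS_SERVED))  # []
print(valor_texts(AFTER_JS))   # ['22.6662']
```

The same selector succeeds or fails depending only on whether the text has been written into the document yet, which is why a client that does not execute JavaScript sees nothing.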

You can use Puppeteer via Node.js instead:

const puppeteer = require('puppeteer');

// Note: waitForXPath and $x belong to older Puppeteer releases;
// later major versions removed them.
(async () => {
    const browser = await puppeteer.launch({
        headless: true,
    });

    const page = await browser.newPage();

    // set a desktop browser user agent
    await page.setUserAgent('Mozilla/5.0 (X11; Linux x86_64; rv:57.0) Gecko/20100101 Firefox/57.0');

    // open main URL
    await page.goto('https://www.banxico.org.mx/', { waitUntil: 'networkidle2' });

    // wait for wanted selector to pop up
    await page.waitForXPath('//div[@id="vFIX"]//span[@class="valor"]');

    // retrieve text content of the first matching text node
    const nodes = await page.$x('//div[@id="vFIX"]//span[@class="valor"]/text()');
    const text = await page.evaluate(node => node.textContent, nodes[0]);

    console.log(text);

    await browser.close();
})();

Output:

22.6662

Alternatively, see also "Web-scraping JavaScript page with Python".
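Since the question is in Python, a rough Selenium equivalent might look like the sketch below. It assumes Selenium 4 with a local Chrome install; the function name `fetch_valor_with_selenium`, the 15-second timeout, and the `--headless=new` flag are illustrative choices, not something taken from the site or from the answers here.

```python
XPATH = '//div[@id="vFIX"]//span[@class="valor"]'


def fetch_valor_with_selenium(url="https://www.banxico.org.mx/", timeout=15):
    """Open the page in headless Chrome and return the rendered text of the span."""
    # Imported here so the function can be defined without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support.ui import WebDriverWait

    options = webdriver.ChromeOptions()
    options.add_argument("--headless=new")  # assumption: a recent Chrome build
    driver = webdriver.Chrome(options=options)
    try:
        driver.get(url)
        # Poll until the span exists and its text is non-empty,
        # i.e. until the page's script has filled in the value.
        return WebDriverWait(driver, timeout).until(
            lambda d: d.find_element(By.XPATH, XPATH).text.strip()
        )
    finally:
        driver.quit()


# Needs a browser and network access:
# print(fetch_valor_with_selenium())
```

The wait matters more than the choice of driver: reading the element immediately after `driver.get` can still return an empty string if the script has not finished.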

JavaScript is required to display the value. You can use Selenium to get it, or fetch the data directly from the JSON that the page loads in the background:

import json
import urllib.request

with urllib.request.urlopen("https://www.banxico.org.mx/canales/singleFix.json") as response:
    data = json.loads(response.read().decode())
    print(data['valor'])

Output: 22.6662
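If the value is going to feed into calculations, a slightly more defensive variant may help. This is a sketch under assumptions: `parse_fix` and `fetch_fix` are hypothetical helper names, the 10-second timeout is arbitrary, the payload is assumed to be UTF-8, and the `valor` field is assumed to parse as a number. The sample payload is made up and mirrors only the `valor` key used above.

```python
import json
import urllib.request

FIX_URL = "https://www.banxico.org.mx/canales/singleFix.json"


def parse_fix(payload: str) -> float:
    """Extract the 'valor' field from the JSON text and convert it to float."""
    data = json.loads(payload)
    return float(data["valor"])


def fetch_fix(url: str = FIX_URL, timeout: float = 10.0) -> float:
    """Download the JSON document and return the rate as a float."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return parse_fix(response.read().decode("utf-8"))


# Offline check with a made-up payload; fetch_fix() itself needs network access.
print(parse_fix('{"valor": "22.6662"}'))  # 22.6662
```

Keeping the parsing separate from the download makes it possible to test the extraction without hitting the site.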

Alternative: get the value from a different page.

from lxml import html
import requests

url = 'https://www.banxico.org.mx/SieInternet/consultarDirectorioInternetAction.do?sector=6&accion=consultarCuadro&idCuadro=CF102&locale=es'
r = requests.get(url)
tree = html.fromstring(r.content)
# last nested cell inside the seventh column of the row
value = tree.xpath('//tr[@id="nodo_0_0_0"]/td[7]//td[last()]')[0].text
print(value.strip())

Output: 22.6662