无法在 python 中使用 XPATH 获取文本值

Question

我正在尝试解析来自 this bank website 的货币。在代码中：

import requests
import time
import logging
from retrying import retry
from lxml import html

logging.basicConfig(filename='info.log', format='%(asctime)s - %(name)s - %(levelname)s - %(message)s')

@retry(wait_fixed=5000)
def fetch_data_from_nb_ved_ru():
try:
    page = requests.get('http://www.nbu.com/exchange_rates')
    #print page.text
    tree = (html.fromstring(page.text))
    #fetched_ved_usd_buy = tree.xpath('//div[@class="exchangeRates"]/table/tbody/tr[5]/td[5]')
    fetched_ved_usd_buy = tree.xpath('/html/body/div[1]/div//div[7]/div/div/div[1]//text()')
    print fetched_ved_usd_buy
    fetched_ved_usd_sell = str(tree.xpath('/html/body/div[1]/div/div[7]/div/div/div[1]/table/tbody/tr[6]/td[6]/text()')).strip()
    fetched_ved_eur_buy = str(tree.xpath('/html/body/div[1]/div/div[7]/div/div/div[1]/table/tbody/tr[7]/td[5]/text()')).strip()
    fetched_ved_eur_sell = str(tree.xpath('/html/body/div[1]/div/div[7]/div/div/div[1]/table/tbody/tr[7]/td[6]/text()')).strip()
    fetched_cb_eur = str(tree.xpath('/html/body/div[1]/div/div[7]/div/div/div[1]/table/tbody/tr[7]/td[4]/text()')).strip()
    fetched_cb_rub = str(tree.xpath('/html/body/div[1]/div/div[7]/div/div/div[1]/table/tbody/tr[18]/td[4]/text()')).strip()
    fetched_cb_usd = str(tree.xpath('/html/body/div[1]/div/div[7]/div/div/div[1]/table/tbody/tr[6]/td[4]/text()')).strip()
except:
    logging.warning("NB VED UZ fetch failed")
    raise IOError("NB VED UZ  fetch failed")
return fetched_ved_usd_buy, fetched_ved_usd_sell, fetched_cb_usd, fetched_ved_eur_buy, fetched_ved_eur_sell,\
    fetched_cb_eur, fetched_cb_rub

while True:
    f = open('values_uzb.txt', 'w')
    ved_usd_buy, ved_usd_sell, cb_usd, ved_eur_buy, ed_eur_sell, cb_eur, cb_rub = fetch_data_from_nb_ved_ru()
               f.write(str(ved_usd_buy)+'\n'+str(ved_usd_sell)+'\n'+str(cb_usd)+'\n'+str(ved_eur_buy)+'\n'+str(ed_eur_sell)+'\n'
        + str(cb_eur)+'\n'+str(cb_rub))

    f.close()
    time.sleep(120)

但它总是 returns 空字符串，但是如果我这样做 print page.text，我可以看到值在他们的位置上。我从 firebug 得到了那个 xpath。 Chrome 给出相同的 xpath。试图构建自己的 xpath //div[@class="exchangeRates"]/table/tbody/tr[5]/td[5] 但它恰好对.

无效

有什么建议吗？谢谢。

Answer 1

试试这个 xpath:

tree.xpath('//div[@class="exchangeRates"]//tr[NUMBER OF TR]/td[5]/text()')

另一件事...我觉得如果你输入这段代码你会改进你的代码：

trs = tree.xpath('//div[@class="exchangeRates"]//tr')
    for tr in trs:
        currency_code = tr.xpath('./td[7]/text()').strip()

        if currency_code=='USD':
            usd_buy = tr.xpath('./td[5]/text()').strip()
            usd_sell = tr.xpath('./td[6]/text()').strip()
            usd_cb = tr.xpath('./td[4]/text()').strip()

并继续使用您需要的其他货币。

这是一个快速代码，如果您需要更多详细信息，请回复。

Answer 2

我不确定你到底在找什么，但这行得通：

tree.xpath("/html/body/div[1]/div[7]/div/div/div[1]//text()")

至于从 class exchangeRates 开始，我发现通过使用 tree.xpath("//div[@class='exchangeRates']/table")[0].getchildren() 没有 table 的 tbody 子级，即使浏览器说有。参见 this SO question for an explanation。从原始 xpath 中删除 tbody 确实有效。但是，您选择的 (td[5]) 是空的，因此返回 []。尝试

tree.xpath("//div[@class='exchangeRates']/table/tr[5]/td[4]//text()")
# ['706.65']

或

tree.xpath("//div[@class='exchangeRates']/table/tr[6]/td[5]//text()")
# ['2638.00']

Answer 3

我使用以下语句，它对我来说运行得很好。

ActualValue = driver.find_element_by_xpath("//div/div[2]/div").text

无法在 python 中使用 XPATH 获取文本值

Can't get text values using XPATH in python

html

python

xpath

lxml

request