Problems grabbing data with xpath and lxml in Python

Question

I'm trying to get the contents of BC_1YEAR and NEW_DATE for the last entry in this data.

Here is my Python Google App Engine code:

import lxml.etree
from google.appengine.api import urlfetch

def foo():
    url = 'http://data.treasury.gov/feed.svc/DailyTreasuryYieldCurveRateData$filter=month(NEW_DATE)%20eq%201%20and%20year(NEW_DATE)%20eq%202015'
    response = urlfetch.fetch(url)
    tree = lxml.etree.fromstring(response.content)
    nsmap = {'atom': 'http://www.w3.org/2005/Atom',
             'd': 'http://schemas.microsoft.com/ado/2007/08/dataservices'}
    myData = tree.xpath("//atom:entry[last()]/d:BC_1YEAR", namespaces=nsmap)

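But myData is an empty list, when it should be 0.2 for today's data. I have been trying for hours to get this to work, so any help would be much appreciated. I assume NEW_DATE works similarly.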

Answer 1

As far as I can tell, the correct URL is

http://data.treasury.gov/feed.svc/DailyTreasuryYieldCurveRateData?$filter=month%28NEW_DATE%29%20eq%201%20and%20year%28NEW_DATE%29%20eq%202015

whereas your code has (no ? after Data)

http://data.treasury.gov/feed.svc/DailyTreasuryYieldCurveRateData$filter=month%28NEW_DATE%29%20eq%201%20and%20year%28NEW_DATE%29%20eq%202015

That is why your code currently gets back the following XML:

<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<error xmlns="http://schemas.microsoft.com/ado/2007/08/dataservices/metadata">
  <code></code>
  <message xml:lang="en-US">Resource not found for the segment 'DailyTreasuryYieldCurveRateData$filter=month'.</message>
</error>

Of course, there is no atom:entry in that error message.

Besides, your XPath expression:
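One way to avoid this kind of typo is to build the URL from its parts instead of writing it as one long string. The following is only a sketch: build_url and BASE_URL are names I made up for illustration, and it uses Python 3's urllib.parse, which may not match the runtime of an older App Engine project.

```python
from urllib.parse import quote

# Resource path of the feed, taken from the URL above.
BASE_URL = 'http://data.treasury.gov/feed.svc/DailyTreasuryYieldCurveRateData'


def build_url(month, year):
    """Return the feed URL with a $filter for the given month and year.

    build_url is a hypothetical helper, not part of any library.
    """
    odata_filter = 'month(NEW_DATE) eq {0} and year(NEW_DATE) eq {1}'.format(month, year)
    # safe='' percent-encodes everything except letters, digits and _.-~,
    # so spaces become %20 and parentheses become %28 / %29.
    encoded = quote(odata_filter, safe='')
    # The '?' separates the resource path from the query options.
    return BASE_URL + '?$filter=' + encoded


if __name__ == '__main__':
    print(build_url(1, 2015))
```

Because the '?' is added in exactly one place, it cannot silently go missing when the filter changes.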

//atom:entry[last()]/d:BC_1YEAR

will not retrieve d:BC_1YEAR, because d:BC_1YEAR is not a direct child of atom:entry. Use

//atom:entry[last()]//d:BC_1YEAR

or, better yet, register the m: prefix in your code and use

//atom:entry[last()]/atom:content/m:properties/d:BC_1YEAR

With both fixes applied, the code becomes:

import lxml.etree
from google.appengine.api import urlfetch

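def foo():
    # Note the '?' that separates the resource path from the $filter query option.
    url = 'http://data.treasury.gov/feed.svc/DailyTreasuryYieldCurveRateData?$filter=month%28NEW_DATE%29%20eq%201%20and%20year%28NEW_DATE%29%20eq%202015'
    response = urlfetch.fetch(url)
    tree = lxml.etree.fromstring(response.content)
    # The prefixes are local to this code; only the namespace URIs must match the feed.
    nsmap = {'atom': 'http://www.w3.org/2005/Atom',
             'd': 'http://schemas.microsoft.com/ado/2007/08/dataservices',
             'm': 'http://schemas.microsoft.com/ado/2007/08/dataservices/metadata'}
    # Full path from the last entry down to the property element.
    myData = tree.xpath("//atom:entry[last()]/atom:content/m:properties/d:BC_1YEAR", namespaces=nsmap)
    # xpath() returns a list of elements; use myData[0].text to get the value.
    return myData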
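To see the difference between the child step and the full path without hitting the network, here is a small self-contained sketch. SAMPLE_FEED is a made-up, heavily trimmed stand-in for the real feed (the dates and rates are invented), and last_entry and direct_child_matches are hypothetical helper names. It also shows how NEW_DATE can be read from the same m:properties element.

```python
import lxml.etree

# Invented sample that only mimics the nesting of the real feed:
# entry -> content -> m:properties -> d:* property elements.
SAMPLE_FEED = b"""<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom"
      xmlns:d="http://schemas.microsoft.com/ado/2007/08/dataservices"
      xmlns:m="http://schemas.microsoft.com/ado/2007/08/dataservices/metadata">
  <entry>
    <content type="application/xml">
      <m:properties>
        <d:NEW_DATE>2015-01-02T00:00:00</d:NEW_DATE>
        <d:BC_1YEAR>0.25</d:BC_1YEAR>
      </m:properties>
    </content>
  </entry>
  <entry>
    <content type="application/xml">
      <m:properties>
        <d:NEW_DATE>2015-01-05T00:00:00</d:NEW_DATE>
        <d:BC_1YEAR>0.2</d:BC_1YEAR>
      </m:properties>
    </content>
  </entry>
</feed>"""

NSMAP = {'atom': 'http://www.w3.org/2005/Atom',
         'd': 'http://schemas.microsoft.com/ado/2007/08/dataservices',
         'm': 'http://schemas.microsoft.com/ado/2007/08/dataservices/metadata'}


def direct_child_matches(xml_bytes):
    """Run the original expression, which looks for a direct child only."""
    tree = lxml.etree.fromstring(xml_bytes)
    return tree.xpath("//atom:entry[last()]/d:BC_1YEAR", namespaces=NSMAP)


def last_entry(xml_bytes):
    """Return (NEW_DATE, BC_1YEAR) of the last entry, or None if there is none."""
    tree = lxml.etree.fromstring(xml_bytes)
    props = tree.xpath("//atom:entry[last()]/atom:content/m:properties",
                       namespaces=NSMAP)
    if not props:
        # For example an error document, which contains no atom:entry at all.
        return None
    last = props[0]
    date = last.findtext("d:NEW_DATE", namespaces=NSMAP)
    rate = last.findtext("d:BC_1YEAR", namespaces=NSMAP)
    return date, rate


if __name__ == '__main__':
    print(direct_child_matches(SAMPLE_FEED))
    print(last_entry(SAMPLE_FEED))
```

The values come back as strings, so convert with float() if you need a number. The None check also covers the error response shown earlier, so an empty result is easier to tell apart from a parsing problem.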

Edit: In response to your comment:

I want my code to work 'indefinitely' with as little maintenance as possible. I don't know what the purpose of namespaces really are and I wonder if these particular namespaces are generic and can be expected to stay that way for years to come?

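I have already explained the purpose of namespaces in XML elsewhere - please have a look at that answer as well. Namespaces are never generic; in fact, they are the opposite of generic: they are supposed to be unique.

That said, there are ways to ignore namespaces. An expression like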
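A quick way to convince yourself that the URI is what counts, and the prefix is just a local nickname, is the following sketch. DOC is a tiny made-up document and the function names are hypothetical.

```python
import lxml.etree

ATOM_URI = 'http://www.w3.org/2005/Atom'

# Made-up document that declares Atom as its default namespace (no prefix).
DOC = b'<feed xmlns="http://www.w3.org/2005/Atom"><entry><title>first</title></entry></feed>'


def count_entries(prefix, uri=ATOM_URI):
    """Count entry elements using an arbitrary prefix bound to the given URI."""
    tree = lxml.etree.fromstring(DOC)
    return len(tree.xpath("//{0}:entry".format(prefix), namespaces={prefix: uri}))


def count_unprefixed():
    """An unprefixed name in XPath 1.0 means 'no namespace', so nothing matches."""
    tree = lxml.etree.fromstring(DOC)
    return len(tree.xpath("//entry"))


if __name__ == '__main__':
    print(count_entries('atom'))                                   # prefix used in the answer
    print(count_entries('x'))                                      # any other prefix, same URI
    print(count_entries('atom', 'http://example.com/not-atom'))    # right prefix, wrong URI
    print(count_unprefixed())
```

Any prefix works as long as it is bound to the same URI as in the document, while the familiar prefix bound to a different URI finds nothing.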

//atom:entry[last()]//d:BC_1YEAR

can be rewritten as

//*[local-name() = 'entry'][last()]//*[local-name() = 'BC_1YEAR']

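to find elements regardless of their namespace. This would be an option if you have reason to believe that the namespace URI will change in the future.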
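As a rough illustration, the local-name() expression keeps working when the namespace URI of the d: elements changes. The feed below is an invented, simplified template, the second URI is purely hypothetical, and make_feed and last_rate are names made up for this sketch.

```python
import lxml.etree

# Simplified, invented feed; {d_uri} is filled in with different namespace URIs.
FEED_TEMPLATE = (
    '<feed xmlns="http://www.w3.org/2005/Atom" xmlns:d="{d_uri}">'
    '<entry><content><d:BC_1YEAR>0.25</d:BC_1YEAR></content></entry>'
    '<entry><content><d:BC_1YEAR>0.2</d:BC_1YEAR></content></entry>'
    '</feed>'
)

# No namespaces argument is needed because the expression uses no prefixes.
XPATH = "//*[local-name() = 'entry'][last()]//*[local-name() = 'BC_1YEAR']"


def make_feed(d_uri):
    """Build the sample feed with the given namespace URI for the d: prefix."""
    return FEED_TEMPLATE.format(d_uri=d_uri)


def last_rate(xml_text):
    """Return the text of BC_1YEAR in the last entry, ignoring namespaces."""
    tree = lxml.etree.fromstring(xml_text)
    matches = tree.xpath(XPATH)
    return matches[0].text if matches else None


if __name__ == '__main__':
    current = make_feed('http://schemas.microsoft.com/ado/2007/08/dataservices')
    changed = make_feed('http://example.com/hypothetical/dataservices/v2')
    print(last_rate(current))
    print(last_rate(changed))
```

The trade-off is that this expression also matches an element named BC_1YEAR from any unrelated namespace, so it is less strict than the prefixed version.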
