Remove newlines and whitespace when parsing XML with Python XPath
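This is the XML file: http://www.diveintopython3.net/examples/feed.xml
My code is
My result is
My questions are:
How do I remove the \n and the whitespace that follows it from the text?
How do I get the node whose text is "dive into mark", and what is the syntax for searching by text?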
Just call normalize-space(.) on each node.
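import lxml.etree as et

xml = et.parse("feed.xml")
# Atom elements live in this namespace, so every XPath step needs the prefix
ns = {"ns": 'http://www.w3.org/2005/Atom'}
for n in xml.xpath("//ns:category", namespaces=ns):
    # the summary is a sibling of category, so step up to the parent entry first
    t = n.xpath("./../ns:summary", namespaces=ns)[0]
    # normalize-space strips leading/trailing whitespace and collapses inner runs
    print(t.xpath("normalize-space(.)"))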
Output:
Putting an entire chapter on one page sounds bloated, but consider this — my longest chapter so far would be 75 printed pages, and it loads in under 5 seconds… On dialup.
Putting an entire chapter on one page sounds bloated, but consider this — my longest chapter so far would be 75 printed pages, and it loads in under 5 seconds… On dialup.
Putting an entire chapter on one page sounds bloated, but consider this — my longest chapter so far would be 75 printed pages, and it loads in under 5 seconds… On dialup.
The accessibility orthodoxy does not permit people to question the value of features that are rarely useful and rarely used.
These notes will eventually become part of a tech talk on video encoding.
These notes will eventually become part of a tech talk on video encoding.
These notes will eventually become part of a tech talk on video encoding.
These notes will eventually become part of a tech talk on video encoding.
These notes will eventually become part of a tech talk on video encoding.
These notes will eventually become part of a tech talk on video encoding.
These notes will eventually become part of a tech talk on video encoding.
These notes will eventually become part of a tech talk on video encoding.
All of your newlines are removed, and runs of multiple spaces are replaced with a single space.
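To see the effect without downloading feed.xml, here is a minimal sketch on a small inline document. The SAMPLE string is a made-up stand-in for the real feed, not its actual content, and the `" ".join(... .split())` line is only a plain-Python alternative that behaves the same way for ordinary spaces, tabs and newlines.

```python
import lxml.etree as et

# Hypothetical sample standing in for feed.xml (not the real feed content)
SAMPLE = b"""<feed xmlns="http://www.w3.org/2005/Atom">
  <entry>
    <summary>
      Putting an entire chapter
          on one page sounds bloated.
    </summary>
  </entry>
</feed>"""

ns = {"ns": "http://www.w3.org/2005/Atom"}
root = et.fromstring(SAMPLE)
summary = root.xpath("//ns:summary", namespaces=ns)[0]

# Raw text still carries the newlines and indentation from the file
raw = summary.text

# XPath normalize-space: trim both ends, collapse inner whitespace runs
via_xpath = summary.xpath("normalize-space(.)")

# Plain-Python alternative: split on whitespace and re-join with single spaces
via_python = " ".join("".join(summary.itertext()).split())

print(repr(raw))
print(via_xpath)  # Putting an entire chapter on one page sounds bloated.
print(via_python == via_xpath)
```

Note that normalize-space(.) works on the string value of the node, so text inside child elements is included as well.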
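The second part of your question points at the title tag, since that is the only tag containing the text you are looking for. To find specifically the title with that exact text: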
xml.xpath("//ns:title[text()='dive into mark']", namespaces=ns)
If you want any node that contains that text, just replace ns:title with a wildcard:
xml.xpath("//*[text()='dive into mark']", namespaces=ns)
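One caveat with text()='...' is that it compares the raw text, so a title wrapped in newlines or indentation would not match. A possible workaround, sketched here on a made-up inline document (SAMPLE is hypothetical, not the real feed), is to combine both ideas and put normalize-space(.) inside the predicate.

```python
import lxml.etree as et

# Hypothetical sample: the second title has surrounding whitespace
SAMPLE = b"""<feed xmlns="http://www.w3.org/2005/Atom">
  <title>dive into mark</title>
  <entry>
    <title>
      dive into mark
    </title>
  </entry>
  <subtitle>currently between addictions</subtitle>
</feed>"""

ns = {"ns": "http://www.w3.org/2005/Atom"}
root = et.fromstring(SAMPLE)

# Exact comparison on the raw text node: only the tightly written title matches
exact = root.xpath("//ns:title[text()='dive into mark']", namespaces=ns)

# Comparison on the normalized string value: both titles match
normalized = root.xpath(
    "//ns:title[normalize-space(.)='dive into mark']", namespaces=ns
)

print(len(exact), len(normalized))
for el in normalized:
    # QName(...).localname drops the {namespace} part of the tag
    print(et.QName(el).localname, repr(el.text))
```

Whether the exact or the normalized comparison is the right one depends on how the source file is formatted.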