Using ElementTree to extract <content:encoded>
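I'm trying to figure out how to use ElementTree in Python to extract the content between <content:encoded> and </content:encoded>. Below is the Python code I'm currently using. So far I can't extract the content. I want to get "I love playing basketball and eating food." Can anyone tell me what is wrong with my code?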
xml = '''<item>
<title>Defensive Moves</title>
<link>www.timmy256.wordpress.com</link>
<pubDate></pubDate>
<dc:creator><![CDATA[jross]]></dc:creator>
<guid isPermaLink="false"> www.timmy256.wordpress.com </guid>
<description></description>
<content:encoded><![CDATA[I love playing basketball and eating food.]]></content:encoded>
</item>'''
import xml.etree.ElementTree as ET
tree = ET.parse(xml)
root = tree.getroot()
data = root.iter("content:encoded").text
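Three things stop the code above from working. ET.parse() expects a filename or file object, so a string has to go through ET.fromstring(). The dc: and content: prefixes are never declared in the snippet, so the parser rejects them as unbound. And root.iter() returns an iterator, which has no .text attribute. Here is a minimal sketch of one way around all three, staying with ElementTree. The namespace URIs and the wrapper <root> element are my assumptions: they are not in the snippet, and I'm using the URIs that RSS feeds conventionally declare for these prefixes, so check them against the header of your real file.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the fragment from the question.
xml_fragment = '''<item>
<title>Defensive Moves</title>
<dc:creator><![CDATA[jross]]></dc:creator>
<content:encoded><![CDATA[I love playing basketball and eating food.]]></content:encoded>
</item>'''

# Assumption: these URIs are not in the snippet; they are the ones RSS feeds
# conventionally bind to the content: and dc: prefixes.
NS = {
    "content": "http://purl.org/rss/1.0/modules/content/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Wrap the fragment in a hypothetical <root> element that declares the
# prefixes, so the parser no longer sees them as unbound.
declarations = " ".join(f'xmlns:{prefix}="{uri}"' for prefix, uri in NS.items())
wrapped = f"<root {declarations}>{xml_fragment}</root>"

# fromstring() parses a string; parse() is for files.
root = ET.fromstring(wrapped)

# find() returns a single element (or None); the NS mapping resolves the prefix.
encoded = root.find("item/content:encoded", NS)
data = encoded.text if encoded is not None else None
print(data)
```

If you are parsing the complete feed file rather than a pasted fragment, the namespaces should already be declared on the top-level element, so the wrapper is unnecessary and you only need to pass the prefix mapping to find().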
Another approach.
from simplified_scrapy import SimplifiedDoc
xml = '''<item>
<title>Defensive Moves</title>
<link>www.timmy256.wordpress.com</link>
<pubDate></pubDate>
<dc:creator><![CDATA[jross]]></dc:creator>
<guid isPermaLink="false"> www.timmy256.wordpress.com </guid>
<description></description>
<content:encoded><![CDATA[I love playing basketball and eating food.]]></content:encoded>
</item>'''
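doc = SimplifiedDoc(xml)
# html() returns the raw inner markup, CDATA wrapper included; the slice drops
# the leading '<![CDATA[' (9 characters) and the trailing ']]>' (3 characters).
print(doc.select('item>content:encoded>html()')[9:-3])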
Result:
I love playing basketball and eating food.
There are more examples here: https://github.com/yiyedata/simplified-scrapy-demo/tree/master/doc_examples