How to select nodes in HTML with lxml?
I have some HTML from http://chem.sis.nlm.nih.gov/chemidplus/rn/75-07-0 (from my previous post), and I now want to build a general procedure, because many other pages are similar but not identical. So, given this HTML:
<div id="names">
<h2>Names and Synonyms</h2>
<div class="ds">
<button class="toggle1Col" title="Toggle display between 1 column of wider results and multiple columns.">↔</button>
<h3>Name of Substance</h3>
<ul>
<li id="ds2"><div>Acetaldehyde</div></li>
</ul>
<h3>MeSH Heading</h3>
<ul>
<li id="ds3"><div>Acetaldehyde</div></li>
</ul>
</div>
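Now, in my Python script, I want to select the nodes "Name of Substance" and "MeSH Heading" and check whether they exist. If they do, I want to select the data in them; otherwise I want to return an empty string. Is there a way to do this in Python, the way you would in Javascript with Node myNode = doc.DocumentNode.SelectNode(/[text()="Name Of Substance"/)?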
from lxml import html
import requests
import csv

page = requests.get('http://chem.sis.nlm.nih.gov/chemidplus/rn/75-07-0')
tree = html.fromstring(page.text)

# The two conditions below are pseudocode for the check I want to write
if( Name of substance is there )
    chem_name = tree.xpath('//*[text()="Name of Substance"]/..//div')[0].text_content()
else
    chem_name = []
if ( MeSH Heading there )
    mesh_name = tree.xpath('//*[text()="MeSH Heading"]/..//div')[1].text_content()
else
    mesh_name = []

names1 = [chem_name, mesh_name]
with open('testchem.csv', 'wb') as myfile:
    wr = csv.writer(myfile)
    wr.writerow(names1)
You can simply check whether "Name of Substance" or "MeSH Heading" appears in the text of the web page, and select the content if it does.
from lxml import html
import requests
import csv

page = requests.get('http://chem.sis.nlm.nih.gov/chemidplus/rn/75-07-0')
tree = html.fromstring(page.text)

# Only run the XPath when the heading text appears somewhere in the page
if "Name of Substance" in page.text:
    chem_name = tree.xpath('//*[text()="Name of Substance"]/..//div')[0].text_content()
else:
    chem_name = ""

# The XPath returns every <div> under the shared parent, so in the snippet
# above the MeSH value is the second match (index 1)
if "MeSH Heading" in page.text:
    mesh_name = tree.xpath('//*[text()="MeSH Heading"]/..//div')[1].text_content()
else:
    mesh_name = ""

names1 = [chem_name, mesh_name]
# Python 3: open in text mode with newline='' (Python 2 used 'wb')
with open('testchem.csv', 'w', newline='') as myfile:
    wr = csv.writer(myfile)
    wr.writerow(names1)
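The substring check above depends on fixed indexes ([0] and [1]), which could pick the wrong value on a page where only one of the two headings is present. A possible alternative, sketched here as an assumption rather than as part of the original answer, is to let the XPath itself decide: an empty result list means the heading is absent. The helper name first_value_under and the inline SNIPPET string are hypothetical and stand in for the downloaded page so the example runs without network access.

```python
from lxml import html

# Inline copy of the HTML from the question, used instead of a live request.
SNIPPET = """
<div id="names">
  <h2>Names and Synonyms</h2>
  <div class="ds">
    <h3>Name of Substance</h3>
    <ul>
      <li id="ds2"><div>Acetaldehyde</div></li>
    </ul>
    <h3>MeSH Heading</h3>
    <ul>
      <li id="ds3"><div>Acetaldehyde</div></li>
    </ul>
  </div>
</div>
"""


def first_value_under(tree, heading):
    """Return the text of the first <div> inside the first <ul> that follows
    the <h3> whose text equals `heading`, or "" when nothing matches.

    Hypothetical helper; it assumes every heading is an <h3> followed by its
    own <ul>, as in the snippet above.
    """
    matches = tree.xpath(
        '//h3[normalize-space()=$heading]/following-sibling::ul[1]//div',
        heading=heading,  # lxml binds keyword arguments to XPath $variables
    )
    return matches[0].text_content().strip() if matches else ""


tree = html.fromstring(SNIPPET)
chem_name = first_value_under(tree, "Name of Substance")
mesh_name = first_value_under(tree, "MeSH Heading")
missing = first_value_under(tree, "No Such Heading")  # heading not on the page
```

Because each lookup is anchored to its own heading, no index other than the first match is needed, and a missing heading yields an empty string. One limitation of this sketch: if a heading had no list of its own, following-sibling::ul[1] would pick up the next heading's list.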