BeautifulSoup - xml - find_next: limiting to one attribute
Any help is appreciated! With the sample XML file below, I am getting incorrect output.
Incorrect output:
Emp_F_Name: Jill
Emp_M_Name: H
Emp_L_Name: Jones
Desired output:
Emp_F_Name: Jill
Emp_M_Name: None or NULL
Emp_L_Name: Jones
I'm not sure why find_next goes beyond the element I searched from (employee).
<?xml version="1.0" encoding="utf-8"?>
<org value="Tech">
  <employee>
    <name>
      <family>Jones</family>
      <given>Jill</given>
    </name>
  </employee>
  <manager>
    <name>
      <family>Fisher</family>
      <given>Junior</given>
      <given>H</given>
    </name>
  </manager>
</org>
Here is the code I'm using.
employee = soup.find("employee")
for i in employee.find_all('name'):
    fname = employee.find('given')
    print("Emp_F_Name: ", fname.get_text())
    mname = fname.find_next('given')
    print("Emp_M_Name: ", mname.get_text())
    lname = employee.find('family')
    print("Emp_L_Name: ", lname.get_text())
When I run the same code for the manager instead, it seems to work.
manager = soup.find("manager")
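If the structure is always roughly the same, you could use find_all() to collect all the given elements inside the name and check whether there is only one or two.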
given = i.find_all('given')
fname = given[0]
print("Emp_F_Name: ", fname.get_text())
mname = given[1].get_text() if len(given) > 1 else None
print("Emp_M_Name: ", mname)
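I don't think it is necessary to iterate over employee, but if you do, you should use the loop variable i inside the loop rather than employee.
Example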
from bs4 import BeautifulSoup

xml = '''<?xml version="1.0" encoding="utf-8"?>
<org value="Tech">
  <employee>
    <name>
      <family>Jones</family>
      <given>Jill</given>
    </name>
  </employee>
  <manager>
    <name>
      <family>Fisher</family>
      <given>Junior</given>
      <given>H</given>
    </name>
  </manager>
</org>'''

soup = BeautifulSoup(xml, 'lxml')

employee = soup.find("employee")
for i in employee.find_all('name'):
    # Collect only the <given> elements inside this <name>
    given = i.find_all('given')
    fname = given[0]
    print("Emp_F_Name: ", fname.get_text())
    # A second <given>, if present, is treated as the middle name
    mname = given[1].get_text() if len(given) > 1 else None
    print("Emp_M_Name: ", mname)
    lname = i.find('family')
    print("Emp_L_Name: ", lname.get_text())
Output
Emp_F_Name: Jill
Emp_M_Name: None
Emp_L_Name: Jones
Alternative
Isolate employee into its own separate tree, so that find_next() has nothing outside it to reach:
employee = BeautifulSoup(str(soup.find("employee")), 'lxml')
for i in employee.find_all('name'):
    fname = i.find('given')
    print("Emp_F_Name: ", fname.get_text())
    mname = fname.find_next('given').get_text() if fname.find_next('given') else None
    print("Emp_M_Name: ", mname)
    lname = i.find('family')
    print("Emp_L_Name: ", lname.get_text())
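A possible variation, not part of the answer above: find_next() walks forward through the whole document in parse order, regardless of which element the search started from, which is why it can leave employee and land inside manager. find_next_sibling() only looks at later elements under the same parent, so no re-parsing is needed. The sketch below assumes the sample document from the question; the helper name middle_name is hypothetical, and html.parser is used only because it ships with Python.

```python
from bs4 import BeautifulSoup

# Sample data from the question, compacted; the XML declaration is omitted.
xml = """<org value="Tech">
<employee><name><family>Jones</family><given>Jill</given></name></employee>
<manager><name><family>Fisher</family><given>Junior</given><given>H</given></name></manager>
</org>"""

soup = BeautifulSoup(xml, "html.parser")


def middle_name(person):
    """Return the text of the second <given> under the same parent, or None."""
    first = person.find("given")
    if first is None:
        return None
    # Only siblings of the first <given> are considered, so the search
    # cannot cross into another person's <name>.
    second = first.find_next_sibling("given")
    return second.get_text() if second else None


# find_next() is not scoped to <employee>: the match lives inside <manager>.
leaked = soup.find("employee").find("given").find_next("given")
print("find_next left <employee>:", leaked.find_parent("manager") is not None)

print("Employee middle name:", middle_name(soup.find("employee")))
print("Manager middle name:", middle_name(soup.find("manager")))
```

Because the search is limited to siblings, the same helper should work for both employee and manager without building a second tree.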
Using the standard-library XML parser (no external libraries required):
import xml.etree.ElementTree as ET

xml = '''<?xml version="1.0" encoding="UTF-8"?>
<org value="Tech">
  <employee>
    <name>
      <family>Jones</family>
      <given>Jill</given>
    </name>
  </employee>
  <manager>
    <name>
      <family>Fisher</family>
      <given>Junior</given>
      <given>H</given>
    </name>
  </manager>
</org>'''

# Map each output label to the tag to look up. A value of None means no
# lookup is done, so Emp_M_Name always prints None here.
attrs = {'Emp_F_Name': 'given',
         'Emp_L_Name': 'family',
         'Emp_M_Name': None}

root = ET.fromstring(xml)
# The first <name> in document order, which is the employee's
name = root.find('.//name')
for k, v in attrs.items():
    print(f'{k}: {name.find(v).text if v else None}')
Output
Emp_F_Name: Jill
Emp_L_Name: Jones
Emp_M_Name: None
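The ElementTree version above hardcodes the middle name as None. As a rough sketch of how it could be detected instead, the variation below collects every given under a person's name with findall() and treats a second one, if any, as the middle name. The helper name name_parts is hypothetical, and the sketch assumes each person has exactly one name element, as in the sample.

```python
import xml.etree.ElementTree as ET

# Sample data from the question; the XML declaration is omitted.
xml = """<org value="Tech">
  <employee>
    <name>
      <family>Jones</family>
      <given>Jill</given>
    </name>
  </employee>
  <manager>
    <name>
      <family>Fisher</family>
      <given>Junior</given>
      <given>H</given>
    </name>
  </manager>
</org>"""

root = ET.fromstring(xml)


def name_parts(person):
    """Build the three name fields from one <employee> or <manager> element."""
    name = person.find("name")
    # findall() only returns direct children of this <name>, so another
    # person's <given> elements can never be picked up.
    givens = [g.text for g in name.findall("given")]
    return {
        "Emp_F_Name": givens[0] if givens else None,
        "Emp_M_Name": givens[1] if len(givens) > 1 else None,
        "Emp_L_Name": name.findtext("family"),
    }


for tag in ("employee", "manager"):
    for key, value in name_parts(root.find(tag)).items():
        print(f"{tag} {key}: {value}")
```

The labels keep the Emp_ prefix from the question even when applied to manager; rename them if that is confusing in your data.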