Python: extracting the same elements from the DOM and making a list
Is there a way to write a Python script that checks HTML code for items starting with "abc=" and lists them?
For example, I have code that contains a list of elements in a grid:
<div data-componentid="fa-gamepad" class="x-component x-button x-has-icon mainTopMenuItem
widerIcon x-icon-align-left x-arrow-align-right x-layout-box-item x-layout-hbox-item"
data-xid="237" data-exttouchaction="11" id="fa-gamepad" senchatest="mainMenu_game">
<div class="x-inner-el" id="ext-element-877">
<div class="x-body-el" id="ext-element-876" style="padding: 0px !important;">
<div class="x-icon-el x-font-icon x-fa fa-gamepad" id="ext-element-878">
</div>
<div class="x-text-el">
</div>
</div>
<div class="x-arrow-el x-font-icon">
</div>
</div>
<div class="x-badge-el">
</div>
<button class="x-button-el" type="button" id="ext-element-875" data-componentid="fa-gamepad">
</button>
</div>
<div data-componentid="plus" class="x-component x-button x-has-icon mainTopMenuItem x-icon-align-left x-arrow-align-right x-has-menu x-layout-box-item x-layout-hbox-item" data-xid="238" data-exttouchaction="11" id="plus" senchatest="mainMenu_plus">
<div class="x-inner-el" id="ext-element-881">
<div class="x-body-el" id="ext-element-880" style="padding: 0px !important;">
<div class="x-icon-el x-font-icon x-fa fa-plus" id="ext-element-882"></div><div class="x-text-el">
</div>
</div>
<div class="x-arrow-el x-font-icon">
</div>
</div>
<div class="x-badge-el">
</div>
<button class="x-button-el" type="button" id="ext-element-879" data-componentid="plus">
</button>
</div>
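In the code above there are two items starting with "senchatest=" (it is an attribute on the two outer div elements). I would like Python to find them and list them like this: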
senchatest="mainMenu_game"
senchatest="mainMenu_plus"
My HTML code has more than 300 of these elements, and I need to list them for testing.
We can use Beautiful Soup, a Python library for extracting data from HTML and XML files.
# Import the BeautifulSoup class from the bs4 module
from bs4 import BeautifulSoup

# Read the HTML file (test.html contains the snippet shared in the question)
with open("test.html", "r") as html_file:
    index = html_file.read()

# Create a BeautifulSoup object and specify the parser
S = BeautifulSoup(index, 'lxml')

# List to hold the attribute values
l = []

# Find all 'div' tags
tag_name = S.find_all('div')
for tag in tag_name:
    # Check the tag's own attributes rather than its string form, so a div
    # that merely wraps a matching child is not counted a second time
    if tag.has_attr('senchatest'):
        # Append the attribute value to the list
        l.append(tag['senchatest'])

# Print the values in the format of the expected output
for i in l:
    print('senchatest="' + i + '"')
Output:
senchatest="mainMenu_game"
senchatest="mainMenu_plus"
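For comparison, the same listing can probably be produced without any third-party package. The sketch below is an alternative, not the approach used in the answer: it relies on the standard library's html.parser module, and the names AttributeCollector, list_attribute_values, and sample are hypothetical, introduced only for illustration. It records the value of the chosen attribute on any tag, not only on div, and the sample string is a shortened stand-in for the HTML in the question.

```python
from html.parser import HTMLParser


class AttributeCollector(HTMLParser):
    """Collect the values of one attribute from every start tag (hypothetical helper)."""

    def __init__(self, attr_name):
        super().__init__()
        # HTMLParser reports attribute names in lowercase
        self.attr_name = attr_name.lower()
        self.values = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the current tag
        for name, value in attrs:
            if name == self.attr_name and value is not None:
                self.values.append(value)


def list_attribute_values(html, attr_name):
    """Return the values of attr_name in document order."""
    collector = AttributeCollector(attr_name)
    collector.feed(html)
    collector.close()
    return collector.values


# Shortened stand-in for the markup in the question (assumption)
sample = """
<div id="fa-gamepad" senchatest="mainMenu_game"><button type="button"></button></div>
<div id="plus" senchatest="mainMenu_plus"><button type="button"></button></div>
"""

for value in list_attribute_values(sample, "senchatest"):
    print(f'senchatest="{value}"')
```

To run it against a real page, read the file into a string and pass that string in place of sample. Passing a different attribute name, such as data-xid, lists that attribute instead.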