Python, extracting same elements from DOM and making a list

Question

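Is there a way to write a Python script that checks whether items in HTML code start with "abc=" and lists them? For example, I have code that contains a list of elements in a grid: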

<div data-componentid="fa-gamepad" class="x-component x-button x-has-icon mainTopMenuItem
widerIcon x-icon-align-left x-arrow-align-right x-layout-box-item x-layout-hbox-item" 
data-xid="237" data-exttouchaction="11" id="fa-gamepad" senchatest="mainMenu_game">
<div class="x-inner-el" id="ext-element-877">
<div class="x-body-el" id="ext-element-876" style="padding: 0px !important;">
<div class="x-icon-el x-font-icon x-fa fa-gamepad" id="ext-element-878">
</div>
<div class="x-text-el">
</div>
</div>
<div class="x-arrow-el x-font-icon">
</div>
</div>
<div class="x-badge-el">
</div>
<button class="x-button-el" type="button" id="ext-element-875" data-componentid="fa-gamepad">
</button>
</div> 
<div data-componentid="plus" class="x-component x-button x-has-icon mainTopMenuItem x-icon-align-left x-arrow-align-right x-has-menu x-layout-box-item x-layout-hbox-item" data-xid="238" data-exttouchaction="11" id="plus" senchatest="mainMenu_plus">
<div class="x-inner-el" id="ext-element-881">
<div class="x-body-el" id="ext-element-880" style="padding: 0px !important;">
<div class="x-icon-el x-font-icon x-fa fa-plus" id="ext-element-882"></div><div class="x-text-el">
</div>
</div>
<div class="x-arrow-el x-font-icon">
</div>
</div>
<div class="x-badge-el">
</div>
<button class="x-button-el" type="button" id="ext-element-879" data-componentid="plus">
</button>
</div>

In the code above there are two elements carrying an attribute that starts with "senchatest=". I want Python to find these and list them like this:

senchatest="mainMenu_game"
senchatest="mainMenu_plus"

My HTML code has more than 300 such elements, and I need to list them for testing.

Answer 1

We can use Beautiful Soup, a Python library for extracting data from HTML and XML files.

# Import the BeautifulSoup class from the bs4 module
from bs4 import BeautifulSoup

# Read the HTML file (test.html contains the code snippet shared in the question);
# the with-statement closes the file automatically
with open("test.html", "r") as html_file:
    html_text = html_file.read()

# Create a BeautifulSoup object and specify the parser
# (the 'lxml' parser requires the lxml package to be installed)
soup = BeautifulSoup(html_text, "lxml")

# Find every 'div' tag that has a 'senchatest' attribute and collect its value.
# Reading the attribute directly avoids matching the text of nested child tags.
values = [tag["senchatest"] for tag in soup.find_all("div", attrs={"senchatest": True})]

# Print the values in the format shown in the expected output
for value in values:
    print(f'senchatest="{value}"')

The output is:

senchatest="mainMenu_game"
senchatest="mainMenu_plus"
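
If installing Beautiful Soup is not an option, roughly the same result can be obtained with the standard library's html.parser module. The following is only a minimal sketch of that alternative, not the approach used in the answer above: the names AttributeCollector and collect_attribute and the shortened sample string are made up for illustration, and it looks at every tag rather than only div elements.

```python
from html.parser import HTMLParser


class AttributeCollector(HTMLParser):
    """Hypothetical helper: collects the values of one attribute from every start tag."""

    def __init__(self, attr_name):
        super().__init__()
        self.attr_name = attr_name
        self.values = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; the parser lowercases the names
        for name, value in attrs:
            if name == self.attr_name:
                self.values.append(value)


def collect_attribute(html_text, attr_name):
    """Return the values of attr_name in document order."""
    collector = AttributeCollector(attr_name.lower())
    collector.feed(html_text)
    collector.close()
    return collector.values


# Shortened stand-in for the markup in the question (an assumption for this sketch)
sample = '''
<div id="fa-gamepad" senchatest="mainMenu_game"><button type="button"></button></div>
<div id="plus" senchatest="mainMenu_plus"></div>
'''

for value in collect_attribute(sample, "senchatest"):
    print(f'senchatest="{value}"')
```

For the real page, the contents of test.html could be passed to collect_attribute in place of the sample string.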

python

automation