How to locate a td class in HTML code using Python?

Question

My HTML contains a td element with a class. I need to locate the td whose class is "CURRENTLOCATION" using Python.

Code:

<td class="CURRENTLOCATION"><img align="MIDDLE" src="..\Images\FolderOpen.bmp"/> Metrics</td>

Below is the code I have tried.

First attempt:

My_result = page_soup.find_element_by_class_name('CURRENTLOCATION')

This raises "TypeError: 'NoneType' object is not callable". Second attempt:

My_result = page_soup.find(‘td’, attrs={‘class’: ‘CURRENTLOCATION’})

This raises an "invalid character in identifier" error.

Can anyone help me find the class in the HTML code using Python?

Answer 1

BeautifulSoup has a function for this. Pass the tag name and the attributes you are looking for to find_all; it returns a list of all elements that match.

from bs4 import BeautifulSoup

# Raw string, so the backslashes in the src path are not read as escape sequences.
text = r'<td class="CURRENTLOCATION"><img align="MIDDLE" src="..\Images\FolderOpen.bmp"/> Metrics</td>'
soup = BeautifulSoup(text, 'lxml')
# Find all td tags whose class attribute is set to CURRENTLOCATION.
output_list = soup.find_all('td', {"class": "CURRENTLOCATION"})

Answer 2

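from bs4 import BeautifulSoup

# Raw string, so the backslashes in the src path are not read as escape sequences.
sdata = r'<td class="CURRENTLOCATION"><img align="MIDDLE" src="..\Images\FolderOpen.bmp"/> Metrics</td>'
soup = BeautifulSoup(sdata, 'lxml')
# find_all is the current name; findAll is an older alias for the same method.
mytds = soup.find_all("td", {"class": "CURRENTLOCATION"})
for td in mytds:
    print(td)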
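
For comparison, here is a minimal sketch of a few equivalent ways to express the same lookup. It assumes bs4 is installed and uses the built-in "html.parser" instead of lxml so that no extra parser is needed; the variable names are only illustrative. It also shows that the class value is matched case-sensitively, which matters because the markup spells it in upper case.

```python
from bs4 import BeautifulSoup

# Raw string so the backslashes in the src path stay literal.
html = r'<td class="CURRENTLOCATION"><img align="MIDDLE" src="..\Images\FolderOpen.bmp"/> Metrics</td>'
soup = BeautifulSoup(html, "html.parser")

# Three spellings of the same query; find returns the first match or None.
by_dict = soup.find("td", {"class": "CURRENTLOCATION"})
by_keyword = soup.find("td", class_="CURRENTLOCATION")  # class_ because class is a Python keyword
by_selector = soup.select_one("td.CURRENTLOCATION")     # CSS selector form

# A different letter case does not match the attribute value.
missing = soup.find("td", class_="currentlocation")

print(by_keyword.get_text(strip=True))
print(by_keyword["class"])
print(missing)
```

Use find or select_one when a single element is expected, and find_all or select when a list of matches is needed.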

Answer 3

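I tried your code, the second example, and the problem is the quotes you are using. For me they show up as typographic apostrophes (’, Unicode code point U+2019), whereas the Python interpreter requires straight single quotes (') or double quotes (").

After changing them, I could find the tag: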

>>> bs.find('td', attrs={'class': 'CURRENTLOCATION'})
<td class="CURRENTLOCATION"><img align="MIDDLE" src="..\Images\FolderOpen.bmp"/> Metrics</td>
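
Typographic quotes are easy to miss by eye, so a small check can help before running pasted code. The following is only a sketch using the standard library; the helper name find_typographic_quotes and the set of characters it checks are assumptions made for illustration, not part of BeautifulSoup.

```python
import unicodedata

# Curly single and double quotes that word processors often substitute.
CURLY_QUOTES = "\u2018\u2019\u201c\u201d"

def find_typographic_quotes(source):
    """Return (index, character, Unicode name) for each curly quote in source."""
    return [
        (index, char, unicodedata.name(char))
        for index, char in enumerate(source)
        if char in CURLY_QUOTES
    ]

# The second attempt from the question, written with escapes so the curly quotes are explicit.
snippet = "page_soup.find(\u2018td\u2019, attrs={\u2018class\u2019: \u2018CURRENTLOCATION\u2019})"
hits = find_typographic_quotes(snippet)
for index, char, name in hits:
    print(index, repr(char), name)
```

Replacing each reported character with a straight quote makes the line valid Python again.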

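About your first example: I don't know where you found a reference to the method find_element_by_class_name, but it does not appear to be implemented by the BeautifulSoup class. Instead, the class implements the __getattr__ method, a special method that is called whenever you try to access an attribute that does not exist. Here is an excerpt of that method: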

def __getattr__(self, tag):
    #print "Getattr %s.%s" % (self.__class__, tag)
    if len(tag) > 3 and tag.endswith('Tag'):
        #
    # We special case contents to avoid recursion.
    elif not tag.startswith("__") and not tag == "contents":
        return self.find(tag)

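When you try to access the attribute find_element_by_class_name, you are actually searching for a tag with that name. No such tag exists, so find returns None, and calling None produces the TypeError from your first attempt.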
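
The same failure can be reproduced without BeautifulSoup. The sketch below uses a hypothetical stand-in class, TinySoup, whose __getattr__ forwards unknown attribute names to find in the same spirit as the excerpt; it is a simplified illustration, not the library's actual implementation.

```python
class TinySoup:
    """Hypothetical stand-in for a parsed document; not BeautifulSoup itself."""

    def __init__(self, tag_names):
        self._tag_names = set(tag_names)

    def find(self, name):
        # Return the tag name if the document contains it, otherwise None.
        return name if name in self._tag_names else None

    def __getattr__(self, name):
        # Python calls this only when normal attribute lookup fails.
        if name.startswith("__"):
            raise AttributeError(name)
        return self.find(name)


soup = TinySoup(["td", "img"])
print(soup.td)  # unknown attribute "td" is treated as a tag lookup

# There is no tag called find_element_by_class_name, so the lookup yields None.
lookup = soup.find_element_by_class_name
try:
    lookup("CURRENTLOCATION")  # calling None raises TypeError
except TypeError as exc:
    error_message = str(exc)
    print(error_message)
```

Because the attribute access itself succeeds and returns None, the error only appears at the call, which is why the message mentions 'NoneType' rather than a missing method.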

