How to extract just the number from HTML?

Question

I am trying to extract the number from this HTML element:

<td bgcolor="green">
    <font color="white">
        "49.8 "
        <small>dBmV</small>
    </font>
</td>

How can I extract only 49.8, without the dBmV?

I can use an XPath that returns the whole "49.8 dBmV", but when I use an XPath that selects only the "49.8" text node I get an error.

Error:

invalid selector: The result of the xpath expression "/html/body/p[1]/table/tbody/tr/td/table[2]/tbody/tr[2]/td[4]/font/text()" is: [object Text]. It should be an element.

I tried:

browser.find_element_by_xpath("/html/body/p[1]/table/tbody/tr/td/table[2]/tbody/tr[2]/td[4]/font").text

which returns 49.8 dBmV

and then:

browser.find_element_by_xpath("/html/body/p[1]/table/tbody/tr/td/table[2]/tbody/tr[2]/td[4]/font/text()").text

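which raises the exception above.

I only want the number 49.8 (the value obviously changes). I know I could extract the number afterwards, but I was hoping for something tidier that pulls the value directly out of the HTML.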

Answer 1

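Selenium's find_element_by_xpath API only supports returning elements. So even though XPath lets you write an expression that returns just the text you want, in this case you cannot get that text from Selenium using XPath alone.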
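
If the markup can be handed to a parser outside Selenium, the leading text node is reachable directly. This is only a sketch using the standard library's xml.etree.ElementTree, and it assumes the fragment is well-formed XML (real page HTML often is not). The helper name leading_text is made up for illustration.

```python
import xml.etree.ElementTree as ET

# The fragment from the question, copied as-is (including the literal quotes).
SNIPPET = """<td bgcolor="green">
    <font color="white">
        "49.8 "
        <small>dBmV</small>
    </font>
</td>"""


def leading_text(markup, path="font"):
    """Return the text that precedes the first child of the element at `path`.

    ElementTree stores that text in `.text`; text after a child lives in
    the child's `.tail`, so the <small> content is never included.
    """
    root = ET.fromstring(markup)
    node = root.find(path)
    if node is None:
        raise ValueError(f"no element matches {path!r}")
    # Drop surrounding whitespace, then the literal quotes, then inner padding.
    return (node.text or "").strip().strip('"').strip()


print(float(leading_text(SNIPPET)))
```

With Selenium, the markup could presumably come from get_attribute("outerHTML") on the td element, but that string is HTML rather than XML, so this parser may reject it on other pages.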

Answer 2

You can use your first line and get the number like this:

text_num = browser.find_element_by_xpath("/html/body/p[1]/table/tbody/tr/td/table[2]/tbody/tr[2]/td[4]/font").text
print(float(text_num.split()[0]))
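
Note that split()[0] relies on whitespace between the number and the unit. If that might not always be the case, a regular expression is a bit more forgiving. A minimal sketch, where first_number is a hypothetical helper and the input string stands in for the .text value returned by Selenium:

```python
import re

# Optional sign, digits, optional decimal part.
_NUMBER = re.compile(r"[-+]?\d+(?:\.\d+)?")


def first_number(text):
    """Return the first number found in `text` as a float."""
    match = _NUMBER.search(text)
    if match is None:
        raise ValueError(f"no number found in {text!r}")
    return float(match.group())


print(first_number("49.8 dBmV"))
```

This also copes with "49.8dBmV", where float(text.split()[0]) would raise a ValueError.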

Hope that helps!

Answer 3

You can strip the extra text with replace, like this:

first_text = browser.find_element_by_xpath("/html/body/p[1]/table/tbody/tr/td/table[2]/tbody/tr[2]/td[4]/font").text
second_text = browser.find_element_by_xpath("/html/body/p[1]/table/tbody/tr/td/table[2]/tbody/tr[2]/td[4]/font/small").text
only_first_text = first_text.replace(second_text, '')
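
One thing to watch: str.replace removes every occurrence of the child text and leaves the trailing space behind. A slightly stricter variant only cuts the child text off the end, which is where it sits in the question's markup (that position is an assumption). own_text is a hypothetical helper, and the two arguments stand in for the two .text values above:

```python
def own_text(parent_text, child_text):
    """Drop the child's text from the end of the parent's text."""
    parent_text = parent_text.strip()
    child_text = child_text.strip()
    # Only cut when the parent really ends with the child text.
    if child_text and parent_text.endswith(child_text):
        parent_text = parent_text[: -len(child_text)]
    return parent_text.strip()


print(own_text("49.8 dBmV", "dBmV"))  # 49.8
```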

Answer 4

To extract the text 49.8, you can use either of the following approaches:

Using XPath with execute_script() and textContent:

print(driver.execute_script('return arguments[0].firstChild.textContent;', driver.find_element_by_xpath("//td[@bgcolor='green']/font[@color='white']")).strip())

Using XPath with get_attribute() and splitlines():

print(driver.find_element_by_xpath("//td[@bgcolor='green']/font[@color='white']").get_attribute("innerHTML").splitlines()[1])
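
The splitlines()[1] index depends on the exact line breaks in the page source, and the result still carries the indentation. If that layout is not guaranteed, cutting the innerHTML string at the first tag may be more robust. A sketch only: text_before_first_tag is a hypothetical helper, its argument stands in for the get_attribute("innerHTML") value, and HTML entities are not decoded.

```python
def text_before_first_tag(inner_html):
    """Return the text that comes before the first '<' in an innerHTML string."""
    head = inner_html.split("<", 1)[0]
    # Remove whitespace, then any literal quotes, then padding inside them.
    return head.strip().strip('"').strip()


multi_line = '\n        "49.8 "\n        <small>dBmV</small>\n    '
single_line = '49.8 <small>dBmV</small>'

print(text_before_first_tag(multi_line))
print(text_before_first_tag(single_line))
```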

python

selenium

xpath

xpath-1.0

selenium-webdriver