How to get the text from a <p> tag using XPath, Selenium, and Python
I need to capture one line from the text inside a <p> tag using XPath. I need to store the text Content-type: text/plain; charset=us-ascii
in a Python variable, but I get the following error:
selenium.common.exceptions.WebDriverException: Message: TypeError: Expected an element or WindowProxy, got: [object Text] {}
This is the code I am trying:
import selenium.webdriver as webdriver
browser = webdriver.Firefox()
browser.get('https://www.w3.org/Protocols/rfc1341/7_1_Text.html')
foo = browser.find_element_by_xpath('/html/body/p[5]/text()')
print(foo)
<h1>7.1 The Text Content-Type</h1>
<p>
The text Content-Type is intended for sending material which
is principally textual in form. It is the default Content-
Type. A "charset" parameter may be used to indicate the
character set of the body text. The primary subtype of text
is "plain". This indicates plain (unformatted) text. The
default Content-Type for Internet mail is "text/plain;
charset=us-ascii".
<p>
Beyond plain text, there are many formats for representing
what might be known as "extended text" -- text with embedded
formatting and presentation information. An interesting
characteristic of many such representations is that they are
to some extent readable even without the software that
interprets them. It is useful, then, to distinguish them,
at the highest level, from such unreadable data as images,
audio, or text represented in an unreadable form. In the
absence of appropriate interpretation software, it is
reasonable to show subtypes of text to the user, while it is
not reasonable to do so with most nontextual data.
<p>
Such formatted textual data should be represented using
subtypes of text. Plausible subtypes of text are typically
given by the common name of the representation format, e.g.,
"text/richtext".
<p>
<h3>7.1.1 The charset parameter</h3>
<p>
A critical parameter that may be specified in the Content-
Type field for text data is the character set. This is
specified with a "charset" parameter, as in:
<p>
Content-type: text/plain; charset=us-ascii
<p>
Unlike some other parameter values, the values of the
charset parameter are NOT case sensitive. The default
character set, which must be assumed in the absence of a
charset parameter, is US-ASCII.
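The text() step in the XPath is the problem here. It selects a text node, while find_element_by_xpath() has to return an element, which is why the error says "Expected an element or WindowProxy, got: [object Text]". Select the <p> element itself and read its text attribute instead: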
import selenium.webdriver as webdriver
browser = webdriver.Firefox()
browser.get('https://www.w3.org/Protocols/rfc1341/7_1_Text.html')
foo = browser.find_element_by_xpath('/html/body/p[5]')
print(foo.text)
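The same fix can be sketched as a small reusable helper. This is only an illustration under a few assumptions: element_xpath(), read_text() and demo() are hypothetical names that do not come from the answer, and the call uses the find_element(by, value) form, which newer Selenium releases expect in place of find_element_by_xpath().

```python
def element_xpath(xpath: str) -> str:
    """Drop a trailing /text() step so the XPath selects an element."""
    suffix = "/text()"
    if xpath.endswith(suffix):
        return xpath[: -len(suffix)]
    return xpath


def read_text(driver, xpath: str) -> str:
    """Find the element and return its visible text without surrounding whitespace."""
    # "xpath" is the string value of selenium.webdriver.common.by.By.XPATH.
    element = driver.find_element("xpath", element_xpath(xpath))
    return element.text.strip()


def demo() -> None:
    """Open the page from the question in Firefox and print the paragraph text."""
    # Imported here so the helpers above can be loaded without Selenium installed.
    from selenium import webdriver

    browser = webdriver.Firefox()
    try:
        browser.get("https://www.w3.org/Protocols/rfc1341/7_1_Text.html")
        content_type = read_text(browser, "/html/body/p[5]/text()")
        print(content_type)
    finally:
        browser.quit()
```

Calling demo() needs Firefox and geckodriver to be available. The value returned by read_text() is an ordinary Python string, so it can be stored in a variable as the question asks.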
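With the text() step removed, print(foo.text) prints the text Content-type: text/plain; charset=us-ascii
To extract the text reliably, wait for the element to become visible with WebDriverWait and visibility_of_element_located(); you can then use either of the following: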
Using XPATH and the text attribute:
driver.get("https://www.w3.org/Protocols/rfc1341/7_1_Text.html")
print(WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.XPATH, "//h3[contains(., 'The charset parameter')]//following-sibling::p[2]"))).text)
Using XPATH and get_attribute():
driver.get("https://www.w3.org/Protocols/rfc1341/7_1_Text.html")
print(WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.XPATH, "//h3[contains(., 'The charset parameter')]//following-sibling::p[2]"))).get_attribute("innerHTML"))
Console output:
Content-type: text/plain; charset=us-ascii
Note: you have to add the following imports:
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
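Putting the wait and the imports together, a minimal sketch might look like the following. The function names sibling_paragraph_xpath() and wait_for_text() are hypothetical and not part of Selenium, and the locator uses a single /following-sibling step, a slightly stricter form of the //following-sibling expression shown above.

```python
def sibling_paragraph_xpath(heading_text: str, position: int) -> str:
    """Build an XPath for the Nth <p> sibling after an <h3> containing heading_text."""
    # Assumes heading_text contains no single quote, which would break the expression.
    return f"//h3[contains(., '{heading_text}')]/following-sibling::p[{position}]"


def wait_for_text(driver, xpath: str, timeout: float = 20) -> str:
    """Wait until the element located by xpath is visible, then return its text."""
    # Imported here so the XPath helper above can be used without Selenium installed.
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    # until() returns the element once it is visible and raises
    # TimeoutException if that does not happen within `timeout` seconds.
    element = WebDriverWait(driver, timeout).until(
        EC.visibility_of_element_located((By.XPATH, xpath))
    )
    return element.text
```

With a driver that has already loaded the page, wait_for_text(driver, sibling_paragraph_xpath("The charset parameter", 2)) should return the same string as the first variant above.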