How to identify if a <p> tag is inside a <table> tag using Python BeautifulSoup?

How can I find the p tags that are inside a table tag and those that are outside it?

<p> word outside p tag inside table tag </p>

<table>
  <tbody>
    <tr>
      <td>
        <p> word inside p tag inside table tag </p>
      </td>
    </tr>
  </tbody>
</table>

You can use a CSS selector:

from bs4 import BeautifulSoup

html_doc = """
<p> word outside p tag inside table tag </p>

<table>
  <tbody>
    <tr>
      <td>
        <p> word inside p tag inside table tag </p>
      </td>
    </tr>
  </tbody>
</table>
"""

soup = BeautifulSoup(html_doc, "html.parser")

# "table p" is a descendant selector: it matches every <p> nested anywhere inside a <table>
for p in soup.select("table p"):
    print(p.text)

Prints:

 word inside p tag inside table tag 

Or use the bs4 API:

# Walk each <table>, then collect the <p> tags nested inside it
for table in soup.find_all("table"):
    for p in table.find_all("p"):
        print(p.text)
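
Both snippets above only list the p tags inside a table. To also get the ones outside, one possible approach is to loop over every p tag and check whether it has a table ancestor with find_parent(). This is a minimal sketch using the same sample HTML; the list names inside and outside are just illustrative:

```python
from bs4 import BeautifulSoup

html_doc = """
<p> word outside p tag inside table tag </p>

<table>
  <tbody>
    <tr>
      <td>
        <p> word inside p tag inside table tag </p>
      </td>
    </tr>
  </tbody>
</table>
"""

soup = BeautifulSoup(html_doc, "html.parser")

inside, outside = [], []  # illustrative names, not part of the bs4 API
for p in soup.find_all("p"):
    # find_parent("table") returns the nearest enclosing <table>, or None if there is none
    if p.find_parent("table") is not None:
        inside.append(p.get_text(strip=True))
    else:
        outside.append(p.get_text(strip=True))

print("inside:", inside)
print("outside:", outside)
```

Because each p tag is visited exactly once, this should also avoid listing a p tag twice when tables are nested, which the nested find_all loop would do.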