How to parse HTML and get table ids using Python
I want to parse HTML and get a list of table IDs using Python.
The page I am trying to scrape for table IDs is https://docs.aws.amazon.com/workspaces/latest/adminguide/workspaces-port-requirements.html
It is an HTML document in the format below, containing multiple tables:
<html>
<div class="table-container">
<div class="table-contents disable-scroll">
<table id="w345aab9c13c11b5"> # this is table id for below table name
<thead>
<tr>
<th class="table-header" colspan="100%">
<div class="title">Domains and IP addresses to add to your allow list</div> # I need to look for this table name and get the table id associated with it
</th>
</tr>
</thead>
<tbody>
...
</tbody>
</table>
</div>
</div>
<div class="table-container">
<div class="table-contents disable-scroll">
<table id="w345aab9c13c13b2">
<thead>
<tr>
<th class="table-header" colspan="100%">
<div class="title">Domains and IP Addresses to Add to Your Allow List for PCoIP</div>
</th>
</tr>
<tr>
...
</tr>
</thead>
<tbody>
...
</tbody>
</table>
</div>
</div>
...
</html>
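I need to check for a matching value in the div tag and get the table id associated with it.
I am new to Python, so any suggestions or solutions for how to approach this would be very helpful.
You can use BeautifulSoup to get the IDs: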
import requests
from bs4 import BeautifulSoup

url = 'http://docs.aws.amazon.com/workspaces/latest/adminguide/workspaces-port-requirements.html'
resp = requests.get(url)
soup = BeautifulSoup(resp.content, 'html.parser')

# Visit every <table> element that carries an id attribute
for t in soup.select('table[id]'):
    # Case-sensitive substring match against all of the table's text
    if 'Domains and IP Addresses to Add to Your Allow List' in t.get_text():
        print(t.attrs['id'])
I'm sure you can figure out how to incorporate this into your code.
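If you want to match on the title div itself rather than on all of the table's text, something along these lines might work. This is only a sketch: it assumes every table keeps its name in a `<div class="title">` as in the markup from the question, it runs against a trimmed copy of that markup instead of the live page, and the names `SAMPLE_HTML`, `table_ids_by_title` and `find_table_id` are made up for illustration. The comparison ignores case because the two titles in the question are capitalized differently.

```python
from bs4 import BeautifulSoup

# Trimmed copy of the markup from the question, so the example runs offline.
SAMPLE_HTML = """
<html>
<div class="table-container">
<table id="w345aab9c13c11b5">
<thead><tr><th class="table-header">
<div class="title">Domains and IP addresses to add to your allow list</div>
</th></tr></thead>
</table>
</div>
<div class="table-container">
<table id="w345aab9c13c13b2">
<thead><tr><th class="table-header">
<div class="title">Domains and IP Addresses to Add to Your Allow List for PCoIP</div>
</th></tr></thead>
</table>
</div>
</html>
"""


def table_ids_by_title(html):
    """Map each table's title text to the table's id attribute."""
    soup = BeautifulSoup(html, "html.parser")
    ids = {}
    for table in soup.select("table[id]"):
        # Assumption: the table name lives in <div class="title">
        title = table.select_one("div.title")
        if title is not None:
            ids[title.get_text(strip=True)] = table["id"]
    return ids


def find_table_id(html, wanted_title):
    """Return the id of the table whose title equals wanted_title, ignoring case."""
    wanted = wanted_title.strip().casefold()
    for title, table_id in table_ids_by_title(html).items():
        if title.casefold() == wanted:
            return table_id
    return None


print(find_table_id(SAMPLE_HTML, "Domains and IP addresses to add to your allow list"))
```

To use it on the real page, pass `resp.content` from the `requests.get(url)` call instead of `SAMPLE_HTML`. Because the match is on the whole title, the shorter name no longer also matches the PCoIP table, which a substring test would do.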