python 2.7 BeautifulSoup: find the table containing a particular string
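After searching a BeautifulSoup document for a string, how do I get the table that contains that string? I have a solution that works for a table whose structure I already know.
My code is as follows: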
import mechanize
from bs4 import BeautifulSoup
sitemap_url = "https://www.rbi.org.in/scripts/sitemap.aspx"
br = mechanize.Browser()
br.addheaders = [
    ('User-agent',
     'Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.0.1) Gecko/2008071615 Fedora/3.0.1-1.fc9 Firefox/3.0.1'),
    ('accept', 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8'),
]
response = br.open(sitemap_url)
text = response.read()
br.close()
soup = BeautifulSoup(text, 'lxml')
# Find the table containing the financial intermediaries.
# First I find "Financial Intermediaries" in soup.
fin_str = soup.find(text="Financial Intermediaries")
# Next I step out through the parents
# until it turns out that I have found the table.
fin_tbl = fin_str.parent.parent.parent.parent
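The problem with this approach is that I have to check the result after every step outward through the document. How can I keep adding .parent until I reach the table?
Append the following code to the program: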
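# The first tag around the string is its parent.
fn_in = fin_str.parent

# Step out one level. Tags and NavigableString objects both
# expose .parent, so no type check is needed here.
def step_out(node):
    return node.parent

# Continue until the enclosing tag is a <table>;
# stop if the top of the document is reached first.
while fn_in is not None and fn_in.name != 'table':
    fn_in = step_out(fn_in)

# fn_in is now the nearest enclosing table, or None if there is none.
fin_tbl = fn_in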
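
As an alternative to walking .parent by hand, BeautifulSoup's find_parent method returns the nearest enclosing tag with a given name. The following is a minimal sketch, not the approach from the answer above. The inline HTML is a hypothetical stand-in for the downloaded sitemap page, and the id attributes exist only to show which table is returned. It uses the built-in html.parser so that it runs without lxml.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the page fetched with mechanize.
html = """
<table id="outer">
  <tr><td>
    <table id="inner">
      <tr><td><a href="#"><b>Financial Intermediaries</b></a></td></tr>
    </table>
  </td></tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")

# Locate the string node, exactly as in the question
# (string= is the newer name for the text= argument).
fin_str = soup.find(string="Financial Intermediaries")

# find_parent walks outward and returns the closest <table>,
# or None if the string is not inside any table.
fin_tbl = fin_str.find_parent("table") if fin_str is not None else None

if fin_tbl is not None:
    print(fin_tbl["id"])
```

With nested tables this returns the innermost one. If an outer table is wanted instead, find_parents("table") yields all enclosing tables, nearest first.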