Extracting table data with Beautiful Soup
I'm new to bs4. I've read a few tutorials and tried some simple examples.
I want to extract data from a table, but I can't get it to work.
Here is the html_source:
<table class="tborder" cellpadding="5" cellspacing="0" border="0" width="100%" align="center" style="margin:5px 0px 5px 0px" id="post45894054">
<tr>
<td>
<div class="alt2" style="margin:5px 0px 5px 0px; padding:5px; border:2px groove">
<div class="smallfont"><em>
<br />
Good news today.
</em></div>
</div>
</td>
</tr>
</table>
I want to extract 'Good news today'.
I tried the following code, but it doesn't work as I expected:
from bs4 import BeautifulSoup
import urllib2
import re
base_url = "some url"
html_page = urllib2.urlopen(base_url)
soup = BeautifulSoup(html_page)
print soup
tables = soup.select("table .alt2 .smallfont br")
print tables
from bs4 import BeautifulSoup
soup = BeautifulSoup("""<table class="tborder" cellpadding="5" cellspacing="0" border="0" width="100%" align="center" style="margin:5px 0px 5px 0px" id="post45894054">
<tr>
<td>
<div class="alt2" style="margin:5px 0px 5px 0px; padding:5px; border:2px groove">
<div class="smallfont"><em>
<br />
Good news today.
</em></div>
</div>
</td>
</tr>
</table> """, "html.parser")  # name the parser explicitly to avoid the "no parser specified" warning
print(soup.find("table",attrs={"class":"tborder"}).text.strip())
Good news today.
print(soup.find(attrs={"class":"smallfont"}).text.strip())
Good news today.
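As for why the original `select()` call printed nothing useful: the selector ends in `br`, so it matches only the empty `<br/>` tag, not the text that follows it. Below is a minimal sketch of two ways around that, assuming the built-in `html.parser` backend and a trimmed copy of the HTML from the question (the variable names are just for illustration).

```python
from bs4 import BeautifulSoup

# Trimmed copy of the HTML from the question (styling attributes dropped).
html_source = """<table class="tborder" id="post45894054">
<tr>
<td>
<div class="alt2">
<div class="smallfont"><em>
<br />
Good news today.
</em></div>
</div>
</td>
</tr>
</table>"""

soup = BeautifulSoup(html_source, "html.parser")

# The original selector ends in "br", so it matches only the <br/> tag,
# which has no text of its own.
br_matches = soup.select("table .alt2 .smallfont br")
print(br_matches)

# Option 1: stop the selector at <em>, the element that contains the text.
em = soup.select_one("table .alt2 .smallfont em")
text = em.get_text(strip=True)
print(text)  # Good news today.

# Option 2: keep the original selector and read the text node after <br/>.
sibling_text = br_matches[0].next_sibling.strip()
print(sibling_text)  # Good news today.
```

Option 1 is probably the more robust of the two, since it does not depend on the text sitting directly after the `<br/>`.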