Python BeautifulSoup: mismatched items in a table

I am trying to use bs4 to select data from a table and store it in a CSV file, but the columns come out mismatched. I suspect the condition in my if statement is wrong for this HTML.

def grab_daily_data(self):
    url_template = 'http://boxofficemojo.com/movies/?page=daily&view=chart&id=%s.htm'
    # url = http://www.boxofficemojo.com/movies/?page=daily&view=chart&id=hungergames3.htm  # testing
    for val in self.mov_id:
        print 'parsing through: %s' % val
        url = url_template % val
        response = requests.get(url)
        soup = BeautifulSoup(response.content)

        # every table on the page with these attributes is matched, not only the daily chart
        alltables = soup.findAll("table", {"border": "0", "width": "95%"})
        in_mainbody = False
        i = 0
        counter = 0
        test_arr = []
        change = []
        for table in alltables:
            rows = table.findAll('tr')
            for tr in rows:
                cols = tr.findAll('td')
                for td in cols:
                    test = td.text

                    # skip the first 17 cells (headers), then keep every 10th cell
                    if i >= 17:
                        if counter % 10 == 0:
                            print test
                            self.day_num.append(test)
                        counter += 1

                    i += 1

My problem is that the column shifts left by 1, and then shifts again every 7 rows (a small diagnostic sketch follows the example output below).

Example output: instead of printing

1 
2 
3 
4 
5 
6
7
8
9
10...

it prints:

Fri
Sat
Sun
Mon
Tue
Wed
Thu

8
9
10
11
12
13
14

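Flattening every cell of every matching table into one running counter only works if each row contributes the same number of cells; a header row, a blank spacer row, or a second table that also matches the same border/width attributes shifts the offset. A minimal diagnostic sketch (reusing the test URL from the comment above) that prints the cell count per row of each matched table:

from bs4 import BeautifulSoup
import requests

url = 'http://www.boxofficemojo.com/movies/?page=daily&view=chart&id=hungergames3.htm'
soup = BeautifulSoup(requests.get(url).content)

# print (table index, row index, number of td cells) for every matched table;
# non-uniform counts are what make a flat "every 10th cell" counter drift
for t_idx, table in enumerate(soup.findAll("table", {"border": "0", "width": "95%"})):
    for r_idx, tr in enumerate(table.findAll('tr')):
        print t_idx, r_idx, len(tr.findAll('td'))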
The problem is that you are not locating the right table.

Rely on the chart element instead: get the next table sibling of the chart_container div and find all the rows inside it:

from bs4 import BeautifulSoup
import requests

url = 'http://www.boxofficemojo.com/movies/?page=daily&view=chart&id=hungergames3.htm'

response = requests.get(url)
soup = BeautifulSoup(response.content)

# locate the daily chart through its container div, take the table that follows it,
# and skip the header row with [1:]
for tr in soup.find('div', id='chart_container').find_next_sibling('table').find_all('tr')[1:]:
    print [td.text for td in tr('td')]

This prints:

[u'Fri', u'Nov. 21, 2014', u'1', u',139,942', u'-', u'-', u'4,151', u',284', u',139,942', u'1']
[u'Sat', u'Nov. 22, 2014', u'1', u',905,873', u'-25.8%', u'-', u'4,151', u',854', u',045,815', u'2']
[u'Sun', u'Nov. 23, 2014', u'1', u',851,819', u'-36.8%', u'-', u'4,151', u',228', u'1,897,634', u'3']
[u'Mon', u'Nov. 24, 2014', u'1', u',978,318', u'-65.3%', u'-', u'4,151', u',163', u'0,875,952', u'4']
[u'Tue', u'Nov. 25, 2014', u'1', u',131,853', u'+35.1%', u'-', u'4,151', u',923', u'3,007,805', u'5']
[u'Wed', u'Nov. 26, 2014', u'1', u',620,517', u'+20.5%', u'-', u'4,151', u',522', u'7,628,322', u'6']
[u'Thu', u'Nov. 27, 2014', u'1', u',079,983', u'-24.2%', u'-', u'4,151', u',669', u'8,708,305', u'7']
[u'']
[u'Fri', u'Nov. 28, 2014', u'1', u',199,442', u'+118.4%', u'-56.1%', u'4,151', u',830', u'2,907,747', u'8']
[u'Sat', u'Nov. 29, 2014', u'1', u',992,225', u'-9.1%', u'-46.2%', u'4,151', u',298', u'4,899,972', u'9']
[u'Sun', u'Nov. 30, 2014', u'1', u',780,932', u'-51.0%', u'-58.3%', u'4,151', u',597', u'5,680,904', u'10']
[u'Mon', u'Dec. 1, 2014', u'1', u',635,435', u'-75.6%', u'-70.6%', u'4,151', u'5', u'8,316,339', u'11']
[u'Tue', u'Dec. 2, 2014', u'1', u',160,145', u'+19.9%', u'-74.0%', u'4,151', u'1', u'1,476,484', u'12']
[u'Wed', u'Dec. 3, 2014', u'1', u',332,453', u'-26.2%', u'-84.0%', u'4,151', u'2', u'3,808,937', u'13']
[u'Thu', u'Dec. 4, 2014', u'1', u',317,894', u'-0.6%', u'-79.1%', u'4,151', u'8', u'6,126,831', u'14']
...
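
Since the original goal was to store the data in a CSV file and collect the day numbers, here is a minimal sketch building on the same approach. The daily.csv filename, the 10-cell check used to drop the blank spacer row between weeks, and taking the last column as the day number are assumptions based on the output above, not part of the original post:

import csv

import requests
from bs4 import BeautifulSoup

url = 'http://www.boxofficemojo.com/movies/?page=daily&view=chart&id=hungergames3.htm'
soup = BeautifulSoup(requests.get(url).content)

table = soup.find('div', id='chart_container').find_next_sibling('table')

day_num = []
with open('daily.csv', 'wb') as f:  # 'wb' for the csv module on Python 2
    writer = csv.writer(f)
    for tr in table.find_all('tr')[1:]:                       # skip the header row
        cells = [td.text.encode('utf-8') for td in tr('td')]
        if len(cells) < 10:                                    # drop the blank spacer row (assumes data rows have 10 cells)
            continue
        writer.writerow(cells)
        day_num.append(cells[-1])                              # last column is the day number: 1, 2, 3, ...

print day_num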