Trying to understand Python Beautiful Soup parsing code
I came across the following code, which I find very useful, but I'm not sure how to interpret one part of it.
from pprint import pprint
import urllib2
from bs4 import BeautifulSoup

url = 'http://en.wikipedia.org/wiki/List_of_Bollywood_films_of_2014'
soup = BeautifulSoup(urllib2.urlopen(url))
headers = ['Opening', 'Title', 'Genre', 'Director', 'Cast']
results = {}
for block in soup.select('div#mw-content-text > h3'):
    title = block.find('span', class_='mw-headline').text
    rows = block.find_next_sibling('table', class_='wikitable').find_all('tr')
    results[title] = [{header: td.text for header, td in zip(headers, row.find_all('td'))}
                      for row in rows[1:]]
pprint(results)
I understand all of it except this part:

    results[title] = [{header: td.text for header, td in zip(headers, row.find_all('td'))}
                      for row in rows[1:]]

Can anyone explain what this does and how I should read it? Thanks!
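That line can basically be broken down like this. Read it from the outside in: the outer list comprehension walks over every row except the first (the header row), and the inner dict comprehension pairs each name in headers with the matching td cell of that row.

    results[title] = []
    for row in rows[1:]:
        record = {}
        for header, td in zip(headers, row.find_all('td')):
            record[header] = td.text
        results[title].append(record)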
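If it helps, here is a minimal sketch you can run without fetching the page. It assumes plain lists of strings as stand-ins for the td cells (the `rows` data below is made up for illustration, not taken from Wikipedia) and checks that the nested comprehension and the explicit loops build the same structure.

```python
headers = ['Opening', 'Title', 'Genre', 'Director', 'Cast']

# Hypothetical stand-in for rows[1:]: each inner list plays the role of
# [td.text for td in row.find_all('td')] for one table row.
rows = [
    ['3 JAN', 'Film A', 'Drama', 'Director A', 'Actor A'],
    ['10 JAN', 'Film B', 'Comedy', 'Director B'],  # one cell short on purpose
]

# Nested form: a list comprehension whose element is a dict comprehension.
comprehension = [{header: cell for header, cell in zip(headers, row)}
                 for row in rows]

# Same thing written out as explicit loops.
loop_version = []
for row in rows:
    record = {}
    for header, cell in zip(headers, row):
        record[header] = cell
    loop_version.append(record)

print(comprehension == loop_version)
```

Note that zip stops at the shorter of its inputs, so a row with fewer cells than headers simply produces a dict without the trailing keys (here the second record has no 'Cast' entry).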