Saving results from a for loop to a list?
import urllib2
from bs4 import BeautifulSoup

url = 'http://www.millercenter.org/president/speeches'
conn = urllib2.urlopen(url)
html = conn.read()
miller_center_soup = BeautifulSoup(html)
links = miller_center_soup.find_all('a')
# Print the href of every anchor tag that has one
for tag in links:
    link = tag.get('href', None)
    if link is not None:
        print link
Here is some of my output:
/president/washington/speeches/speech-3939
/president/washington/speeches/speech-3939
/president/washington/speeches/speech-3461
https://www.facebook.com/millercenter
https://twitter.com/miller_center
https://www.flickr.com/photos/miller_center
https://www.youtube.com/user/MCamericanpresident
http://forms.hoosonline.virginia.edu/s/1535/16-uva/index.aspx?sid=1535&gid=16&pgid=9982&cid=17637
mailto:mcpa-webmaster@virginia.edu
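I am trying to scrape all of the presidential speeches on millercenter.org/president/speeches, but I am having trouble saving the speech links that I will later scrape the speech data from. More concretely, say I need George Washington's speech, available at http://www.millercenter.org/president/washington/speeches/speech-3461 - I just need to be able to access that url. I am thinking of storing the urls of all the speeches in a list and then writing a for loop to scrape and clean all the data.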
Convert it to a list comprehension:
linklist = [tag.get('href') for tag in links if tag.get('href') is not None]
Slightly optimized, so that tag.get('href') is called only once per tag:
linklist = [href for href in (tag.get('href') for tag in links) if href is not None]
If you are not comfortable with list comprehensions, or you don't want to use one, you can create a list and append to it:
all_links = []
for tag in links:
    link = tag.get('href', None)
    if link is not None:
        all_links.append(link)
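Since the goal is a list of speech urls that can be requested later, the collected hrefs probably still need two more steps: dropping the non-speech links (social media, mailto) and turning the relative paths into full urls. Here is a rough sketch of that, not a definitive solution. The function name speech_urls and the '/speeches/speech-' substring test are my own assumptions, based only on the sample output in the question. It is written for Python 3; on Python 2, as used in the question, urljoin lives in the urlparse module instead.

```python
from urllib.parse import urljoin

BASE_URL = 'http://www.millercenter.org/president/speeches'


def speech_urls(hrefs, base_url=BASE_URL):
    """Return absolute speech urls in first-seen order, without duplicates."""
    seen = set()
    result = []
    for href in hrefs:
        # Assumption: speech pages are the hrefs containing '/speeches/speech-'
        if '/speeches/speech-' not in href:
            continue
        # Resolve relative paths such as '/president/...' against the page url
        absolute = urljoin(base_url, href)
        if absolute not in seen:
            seen.add(absolute)
            result.append(absolute)
    return result


# A few hrefs copied from the output shown in the question
sample = [
    '/president/washington/speeches/speech-3939',
    '/president/washington/speeches/speech-3939',
    '/president/washington/speeches/speech-3461',
    'https://twitter.com/miller_center',
    'mailto:mcpa-webmaster@virginia.edu',
]

for speech_url in speech_urls(sample):
    print(speech_url)
```

You could pass all_links (or linklist) straight to speech_urls and then loop over the result to fetch each speech page.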