Remove items from a list that don't contain 'speeches'?
url = 'http://www.millercenter.org/president/speeches'
conn = urllib2.urlopen(url)
html = conn.read()
miller_center_soup = BeautifulSoup(html)
links = miller_center_soup.find_all('a')
linklist = [tag.get('href') for tag in links if tag.get('href') is not None]
linklist = str(linklist)
end_of_links = [line for line in linklist if '/events/' in line]
print end_of_links
Here is a small snippet of my output (stored in a Python list):
['/events/2015/one-nation-under-god-how-corporate-america-invented-christian-america',
'/events/2015/a-conversation-with-bernie-sanders', '#reagan', '#gwbush', '#obama',
'#top', '/president/obama/speeches/speech-4427', 'president/obama/speeches/speech-4430', ...]
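I want to remove every item in the list that does not contain 'speeches'. I've tried filter() and also just building another list comprehension, but neither has worked so far. I don't understand why the end_of_links variable doesn't work; it seems intuitive, to me at least.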
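A likely reason end_of_links comes out empty is the line linklist = str(linklist): it turns the list into one long string, and iterating over a string yields single characters, so no element can ever contain '/events/' (or 'speeches'). The sketch below reproduces this on a small, hand-picked subset of the hrefs shown above (not a live scrape) and shows that dropping the str() call makes the same comprehension work.

```python
# Hand-picked subset of the hrefs from the question's output (not a live scrape).
linklist = ['/events/2015/a-conversation-with-bernie-sanders',
            '#reagan',
            '/president/obama/speeches/speech-4427']

# str() produces one string such as "['/events/...', '#reagan', ...]".
as_string = str(linklist)

# Iterating a string yields one character at a time, and no one-character
# string contains '/events/', so this comprehension is always empty.
chars = [line for line in as_string if '/events/' in line]
print(chars)  # []

# Without the str() call, each element is a whole href, so the test works.
speeches = [href for href in linklist if 'speeches' in href]
print(speeches)  # ['/president/obama/speeches/speech-4427']
```

In other words, removing linklist = str(linklist) and filtering on 'speeches' instead of '/events/' should be enough for the original script.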
Keep only the items that do contain 'speeches':
link_list = ['/events/2015/one-nation-under-god-how-corporate-america-invented-christian-america',
'/events/2015/a-conversation-with-bernie-sanders', '#reagan', '#gwbush', '#obama',
'#top', '/president/obama/speeches/speech-4427', 'president/obama/speeches/speech-4430']
speech_list = [_ for _ in link_list if 'speeches' in _]
Here is my terminal session in Python 2.7:
>>> link_list = ['/events/2015/one-nation-under-god-how-corporate-america-invented-christian-america',
... '/events/2015/a-conversation-with-bernie-sanders', '#reagan', '#gwbush', '#obama',
... '#top', '/president/obama/speeches/speech-4427', 'president/obama/speeches/speech-4430']
>>> speech_list = [_ for _ in link_list if 'speeches' in _]
>>> speech_list
['/president/obama/speeches/speech-4427', 'president/obama/speeches/speech-4430']
>>>
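Since the question mentions trying filter(), here is a sketch of the same selection written with filter() on a subset of the list above. One assumption worth flagging: on Python 3, filter() returns a lazy iterator rather than a list, so it is wrapped in list(); on Python 2.7 the list() call is harmless.

```python
link_list = ['/events/2015/a-conversation-with-bernie-sanders', '#reagan', '#top',
             '/president/obama/speeches/speech-4427',
             'president/obama/speeches/speech-4430']

# filter() keeps the elements for which the predicate returns True.
# list() materializes the result (needed on Python 3, a no-op copy on 2.7).
speech_list = list(filter(lambda href: 'speeches' in href, link_list))
print(speech_list)
# ['/president/obama/speeches/speech-4427', 'president/obama/speeches/speech-4430']
```

The list comprehension and filter() give the same result here; the comprehension is usually considered the more readable of the two when the predicate is a lambda.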
li = ['/events/2015/one-nation-under-god-how-corporate-america-invented-christian-america',
'/events/2015/a-conversation-with-bernie-sanders', '#reagan', '#gwbush', '#obama',
'#top', '/president/obama/speeches/speech-4427', 'president/obama/speeches/speech-4430']
import re
li = [ x for x in li if re.search('speeches',x)]
print(li)
['/president/obama/speeches/speech-4427', 'president/obama/speeches/speech-4430']
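For a plain literal like 'speeches', re.search() behaves the same as the in test; a regex mainly pays off when the pattern needs structure. The sketch below assumes (not confirmed by the site) that individual speech pages end in 'speeches/speech-<digits>', and adds a hypothetical '/president/speeches' entry (the index page URL from the question) to show how the stricter pattern differs from the substring test.

```python
import re

# Five hrefs from the question's output plus one hypothetical entry,
# '/president/speeches', added only to illustrate the difference.
li = ['/events/2015/a-conversation-with-bernie-sanders', '#obama', '#top',
      '/president/speeches',
      '/president/obama/speeches/speech-4427',
      'president/obama/speeches/speech-4430']

# Assumption: individual speech pages end in 'speeches/speech-<digits>'.
# Compiling once avoids re-parsing the pattern for every element.
speech_page = re.compile(r'speeches/speech-\d+$')

loose = [x for x in li if 'speeches' in x]         # any href mentioning 'speeches'
strict = [x for x in li if speech_page.search(x)]  # only individual speech pages

print(loose)
print(strict)
```

Here loose also keeps '/president/speeches', while strict keeps only the two speech-NNNN links.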