Parsing a date with BeautifulSoup

Question

I am using BeautifulSoup to get information from a page, and I ended up with this result:

[<span class="field-content">Friday, September 11, 2015</span>]

using the commands

links = soup.find_all('div', attrs={'class':'views-row'})
link = links[0]
link.find('span', attrs={'class':'views-field views-field-created'}).select('span')

But I need to parse the date. How can I get Friday, September 11, 2015 out of this?

Answer 1

Found it: link.find('span', attrs={'class':'views-field views-field-created'}).select_one('span').text
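
As a rough illustration of why this works: select() returns a list of matches, which has no .text attribute, while select_one() returns the first matching tag directly. The sketch below runs against a small inline HTML snippet; the markup is an assumption reconstructed from the class names in the question, not the real page.

```python
from bs4 import BeautifulSoup

# Assumed markup: a guess at one row of the page, built from the class names in the question.
html = """
<div class="views-row">
  <span class="views-field views-field-title">
    <span class="field-content"><a href="/party-pictures/2015/example">Example title</a></span>
  </span>
  <span class="views-field views-field-created">
    <span class="field-content">Friday, September 11, 2015</span>
  </span>
</div>
"""

soup = BeautifulSoup(html, "html.parser")
link = soup.find_all('div', attrs={'class': 'views-row'})[0]
created = link.find('span', attrs={'class': 'views-field views-field-created'})

# select() returns a list-like result set, even when only one element matches.
matches = created.select('span')

# select_one() returns the first matching tag, so .text is available directly.
date_text = created.select_one('span').text

print(len(matches))
print(date_text)
```

If nothing matches, select_one() returns None, so real code may want to check for that before reading .text.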

Answer 2

To answer the example in the question, select the last element of the result set:

link.find('span', attrs={'class':'views-field views-field-created'}).select('span')[-1].text

Or shorter:

link.find_all("span")[-1].text

But if you want to extract all of the information and store it as structured data, using stripped_strings is a better approach.

Example

import requests
from bs4 import BeautifulSoup

url = 'https://web.archive.org/web/20150913224145/http://www.newyorksocialdiary.com/party-pictures'

res = requests.get(url)
soup = BeautifulSoup(res.text, "html.parser")

data = []

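# Each matching div holds one entry; stripped_strings yields its text pieces without surrounding whitespace.
for item in soup.select('.view-content div'):
    c = list(item.stripped_strings)
    data.append({
        'title':c[0],   # first text piece is the title
        'date':c[-1],   # last text piece is the date
        # judging by the output, the archived href has the form /web/<timestamp>/<original URL>;
        # splitting on the first three slashes keeps only the original URL
        'url':item.a['href'].split('/',3)[-1]
    })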

print(data)

Output

[{'title': 'Kicks offs, sing offs, and pro ams', 'date': 'Friday, September 11, 2015', 'url': 'http://www.newyorksocialdiary.com/party-pictures/2015/kicks-offs-sing-offs-and-pro-ams'}, {'title': 'Grand Finale of the Hampton Classic Horse Show', 'date': 'Tuesday, September 1, 2015', 'url': 'http://www.newyorksocialdiary.com/party-pictures/2015/grand-finale-of-the-hampton-classic-horse-show'}, {'title': 'Riders, Spectators, Horses, and More ...', 'date': 'Wednesday, August 26, 2015', 'url': 'http://www.newyorksocialdiary.com/party-pictures/2015/riders-spectators-horses-and-more'}, {'title': 'Artist and Writers (and Designers)', 'date': 'Thursday, August 20, 2015', 'url': 'http://www.newyorksocialdiary.com/party-pictures/2015/artist-and-writers-and-designers'}, {'title': 'Garden Parties Kickoffs  and Summer Benefits', 'date': 'Monday, August 17, 2015', 'url': 'http://www.newyorksocialdiary.com/party-pictures/2015/garden-parties-kickoffs-and-summer-benefits'}, {'title': 'The Summer Set', 'date': 'Wednesday, August 12, 2015', 'url': 'http://www.newyorksocialdiary.com/party-pictures/2015/the-summer-set'}, {'title': 'Midsummer Parties', 'date': 'Wednesday, August 5, 2015', 'url': 'http://www.newyorksocialdiary.com/party-pictures/2015/midsummer-parties'}, {'title': 'The Watermill Center and The Parrish', 'date': 'Wednesday, July 29, 2015', 'url': 'http://www.newyorksocialdiary.com/party-pictures/2015/the-watermill-center-and-the-parrish'}, {'title': 'Unconditional Love', 'date': 'Thursday, July 23, 2015', 'url': 'http://www.newyorksocialdiary.com/party-pictures/2015/unconditional-love'}, {'title': "Women's Health, Boys & Girls, Cancer Research, and Just Plain Summer Fun", 'date': 'Friday, July 17, 2015', 'url': 'http://www.newyorksocialdiary.com/party-pictures/2015/womens-health-boys-girls-cancer-research-and-just-plain-summer-fun'},...]
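
The date values in this output are still plain strings. If actual date objects are needed, one possible approach is datetime.strptime from the standard library. The sketch below uses two records copied from the output above; the format string and the parsed_date key are my own additions, and %A and %B assume an English locale.

```python
from datetime import datetime

# Sample records copied from the output above (only the fields needed here).
records = [
    {'title': 'Kicks offs, sing offs, and pro ams', 'date': 'Friday, September 11, 2015'},
    {'title': 'Grand Finale of the Hampton Classic Horse Show', 'date': 'Tuesday, September 1, 2015'},
]

# Assumed format: full weekday name, full month name, day of month, four-digit year.
DATE_FORMAT = '%A, %B %d, %Y'

for record in records:
    # 'parsed_date' is a hypothetical key added for illustration.
    record['parsed_date'] = datetime.strptime(record['date'], DATE_FORMAT).date()

print([r['parsed_date'].isoformat() for r in records])
```

Keeping the original string next to the parsed value makes it easier to spot entries whose format differs from the assumed one, since strptime raises ValueError on a mismatch.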
