Better way to try-except multiple checks

Suppose I have some (simplified) BeautifulSoup code that extracts data into a dictionary:

tournament_info = soup.find_all('li')

m_stats['Date'] = tournament_info[0].text
m_stats['Location'] = tournament_info[1].text
m_stats['Prize'] = tournament_info[3].text.split(':')[1].strip()

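If the initial find_all raises an exception, I want every dictionary entry to be 'None'. If any individual dictionary assignment raises an exception, I also want 'None' for that entry.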

Is there a nicer way to write this than something horrible like the following?

try:
    tournament_info = soup.find_all('li')
except:
    m_stats['Date'] = 'None'
    m_stats['Location'] = 'None'
    m_stats['Prize'] = 'None'

try:
    m_stats['Date'] = tournament_info[0].text
except:
    m_stats['Date'] = 'None'
try:
    m_stats['Location'] = tournament_info[1].text
except:
    m_stats['Location'] = 'None'
try:
    m_stats['Prize'] = tournament_info[3].text.split(':')[1].strip()
except:
    m_stats['Prize'] = 'None'

Here is what I can suggest for your code:

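# Assumes this runs inside the function that fills m_stats (hence the return).
info = soup.find_all('li')
if not info:
    # Nothing found: reset every existing key to None in place.
    m_stats.update(dict.fromkeys(m_stats))
    return

# Dictionary key -> index of the matching <li> element.
mappings = {
    'Date': 0,
    'Location': 1,
    'Prize': 3
}
for key, idx in mappings.items():
    value = None
    try:
        text = info[idx].text
        if key == 'Prize':
            # Keep only the part after the colon.
            text = text.split(':')[1].strip()
        value = text
    except IndexError:
        pass
    m_stats[key] = value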

Alternatively, you can create a function that handles the exception for you:

def get_value(idx):
    value = None
    try:
        value = info[idx].text
    except IndexError:
        pass
    return value

m_stats['Date'] = get_value(0)
m_stats['Location'] = get_value(1)
m_stats['Prize'] = get_value(3)
if m_stats['Prize']:
    # Store the result; the bare expression would discard it.
    m_stats['Prize'] = m_stats['Prize'].split(':')[1].strip()
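
If the post-processing should also be covered by the exception handling, one possible variation is to pass it into the helper. This is only a sketch: `safe_text`, its `transform` parameter, and the `Item` stand-in for a BeautifulSoup tag are hypothetical names, not part of the code above.

```python
from collections import namedtuple

# Stand-in for a BeautifulSoup tag; only the .text attribute is used here.
Item = namedtuple('Item', 'text')


def safe_text(items, idx, transform=None):
    """Return items[idx].text (optionally transformed), or None on IndexError."""
    try:
        value = items[idx].text
        # The transform runs inside the try, so a failing split is caught too.
        return transform(value) if transform else value
    except IndexError:
        return None


items = [Item('1 May'), Item('Berlin'), Item('Players: 8'), Item('Prize: $500 ')]
m_stats = {
    'Date': safe_text(items, 0),
    'Location': safe_text(items, 1),
    'Prize': safe_text(items, 3, lambda s: s.split(':')[1].strip()),
}
```

With this shape, a prize line without a colon yields None instead of raising, because the split happens inside the helper's try block.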

Create your own class:

class Stats(dict):

    tournament_info = []

    def __init__(self, tournament_info, **kwargs):
        super(Stats, self).__init__(**kwargs)
        self.tournament_info = tournament_info
        self['Date'] = self.get_tournament_info_text(0)
        self['Location'] = self.get_tournament_info_text(1)
        prize = self.get_tournament_info_text(2)
        if prize is not None:
            prize = prize.split(':')[1].strip()
        self['Prize'] = prize

    def get_tournament_info_text(self, index):
        try:
            return self.tournament_info[index]['text']
        except (IndexError, KeyError):
            return None

tournament_info = [
    {
        'text': 'aaa'
    },
    {},
    {
        'text': 'bbb:ccc '
    }
]

m_stats = Stats(tournament_info)
print(m_stats)

The solution I went with was to create a blank template dictionary (actually JSON) with every key set to 'None'.

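Each time a page is scraped, m_stats is first initialized from this blank dictionary (loaded from JSON). If an exception occurs, it is simply passed over (with some logging) and the value stays 'None'. That way there is no need to assign 'None' explicitly every time.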
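
Roughly, the idea looks like the sketch below. The names `TEMPLATE_JSON`, `scrape_stats`, and `Item` are made up for illustration, the template is inlined as a string rather than read from a file, and the list of items stands in for the result of find_all.

```python
import json
import logging
from collections import namedtuple

logger = logging.getLogger(__name__)

# Stand-in for a BeautifulSoup tag; only the .text attribute is used here.
Item = namedtuple('Item', 'text')

# Hypothetical blank template; in practice this would be loaded from a JSON file.
TEMPLATE_JSON = '{"Date": "None", "Location": "None", "Prize": "None"}'


def scrape_stats(items):
    """Start from the blank template and overwrite only the values that parse."""
    m_stats = json.loads(TEMPLATE_JSON)  # a fresh dict on every call
    extractors = {
        'Date': lambda: items[0].text,
        'Location': lambda: items[1].text,
        'Prize': lambda: items[3].text.split(':')[1].strip(),
    }
    for key, extract in extractors.items():
        try:
            m_stats[key] = extract()
        except (IndexError, AttributeError) as exc:
            # Log and move on; the template value 'None' stays in place.
            logger.warning('Could not extract %s: %s', key, exc)
    return m_stats
```

Calling `scrape_stats([])` returns the template unchanged, so a page with no matching elements needs no special case.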

I'm not sure whether marking this as the "answer" is right, since it is so specific to my needs, but I did it anyway.