BeautifulSoup findAll HTML class with multiple variable class inputs

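I have the following code, which scrapes a website for divs with the class "odd" or "even". I'd like to make "odd" and "even" arguments that my function accepts, so that I can add other divs as well. Here is my code: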

#
# Imports
#

import urllib2
from bs4 import BeautifulSoup
import re
import os
from pprint import pprint

#
# library
#

def get_soup(url):
    page = urllib2.urlopen(url)
    contents = page.read()
    soup = BeautifulSoup(contents, "html.parser")
    body = soup.findAll("tr", ["even", "odd"])
    string_list = str([i for i in body])
    return string_list


def save_to_file(path, soup):
    with open(path, 'w') as fhandle:
        fhandle.write(soup)


#
# script
#

def main():
    url = r'URL GOES HERE'
    path = os.path.join('PATH GOES HERE')
    the_soup = get_soup(url)
    save_to_file(path, the_soup)



if __name__ == '__main__':
    main()

I'd like to incorporate *args into the code, so that the get_soup function looks like this:

def get_soup(url, *args):
    page = urllib2.urlopen(url)
    contents = page.read()
    soup = BeautifulSoup(contents, "html.parser")
    body = soup.findAll("tr", [args])
    string_list = str([i for i in body])
    return string_list

def main():
    url = r'URL GOES HERE'
    path = os.path.join('PATH GOES HERE')
    the_soup = get_soup(url, "odd", "even")
    save_to_file(path, the_soup)

Unfortunately, this doesn't work. Any ideas?

Don't put args in a list. args is already a tuple, so just pass it directly:

body = soup.findAll("tr", args)

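If you write [args], you end up with something like [("odd", "even")]: a list whose only element is the tuple, rather than a list of class names.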
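To see the difference without involving BeautifulSoup at all, here is a minimal sketch. The show_args function is a hypothetical helper made up for illustration; it is not part of the question's code.

```python
def show_args(*args):
    # Inside the function, args is a tuple of the positional arguments.
    wrapped = [args]       # a one-element list whose only item is the tuple
    unpacked = list(args)  # a flat list of the class names
    return args, wrapped, unpacked


args, wrapped, unpacked = show_args("odd", "even")
print(args)      # ('odd', 'even')
print(wrapped)   # [('odd', 'even')]
print(unpacked)  # ['odd', 'even']
```

If you ever need an actual list rather than a tuple, list(args) gives you the flat version.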

Also, str([i for i in body]) doesn't really accomplish anything. It gives the same result as str(body), and I don't see how that format would be useful.
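If the goal is just to write the matching rows to a file, something along these lines might be more useful. This is only a sketch under a few assumptions: rows_as_text and SAMPLE_HTML are names I made up, the inline markup stands in for the downloaded page, and I pass the class names through the class_ keyword as a list instead of positionally.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup, standing in for the downloaded page.
SAMPLE_HTML = """
<table>
  <tr class="odd"><td>1</td></tr>
  <tr class="even"><td>2</td></tr>
  <tr class="header"><td>skip</td></tr>
</table>
"""


def rows_as_text(html, *class_names):
    soup = BeautifulSoup(html, "html.parser")
    # A list of class names matches a row that carries any one of them.
    rows = soup.find_all("tr", class_=list(class_names))
    # One row per line, instead of the repr of a Python list.
    return "\n".join(str(row) for row in rows)


result = rows_as_text(SAMPLE_HTML, "odd", "even")
print(result)
```

Joining the rows with newlines gives you plain HTML in the output file, without the surrounding brackets and commas that str() on a list produces.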