BeautifulSoup findAll HTML class with multiple variable class inputs
I have the following code, which scrapes a website for divs with the class "odd" or "even". I'd like to make "odd" and "even" arguments that my function takes, so that I can add other divs as well. Here is my code:
#
# Imports
#
import urllib2
from bs4 import BeautifulSoup
import re
import os
from pprint import pprint
#
# library
#
def get_soup(url):
    page = urllib2.urlopen(url)
    contents = page.read()
    soup = BeautifulSoup(contents, "html.parser")
    body = soup.findAll("tr", ["even", "odd"])
    string_list = str([i for i in body])
    return string_list

def save_to_file(path, soup):
    with open(path, 'w') as fhandle:
        fhandle.write(soup)
#
# script
#
def main():
    url = r'URL GOES HERE'
    path = os.path.join('PATH GOES HERE')
    the_soup = get_soup(url)
    save_to_file(path, the_soup)

if __name__ == '__main__':
    main()
I'd like to incorporate *args into the code, so that the get_soup function looks like this:
def get_soup(url, *args):
    page = urllib2.urlopen(url)
    contents = page.read()
    soup = BeautifulSoup(contents, "html.parser")
    body = soup.findAll("tr", [args])
    string_list = str([i for i in body])
    return string_list

def main():
    url = r'URL GOES HERE'
    path = os.path.join('PATH GOES HERE')
    the_soup = get_soup(url, "odd", "even")
    save_to_file(path, the_soup)
Unfortunately, this doesn't work. Any ideas?
Don't put args in a list. args is already a tuple, so just pass it directly:
body = soup.findAll("tr", args)
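If you write [args], you end up with something like [("odd", "even")]: a list whose only element is the tuple, rather than a collection of class names.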
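The difference can be checked without BeautifulSoup at all. The snippet below is only an illustrative sketch; the helper names `as_passed` and `wrapped` are made up for this example and are not part of the original code.

```python
def as_passed(*args):
    # *args collects the positional arguments into a tuple.
    return args


def wrapped(*args):
    # Wrapping that tuple in a list adds one extra level of nesting.
    return [args]


flat = as_passed("odd", "even")
nested = wrapped("odd", "even")

print(flat)    # ('odd', 'even')
print(nested)  # [('odd', 'even')]
```

The first form is a flat collection of class names. The second is a one-element list containing a tuple, which is not what the class filter is meant to receive.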
Also, str([i for i in body]) doesn't accomplish anything extra. It is the same as str(body), though I don't see how that format would be useful.
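As a rough sketch of one alternative, the matched rows could be joined into plain markup instead of being saved as the string form of a list. Several things here are assumptions made for illustration: the inline `HTML` sample, the helper name `rows_with_classes`, parsing a string rather than fetching a URL, and the use of `find_all` with the `class_` keyword as a more explicit spelling of the same class filter.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup, standing in for the downloaded page.
HTML = """
<table>
  <tr class="odd"><td>1</td></tr>
  <tr class="even"><td>2</td></tr>
  <tr class="header"><td>h</td></tr>
</table>
"""


def rows_with_classes(html, *classes):
    """Return the markup of every <tr> whose class matches one of `classes`."""
    soup = BeautifulSoup(html, "html.parser")
    # `classes` is already a tuple, so it is passed through without wrapping.
    rows = soup.find_all("tr", class_=classes)
    # One row per line, instead of the str() of a Python list.
    return "\n".join(str(row) for row in rows)


result = rows_with_classes(HTML, "odd", "even")
print(result)
```

With this shape, adding another class is just one more argument in the call, for example `rows_with_classes(HTML, "odd", "even", "header")`.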