Access Python for loop values
I'm experimenting with Python and trying to build a scraper. The code I have so far is below.
import requests
from bs4 import BeautifulSoup
import csv

url = "http://www.grammy.com/nominees/search"
r = requests.get(url)
soup = BeautifulSoup(r.content)
g_data = soup.find_all("div", {"class": "view-content"})
f = csv.writer(open("file.csv", "w"))
f.writerow(["Year", "Category", "Title", "Winner"])
for item in g_data:
    for year in item.find_all("td", {"class": "views-field-year"}):
        year = year.contents[0]
    for category in item.find_all("td", {"class": "views-field-category-code"}):
        category = category.contents[0]
    for title in item.find_all("td", {"class": "views-field-field-nominee-work"}):
        title = title.contents[0]
    for winner in item.find_all("td", {"class": "views-field-field-nominee-extended"}):
        winner = winner.contents[0]
f.writerow([year, category, title, winner])
For some reason, the CSV file only has 1 row, a seemingly random one. How can I access all of these values outside the scope of their for loops?
Your write call is outside your loop, so you only write one row (the last one). Indent it and it should work as expected:
for item in g_data:
    for year in item.find_all("td", {"class": "views-field-year"}):
        year = year.contents[0]
    for category in item.find_all("td", {"class": "views-field-category-code"}):
        category = category.contents[0]
    for title in item.find_all("td", {"class": "views-field-field-nominee-work"}):
        title = title.contents[0]
    for winner in item.find_all("td", {"class": "views-field-field-nominee-extended"}):
        winner = winner.contents[0]
    f.writerow([year, category, title, winner])
In case you are new to Python: code blocks are defined by indentation.
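The effect of indentation can be seen without any scraping. The following is a minimal sketch, assuming a hypothetical list `rows` in place of the scraped values and an in-memory `io.StringIO` buffer in place of `file.csv`. It shows that a `writerow()` call placed after the loop runs once, with whatever the loop last assigned, while the same call inside the loop body runs once per iteration.

```python
import csv
import io

# Hypothetical sample data standing in for the scraped values.
rows = [
    ["2014", "Record Of The Year"],
    ["2014", "Album Of The Year"],
    ["2014", "Song Of The Year"],
]


def write_outside_loop(data):
    """writerow() is dedented, so it runs once, after the loop has finished."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    for row in data:
        current = row            # only rebinds the name on each pass
    writer.writerow(current)     # sees just the last value
    return buffer.getvalue().splitlines()


def write_inside_loop(data):
    """writerow() is part of the loop body, so it runs on every iteration."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    for row in data:
        writer.writerow(row)
    return buffer.getvalue().splitlines()


print(len(write_outside_loop(rows)))  # 1
print(len(write_inside_loop(rows)))   # 3
```

The only difference between the two functions is the indentation of the `writerow()` line, which is the same difference as between the question's code and the corrected version.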
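It is not only that your final writerow() call is incorrectly indented (it belongs inside the loop body). You also need to iterate over the tr elements (each one represents a row of the table that contains the data) and get the td elements for every tr found in the loop.
I would also avoid checking the class attribute values of the td elements in the loop and just get them by index. In other words, find all td elements of each tr and get their text.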
Fixed and improved version (only 2 lines of code):
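# [1:] skips the table's header row.
# Python 2: encode each cell. On Python 3, drop .encode("utf-8") and pass encoding="utf-8" to open() instead.
for item in soup.select("div.view-content table tr")[1:]:
    f.writerow([td.get_text(strip=True).encode("utf-8") for td in item.find_all("td")])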
Contents of file.csv after running the code:
Year,Category,Title,Winner
2014,Record Of The Year,Stay With Me (Darkchild Version),"Sam Smith, artist. Steve Fitzmaurice, Rodney Jerkins & Jimmy Napes, producers. Matthew Champlin, Steve Fitzmaurice, Jimmy Napes & Steve Price, engineers/mixers. Tom Coyne, mastering engineer."
2014,Album Of The Year,Morning Phase,"Beck Hansen, producer; Tom Elmhirst, David Greenbaum, Cole Marsden Greif-Neill, Florian Lagatta, Robbie Nelson, Darrell Thorp, Cassidy Turbin & Joe Visciano, engineers/mixers; Bob Ludwig, mastering engineer."
2014,Song Of The Year,Stay With Me (Darkchild Version),"James Napier, William Phillips &Sam Smith, songwriters."
...
2014,Best Rap Song,I,"K. Duckworth, Ronald Isley & C. Smith, songwriters."
2014,Best Rap Album,The Marshall Mathers LP2,"Eminem, artist. Tony Campana, Joe Strange & Mike Strange, engineers/mixers."
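The `.encode("utf-8")` call in the version above suits Python 2. On Python 3 the `csv` module expects text, so encoded bytes would end up in the file as their `b'...'` repr. Below is a minimal sketch of the Python 3 way to write the file, assuming the cell texts have already been extracted into a hypothetical `rows` list (the sample row is taken from the output above) and using a hypothetical helper name `write_rows`. The file is opened with `newline=""` and an explicit encoding, and a `with` block closes it.

```python
import csv
import os
import tempfile

HEADER = ["Year", "Category", "Title", "Winner"]


def write_rows(path, rows):
    """Write the header plus one CSV line per row, as UTF-8 text."""
    # newline="" lets the csv module control line endings itself.
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(HEADER)
        writer.writerows(rows)


# Hypothetical rows standing in for the text of each tr's td cells.
rows = [
    ["2014", "Best Rap Song", "I",
     "K. Duckworth, Ronald Isley & C. Smith, songwriters."],
]

# Write to a temporary directory so the sketch leaves no file behind in the
# working directory.
path = os.path.join(tempfile.mkdtemp(), "file.csv")
write_rows(path, rows)

# Read the file back to check that quoting of the comma-laden field round-trips.
with open(path, newline="", encoding="utf-8") as handle:
    read_back = list(csv.reader(handle))
print(len(read_back))  # 2: the header and one data row
```

In the scraper itself, each row would be something like `[td.get_text(strip=True) for td in item.find_all("td")]`, with no `.encode()` call.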