Need help figuring out how to loop through indices
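I am working on a project that involves scraping a large amount of data. I am writing a fairly long script right now, but I have run into a problem running my for loop.
I am trying to scrape information from a 9-row table. I tried to set up a for loop so that it scrapes the same information from each row. To access the first row, I split the table into a list. The first row starts at the third index.
When I run the script, I get an AttributeError on the "Aa" line. The error reads: 'NoneType' object has no attribute 'text'.
This does not happen when I enter that line of code into the console on its own; there I get the desired text. When I take out the for loop, I am able to scrape the first indaplaybox.
Here is my code: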
from urllib.request import urlopen as uReq
from bs4 import BeautifulSoup as soup
my_url ='Myurl/=' + page
uClient = uReq(my_url)
page_html = uClient.read()
uClient.close()
page_soup = soup(page_html, "html.parser")
boxes = page_soup.findAll("table",{"class":"TableTable tableBody"})
box = boxes[0]
playboxes = box.find_all('tr')
indaplaybox = playboxes[3]
filename = "QBS.csv"
f = open(filename, "a")
headers= "Aa, Ab, Ac, Ad\n"
f.write(headers)
for indaplaybox in playboxes:
    Aa = indaplaybox.find('td', attrs = {'style' : 'font-weight: bold;'}).text
    c = indaplaybox.find('td', attrs = {'class' : 'tablePlayName'})
    cl = c.text.split()
    Ab = cl[0] + " " + cl[1]
    Ac = cl[2]
    Ad = indaplaybox.div.a.text
    print("Aa:" + Aa)
    print("Ab:" + Ab)
    print("Ac:" + Ac)
    print("Ad:" + Ad)
    with open (filename, "a") as myfile:
        myfile.write(Aa + "," + Ab + "," + Ac.replace(",", "|") + "," + Ad + "\n")
f.close()
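I want to loop through playboxes indices 3-11.
I am not very familiar with indexing, so I tried something like this:
p = [str(i) for i in range (3,12)]
indaplaybox = playboxes[p]
for indaplaybox in playboxes:
    # rest of code
But this does not work because, as is probably obvious to most people, list indices must be integers.
I could really use some help figuring out how to get this for loop running smoothly. Thanks!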
# integer indices 3 through 11 (string indices would raise a TypeError)
p = list(range(3, 12))
for i in p:
    indaplaybox = playboxes[i]
    ...
    # rest of the code
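The index-based loop only works when the indices are integers. A minimal sketch of the difference, using a plain list of strings named rows as a hypothetical stand-in for playboxes (the names rows, error_message, and selected are illustrative and not from the original script):

```python
# Stand-in for playboxes: 12 fake rows, so the example needs no network access.
rows = [f"row{i}" for i in range(12)]

# Indexing a list with a string fails, which is the error the question describes.
try:
    rows["3"]
    error_message = ""
except TypeError as exc:
    error_message = str(exc)

# Indexing with integers from range() works and picks rows 3 through 11.
selected = []
for i in range(3, 12):
    selected.append(rows[i])

print(error_message)
print(selected[0], selected[-1], len(selected))
```

Note that range(3, 12) stops before 12, so the last index visited is 11.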
You can do it like this:
Method 1:
# p holds the rows of playboxes at indices 3 through 11
p = [playboxes[i] for i in range(3,12)]
# now a simple loop
for indaplaybox in p:
    ...  # rest of the code
Method 2:
for indaplaybox in playboxes[3:12]:
    ...  # rest of the code
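The two methods select the same rows, but they behave differently when the table has fewer rows than expected. A small sketch, again using a hypothetical list named rows in place of playboxes:

```python
# Stand-in for playboxes: 12 fake rows.
rows = [f"row{i}" for i in range(12)]

# Method 1 (index list) and Method 2 (slice) pick the same nine rows.
by_index = [rows[i] for i in range(3, 12)]
by_slice = rows[3:12]

# With a shorter table, slicing quietly returns whatever rows exist...
short_rows = rows[:8]
short_slice = short_rows[3:12]

# ...while explicit indexing raises IndexError at the first missing row.
try:
    [short_rows[i] for i in range(3, 12)]
    index_error_raised = False
except IndexError:
    index_error_raised = True

print(by_index == by_slice, len(short_slice), index_error_raised)
```

Restricting the loop to rows 3-11 may also be what avoids the AttributeError in the question: find returns None when no matching tag exists, so if the first three rows are header rows without the styled td (an assumption, since the page itself is not shown), calling .text on the result would fail for them.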