Web Crawler Array error: "list index out of range"
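I'm not very good at Python, but I'm building a website for the guild of a game I play, and I'm using a web crawler to pull some of our member data from another website (yes, I did get permission to do this). I'm using Beautiful Soup 4 with Python 3.7. I get this error: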
Traceback (most recent call last):
  File "/Users/UsersLaptop/Desktop/swgohScraper.py", line 21, in <module>
    temp = members[count]
IndexError: list index out of range
My code is here:
from requests import get
from bs4 import BeautifulSoup
# variables
count = 1
# lists to store data
names = []
gp = []
arenaRank = []
url = 'https://swgoh.gg/g/21284/gid-1-800-druidia/'
response = get(url)
soup = BeautifulSoup(response.text, 'html.parser')
type(soup)
members = soup.find_all('tr')
members.sort()
for users in members:
    temp = members[count]
    name = temp.td.a.strong.text
    names.append(name)
    count += 1
print(names)
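I'm guessing I get this error because members contains 50 members but the 50th one is empty, so I need to stop appending to the array when the data is null. But when I tried putting an if statement under my for loop, such as: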
if users.find('tr') is not None:
it didn't fix the problem. If someone could explain how to fix this, and why the fix works, it would be greatly appreciated. Thanks in advance!
When you use a for ... in loop, you don't need a count variable.
for users in members:
    name = users.td.a.strong.text
    names.append(name)
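To see why the counter causes the error, here is a small sketch that uses a plain list in place of the scraped rows. The list contents and the helper names `names_with_counter` and `names_direct` are made up for illustration. The loop body runs once per element, so a counter that starts at 1 reaches `len(items)` on the final pass, which is one past the last valid index.

```python
rows = ["row0", "row1", "row2"]  # stand-in for the scraped rows


def names_with_counter(items):
    """Mimic the original loop: an index that starts at 1."""
    collected = []
    count = 1
    for _ in items:
        # On the final pass count == len(items), which is out of range.
        collected.append(items[count])
        count += 1
    return collected


def names_direct(items):
    """Iterate over the elements themselves; no index is needed."""
    collected = []
    for item in items:
        collected.append(item)
    return collected


try:
    names_with_counter(rows)
except IndexError as exc:
    print("counter version failed:", exc)

print(names_direct(rows))
```

Note that the counter version would also skip the first element even if it did not crash, since it never reads index 0.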
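If you do keep the counter, first change it to count = 0, because list indexes start at 0. Starting at 1 means the last pass of the loop asks for members[len(members)], which is one past the end of the list, and that raises the IndexError.
Your code should look like this: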
from requests import get
from bs4 import BeautifulSoup
# lists to store data
names = []
gp = []
arenaRank = []
url = 'https://swgoh.gg/g/21284/gid-1-800-druidia/'
response = get(url)
soup = BeautifulSoup(response.text, 'html.parser')
type(soup)
members = soup.find_all('tr')
members.sort()
for users in members:
    name = users.td.a.strong.text
    names.append(name)
print(names)
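One caveat with chained attributes such as `users.td.a.strong.text`: if a row has no `td` (a header row, for example), `users.td` is `None` and the next attribute access raises `AttributeError`. Also note that each `users` is already a `tr`, and `find` only searches its descendants, so `users.find('tr')` returns `None` for an ordinary row. The sketch below runs against a small inline HTML sample that is made up for illustration (the real page's markup may differ), and the function name `extract_names` is hypothetical. It uses `select_one` to skip rows that lack the nested tags.

```python
from bs4 import BeautifulSoup

# Hypothetical markup for illustration only; the real page may differ.
SAMPLE_HTML = """
<table>
  <tr><th>Name</th><th>GP</th></tr>
  <tr><td><a href="#"><strong>Alice</strong></a></td><td>100</td></tr>
  <tr><td><a href="#"><strong>Bob</strong></a></td><td>200</td></tr>
  <tr><td></td><td></td></tr>
</table>
"""


def extract_names(html):
    """Return the <strong> text found inside a link inside a cell, per row."""
    soup = BeautifulSoup(html, "html.parser")
    names = []
    for row in soup.find_all("tr"):
        # select_one returns None when the row has no matching element.
        strong = row.select_one("td a strong")
        if strong is None:
            continue  # header or empty row: nothing to append
        names.append(strong.get_text(strip=True))
    return names


print(extract_names(SAMPLE_HTML))
```

Checking for `None` before reading the text keeps one unexpected row from stopping the whole scrape.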
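You can change count to 0, because Python indexes start at 0, but it's better to work directly with the loop variable users.
The following does what your code appears to be trying to do, which is to collect the names: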
from requests import get
from bs4 import BeautifulSoup
# lists to store data
names = []
gp = []
arenaRank = []
url = 'https://swgoh.gg/g/21284/gid-1-800-druidia/'
response = get(url)
soup = BeautifulSoup(response.content, 'html.parser')
for users in soup.find_all('strong'):
    # On Python 3, test the str itself; encoded bytes never equal ''.
    text = users.text.strip()
    if text:
        names.append(text)
print(names)
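A point worth knowing when filtering scraped text on Python 3: calling `.encode("utf-8")` produces `bytes`, and `bytes` never compare equal to a `str`, so a check against `''` after encoding cannot filter out empty strings. A minimal sketch with made-up values (the helper `keep_non_empty` is hypothetical):

```python
text = "   ".strip()            # an empty str, as a blank <strong> would give
encoded = text.encode("utf-8")  # b'' on Python 3

print(encoded != "")  # True: bytes and str are never equal
print(text != "")     # False: the str comparison works as intended


def keep_non_empty(values):
    """Strip each value and keep only those with visible text."""
    return [v.strip() for v in values if v.strip()]


print(keep_non_empty(["Alice", "  ", "", " Bob "]))
```

Keeping the values as `str` also means the printed list shows plain names rather than `b'...'` literals.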