Web Crawler Array error: "list index out of range"

I'm not great at Python, but I'm building a website for the guild I play with in-game, and I'm using a crawler to pull some of our member data from another site (yes, I do have permission to do this). I'm using Beautiful Soup 4 with Python 3.7. I'm getting this error:

Traceback (most recent call last):
  File "/Users/UsersLaptop/Desktop/swgohScraper.py", line 21, in <module>
    temp = members[count]
IndexError: list index out of range

Here is my code:

from requests import get
from bs4 import BeautifulSoup
# variables
count = 1

# lists to store data
names = []
gp = []
arenaRank = []

url = 'https://swgoh.gg/g/21284/gid-1-800-druidia/'
response = get(url)

soup = BeautifulSoup(response.text, 'html.parser')
type(soup)

members = soup.find_all('tr')
members.sort()

for users in members:
    temp = members[count]
    name = temp.td.a.strong.text
    names.append(name)
    count += 1

print(names)

I'm guessing I get this error because there are 50 members in members, but the 50th one is empty, so I need to stop appending to the array when the data is empty. But when I tried putting an if statement under my for loop, like:

if users.find('tr') is not None:

it didn't fix the problem. If anyone could explain how to solve this, and why the solution works, it would be greatly appreciated. Thanks in advance!
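For what it's worth, a guard like the one attempted above needs to check the nested tags on each row, not search for another `tr` inside a `tr`. A minimal sketch of that idea, using an inline HTML snippet as stand-in data (the tag structure is assumed to mirror the real page):

```python
from bs4 import BeautifulSoup

# Stand-in HTML: two valid member rows plus one "empty" row,
# mimicking a table whose last <tr> has no nested link.
html = """
<table>
  <tr><td><a href="#"><strong>Alice</strong></a></td></tr>
  <tr><td><a href="#"><strong>Bob</strong></a></td></tr>
  <tr><td></td></tr>
</table>
"""

soup = BeautifulSoup(html, 'html.parser')
names = []

for row in soup.find_all('tr'):
    # Guard: only keep rows that actually contain a nested <strong>.
    strong = row.find('strong')
    if strong is not None:
        names.append(strong.text)

print(names)  # ['Alice', 'Bob']
```

The key point is that `row.find('strong')` returns `None` for the empty row, so the append is skipped instead of raising an error.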

When you use a for-in loop, you don't need a count variable:

for users in members:
    name = users.td.a.strong.text
    names.append(name)
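If an index is ever needed alongside each element, Python's built-in `enumerate` provides one without a manual counter. A quick illustration with a plain list:

```python
# enumerate() yields (index, element) pairs, starting at 0 by default,
# so no hand-maintained count variable is needed.
members = ['Alice', 'Bob', 'Carol']

for i, name in enumerate(members):
    print(i, name)

# Unlike a manual counter, enumerate can never run past the end of
# the list, so it cannot raise IndexError.
pairs = list(enumerate(members))
print(pairs)  # [(0, 'Alice'), (1, 'Bob'), (2, 'Carol')]
```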

First, change count to 0, since list indices start from 0.

Your code should look like this:

from requests import get
from bs4 import BeautifulSoup

# lists to store data
names = []
gp = []
arenaRank = []

url = 'https://swgoh.gg/g/21284/gid-1-800-druidia/'
response = get(url)

soup = BeautifulSoup(response.text, 'html.parser')

members = soup.find_all('tr')

for users in members:
    # skip rows (e.g. the header row) that lack the nested tags
    if users.td and users.td.a and users.td.a.strong:
        names.append(users.td.a.strong.text)

print(names)

You could change count to 0, since Python indexing starts at 0, but it's better to work directly with the loop variable users.

The following accomplishes what your code appears to be trying to do, namely collecting the names:

from requests import get
from bs4 import BeautifulSoup

# lists to store data
names = []
gp = []
arenaRank = []

url = 'https://swgoh.gg/g/21284/gid-1-800-druidia/'
response = get(url)

soup = BeautifulSoup(response.content, 'html.parser')

for users in soup.find_all('strong'):
    # skip <strong> tags whose text is empty after stripping whitespace
    text = users.text.strip()
    if text != '':
        names.append(text)

print(names)
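One caveat with this approach: grabbing every `strong` on the page can pick up unrelated bold text outside the member table. A tighter variant, sketched here against stand-in HTML (the real page's structure is assumed to nest names as `td > a > strong`), restricts the match with a CSS selector:

```python
from bs4 import BeautifulSoup

# Stand-in HTML: a member table plus an unrelated <strong> elsewhere
# on the page, which a bare find_all('strong') would also match.
html = """
<h1><strong>Guild Roster</strong></h1>
<table>
  <tr><td><a href="#"><strong>Alice</strong></a></td></tr>
  <tr><td><a href="#"><strong>Bob</strong></a></td></tr>
</table>
"""

soup = BeautifulSoup(html, 'html.parser')

# select() takes a CSS selector; this one only matches <strong> tags
# nested inside a link inside a table cell.
names = [tag.text for tag in soup.select('td a strong')]
print(names)  # ['Alice', 'Bob']
```

Here the heading's `<strong>Guild Roster</strong>` is excluded because it is not inside a `td a`.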