Adding a different value to a list based on the iteration number in a for loop
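I'm new to Python and to programming in general, and I've run into a problem while working on a website parsing project.
This is the code I've managed to write: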

    import json

    import requests
    from bs4 import BeautifulSoup
    import pandas as pd

    pd.set_option('display.max_rows', None)
    pd.set_option('display.max_columns', None)
    pd.set_option('display.width', None)
    pd.set_option('display.max_colwidth', None)  # None = no limit (negative values are deprecated since pandas 1.0)

    #necessary lists
    url_list = [
        "https://warframe.market/items/melee_riven_mod_(veiled)",
        "https://warframe.market/items/zaw_riven_mod_(veiled)"
    ]
    item_list = []
    items_name = []
    combined_data = []
    iteration = 0

    #looping for every url found in url_list
    for url in url_list:
        #requesting data
        r = requests.get(url)
        soup = BeautifulSoup(r.content, "html.parser")
        #splitting the last part of the url which has the name of the item that I want to insert in the dataframe
        name = url.split("/")[4]
        items_name.append(name)
        #Finding in the parsed HTML code where the JSON file starts (it starts from <script> n°2)
        results = soup.find_all('script')[2].text.strip()
        data = json.loads(results)
        combined_data.append(data) #combining all the data into one list
        #filtering only the users who sell the items and are either "ingame" or "online"
        for payload in combined_data[iteration]["payload"]["orders"]:
            if payload["order_type"] == "sell" and (payload["user"]["status"] == "online" or payload["user"]["status"] == "ingame"):
                p = payload
                item_list.append(p)
        #adding the items names to the item list ???? PROBLEM ?????
        item_list = [dict(item, **{'name':items_name[iteration]}) for item in item_list]
        #trying to change the list from where the data gets taken from and the items name ????? PROBLEM ????
        iteration += 1

    #creating a dataframe with all the values
    df = pd.DataFrame(item_list).sort_values(by=["platinum"])

What I'm trying to do, and can't find a solution for, is to add the name of the item each URL refers to into item_list.
For example:
index | platinum | quantity | ... | items name (problematic column) |
---|---|---|---|---|
1 | 10 | 1 | ... | melee_riven_mod_(veiled) |
2 | 11 | 1 | ... | melee_riven_mod_(veiled) |
3 | 12 | 2 | ... | zaw_riven_mod_(veiled) |
4 | ... | ... | ... | zaw_riven_mod_(veiled) |
But the items name column has the same name in every row, like this:
index | platinum | quantity | ... | items name (problematic column) |
---|---|---|---|---|
1 | 10 | 1 | ... | melee_riven_mod_(veiled) |
2 | 11 | 1 | ... | melee_riven_mod_(veiled) |
3 | 12 | 2 | ... | melee_riven_mod_(veiled) |
4 | ... | ... | ... | melee_riven_mod_(veiled) |
So what am I doing wrong in the for loop? It iterates 2 times, which is the number of URLs in url_list, but it doesn't change the item's name.
What am I not seeing?
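Change this:

    if payload["order_type"] == "sell" and (payload["user"]["status"] == "online" or payload["user"]["status"] == "ingame"):
        p = payload
        item_list.append(p)
    #adding the items names to the item list ???? PROBLEM ?????
    item_list = [dict(item, **{'name':items_name[iteration]}) for item in item_list]

to this:

    if payload["order_type"] == "sell" and (payload["user"]["status"] == "online" or payload["user"]["status"] == "ingame"):
        payload['name'] = items_name[iteration]
        item_list.append(payload)

The list comprehension rebuilds the whole of item_list each time it runs, so it also overwrites the name on items that were collected from earlier URLs. Setting the name on each payload as it is appended only touches the new item.
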
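If you want to see the difference without hitting the network, here is a small sketch. The `fake_orders` data and both function names are made up for illustration and only mimic the fields used in your code. It also copies each order with `dict(order, name=name)` rather than assigning `payload['name']` in place, which is my own choice so the sample data is not mutated.

```python
# Made-up stand-in for the parsed JSON of each page (hypothetical values).
fake_orders = {
    "melee_riven_mod_(veiled)": [
        {"order_type": "sell", "platinum": 10, "user": {"status": "ingame"}},
        {"order_type": "buy", "platinum": 8, "user": {"status": "online"}},
    ],
    "zaw_riven_mod_(veiled)": [
        {"order_type": "sell", "platinum": 12, "user": {"status": "online"}},
        {"order_type": "sell", "platinum": 9, "user": {"status": "offline"}},
    ],
}


def is_active_sell(order):
    """Keep only sell orders from users who are online or in game."""
    return order["order_type"] == "sell" and order["user"]["status"] in ("online", "ingame")


def collect_rewriting_list(orders_by_name):
    """Approach from the question: rewrite the whole list for each item."""
    item_list = []
    for name, orders in orders_by_name.items():
        for order in orders:
            if is_active_sell(order):
                item_list.append(order)
        # This touches every entry collected so far, not just the new ones.
        item_list = [dict(item, name=name) for item in item_list]
    return item_list


def collect_tagging_each(orders_by_name):
    """Suggested approach: tag each order at the moment it is appended."""
    item_list = []
    for name, orders in orders_by_name.items():
        for order in orders:
            if is_active_sell(order):
                item_list.append(dict(order, name=name))
    return item_list


buggy = collect_rewriting_list(fake_orders)
fixed = collect_tagging_each(fake_orders)
print([item["name"] for item in buggy])  # both entries carry the same name
print([item["name"] for item in fixed])  # each entry keeps its own item name
```
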
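Note that instead of keeping a separate iteration variable and incrementing it, you can loop over url_list with enumerate, which gives you both the item and its index on each iteration:

    for iteration, url in enumerate(url_list):
        ....
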
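As a rough sketch of that loop shape, the snippet below leaves out the request and the parsing and only derives the item name from each URL. Using `rsplit` to take the last path segment is my own substitution for `split("/")[4]`, not something from your code.

```python
url_list = [
    "https://warframe.market/items/melee_riven_mod_(veiled)",
    "https://warframe.market/items/zaw_riven_mod_(veiled)",
]

items_name = []
for iteration, url in enumerate(url_list):
    # The last path segment of the URL is the item name.
    name = url.rsplit("/", 1)[-1]
    items_name.append(name)
    # iteration is the index of the current URL, so no manual counter is needed.
    print(iteration, items_name[iteration])
```

Since `name` is already in scope inside the loop, you could also use it directly when tagging each order instead of indexing into items_name.
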