Indexing correctly within a for-loop and while-loop
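I have a set of brand numbers for web page URLs. I turn the page URL into an f-string and substitute the brand number where it belongs. Each page has a unique ID that loads the next page. I am trying to extract the next pages while keeping each ID matched to the brand number it belongs to.
Here is some sample code: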
from io import StringIO

import requests
import pandas as pd
from bs4 import BeautifulSoup

brands = [989,1344,474,1237,886,1,328,2188]
testid = {}
for b in brands:
    url = f'https://webapi.depop.com/api/v2/search/products/?brands={b}&itemsPerPage=24&country=gb&currency=GBP&sort=relevance'
    payload={}
    headers = {}
    response = requests.request("GET", url, headers=headers, data=payload)
    test= pd.read_json(StringIO(response.text), lines=True)
    for m in test['meta'].items():
        if m[1]['hasMore'] == True:
            testid[str(b)]= [m[1]['cursor']]
        else:
            continue
    for br in testid.keys():
        while True:
            html = f'https://webapi.depop.com/api/v2/search/products/?brands={br}&cursor={testid[str(br)][-1]}&itemsPerPage=24&country=gb&currency=GBP&sort=relevance'
            r = requests.request("GET",html, headers=headers, data=payload)
            read_id = pd.read_json(StringIO(r.text), lines=True)
            for m in read_id['meta'].items():
                try:
                    testid[str(br)].append(m[1]['cursor'])
                except:
                    continue
This is the output it produces:
{'989': ['MnwyNHwxNjQwMDMwODcw']}
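However, it replaces the original value stored for the brand number and keeps only the last value collected. It should keep a list and produce something like this: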
{'989': ['MnwyNHwxNjQwMDI4Mzk1', ...],
'1344': ['MnwyNHwxNjQwMDI4Mzk2', ...],
'474': ['MnwyNHwxNjQwMDI4Mzk3', ...],
'1237': ['MnwyNHwxNjQwMDI4Mzk3', ...],
'886': ['MnwyNHwxNjQwMDI4Mzk4', ...],
'1': ['MnwyNHwxNjQwMDI4Mzk4', ...],
'328': ['MnwyNHwxNjQwMDI4Mzk5', ...],
where the three dots ... stand for the additional ID values collected from the pages with that brand number. How can I get output like this?
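Once testid is set up as a collections.defaultdict(list), the rest falls out in a fairly straightforward way.
Note: I only fetch the first 3 cursors for any brand, but you can fetch as many as you like.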
import collections
import requests

brands = [989,1344,474,1237,886,1,328,2188]
testid = collections.defaultdict(list)  # missing keys start out as an empty list
for b in brands:
    headers = {}
    payload={}
    url = f"https://webapi.depop.com/api/v2/search/products/?brands={b}&itemsPerPage=24&country=gb&currency=GBP&sort=relevance"
    response = requests.request("GET", url, headers=headers, data=payload)
    data = response.json()

    i = 0 # short circuit
    # keep following cursors while the API reports more pages (at most 3 here)
    while data.get("meta", {}).get("hasMore") and i < 3:
        cursor = data.get("meta", {}).get("cursor")
        testid[str(b)].append(cursor)
        response = requests.request("GET", f"{url}&cursor={cursor}", headers=headers, data=payload)
        data = response.json()
        i += 1

for key, value in testid.items():
    print(key, value)
This gives us:
989 ['MnwyNHwxNjQwMDMzMjM0']
1344 ['MnwyNHwxNjQwMDMzMjM1', 'M3w0OHwxNjQwMDMzMjM1', 'NHw3MnwxNjQwMDMzMjM1']
474 ['MnwyNHwxNjQwMDMzMjM3', 'M3w0OHwxNjQwMDMzMjM3', 'NHw3MnwxNjQwMDMzMjM3']
1237 ['MnwyNHwxNjQwMDMzMjM5', 'M3w0OHwxNjQwMDMzMjM5', 'NHw3MnwxNjQwMDMzMjM5']
886 ['MnwyNHwxNjQwMDMzMjQz', 'M3w0OHwxNjQwMDMzMjQz', 'NHw3MnwxNjQwMDMzMjQz']
1 ['MnwyNHwxNjQwMDMzMjQ4', 'M3w0OHwxNjQwMDMzMjQ4', 'NHw3MnwxNjQwMDMzMjQ4']
328 ['MnwyNHwxNjQwMDMzMjUz', 'M3w0OHwxNjQwMDMzMjUz', 'NHw3MnwxNjQwMDMzMjUz']
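If you want to try the loop without hitting the network, here is a minimal sketch of the same idea with the HTTP call factored out. The names collect_cursors, fetch_page, fake_fetch and FAKE_PAGES are made up for this illustration and are not part of the code above or of the Depop API; the fake pages only imitate the "meta" / "hasMore" / "cursor" shape used above.

```python
import collections


def collect_cursors(fetch_page, brands, max_pages=3):
    """Collect up to max_pages pagination cursors per brand.

    fetch_page(brand, cursor) is a hypothetical callable returning the
    decoded JSON of one page as a dict; cursor is None for the first page.
    """
    cursors = collections.defaultdict(list)
    for brand in brands:
        data = fetch_page(brand, None)
        pages = 0
        while data.get("meta", {}).get("hasMore") and pages < max_pages:
            cursor = data["meta"].get("cursor")
            cursors[str(brand)].append(cursor)  # append, never reassign the list
            data = fetch_page(brand, cursor)
            pages += 1
    return dict(cursors)


# Stand-in for the real API: brand 1 has three pages, brand 2 has only one.
FAKE_PAGES = {
    (1, None): {"meta": {"hasMore": True, "cursor": "a"}},
    (1, "a"): {"meta": {"hasMore": True, "cursor": "b"}},
    (1, "b"): {"meta": {"hasMore": False}},
    (2, None): {"meta": {"hasMore": False}},
}


def fake_fetch(brand, cursor):
    # Unknown pages come back as an empty dict, i.e. a response without "meta".
    return FAKE_PAGES.get((brand, cursor), {})


result = collect_cursors(fake_fetch, [1, 2])
print(result)
```

A brand whose first page already reports no further pages never gets a key at all, because defaultdict only creates the list when it is first accessed. That is consistent with 2188 not appearing in the output above.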
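Wait a minute... what is going on with:
data.get("meta", {}).get("hasMore")
Good question, I should have explained that earlier.
It is possible that data has no "meta" key. If so, the following would fail with a KeyError:
data["meta"].get("hasMore")
as would this, with an AttributeError, because data.get("meta") returns None:
data.get("meta").get("hasMore")
So what we did in:
data.get("meta", {}).get("hasMore")
is use the second parameter of get() to provide a default value. In this case it is just an empty dict, but that is enough to let us safely chain the subsequent .get("hasMore") onto it.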
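A small sketch of the three variants side by side, using made-up dicts (response_without_meta and response_with_meta are illustrative stand-ins for data, not real API responses):

```python
response_without_meta = {}
response_with_meta = {"meta": {"hasMore": True}}

outcomes = {}

# Subscripting a missing key raises KeyError.
try:
    response_without_meta["meta"].get("hasMore")
except KeyError:
    outcomes["subscript"] = "KeyError"

# get() without a default returns None, and None has no .get method.
try:
    response_without_meta.get("meta").get("hasMore")
except AttributeError:
    outcomes["get without default"] = "AttributeError"

# get() with an empty dict as default keeps the chain safe.
outcomes["get with default"] = response_without_meta.get("meta", {}).get("hasMore")

# When "meta" is present the default is simply ignored.
outcomes["meta present"] = response_with_meta.get("meta", {}).get("hasMore")

print(outcomes)
```

One caveat: the default only applies when the key is missing. If a response contained "meta" explicitly set to None, data.get("meta", {}) would return None and the chained call would still fail; (data.get("meta") or {}).get("hasMore") covers that case as well.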