Create a list of tuples with download URL + "href"

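I'm trying to build a list of tuples where the first element is the download URL and the second is the file name taken from the URL string, using the code below: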

import urllib.request
import requests
from bs4 import BeautifulSoup
import pandas as pd
import io
url = r"https://www.ers.usda.gov/data-products/livestock-meat-domestic-data"
my_bytes = urllib.request.urlopen(url)
my_bytes = my_bytes.read().decode("utf8")
parsed_html = BeautifulSoup(my_bytes, features = "lxml")
table_data = parsed_html.body.find('table', attrs = {'id':'data_table'})
download_url = "https://www.ers.usda.gov"
full_download_url = [tuple(download_url,i["href"]) for i in table_data.find_all('a')]

But I keep getting TypeError: must be str, not list, and I don't know how to fix it. Can anyone help? Thanks!

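You are indexing the download_url array incorrectly.

Python interprets your code as creating an array with a single element, [0] (if i is 0, for example), and then trying to access the element ["href"] on it, which is a string rather than a valid index.

If you specify download_url before accessing the index, it will work as expected: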

full_download_url = [(download_url, download_url[i]["href"]) for i in table_data.find_all('a')]
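
For reference, in the question's loop `i` is each `<a>` tag returned by `find_all`, not an integer, so `i["href"]` reads that tag's attribute. Also, `tuple()` accepts a single iterable argument, while a two-element tuple is written with the literal `(a, b)`. Below is a minimal sketch of one way to build (full URL, file name) pairs using only the standard library. The helper name `build_download_pairs` and the sample hrefs are hypothetical, and it assumes the hrefs are site-relative paths:

```python
from posixpath import basename
from urllib.parse import urljoin, urlsplit


def build_download_pairs(base_url, hrefs):
    """Return a list of (full_url, file_name) tuples, one per href."""
    pairs = []
    for href in hrefs:
        # Join the site root with the relative link
        full_url = urljoin(base_url, href)
        # The file name is the last segment of the URL path
        file_name = basename(urlsplit(full_url).path)
        pairs.append((full_url, file_name))  # tuple literal, not tuple(a, b)
    return pairs


# Hypothetical hrefs, standing in for
# [a["href"] for a in table_data.find_all("a")]
sample_hrefs = ["/media/example/meat-statistics.xlsx", "/media/example/wasde.zip"]
pairs = build_download_pairs("https://www.ers.usda.gov", sample_hrefs)
```

With the real page, the list of hrefs would come from the `<a>` tags inside `table_data` rather than from `sample_hrefs`.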

This is what I needed:

import urllib.request
import requests
from bs4 import BeautifulSoup
import pandas as pd
import io
url = r"https://www.ers.usda.gov/data-products/livestock-meat-domestic-data"
my_bytes = urllib.request.urlopen(url)
my_bytes = my_bytes.read().decode("utf8")
parsed_html = BeautifulSoup(my_bytes, features = "lxml")
table_data = parsed_html.body.find('table', attrs = {'id':'data_table'})
download_url = "https://www.ers.usda.gov"
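def convertTuple(tup):
    # Concatenate every item of the tuple into one string
    result = ''  # named result to avoid shadowing the built-in str
    for item in tup:
        result = result + item
    return result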
full_download_url = [convertTuple(tuple(download_url + i["href"])) for i in table_data.find_all('a')]
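
A side note on the code above: calling `tuple()` on a string splits it into single characters, and `convertTuple` joins them back together, so the round trip returns the original string. The sketch below illustrates this, assuming the goal is simply a list of full URL strings. The hrefs are hypothetical placeholders, not values from the real page:

```python
def convertTuple(tup):
    # Same logic as the convertTuple helper above
    result = ''
    for item in tup:
        result = result + item
    return result


download_url = "https://www.ers.usda.gov"
# Hypothetical hrefs standing in for the "href" values of the <a> tags
hrefs = ["/media/example/file-a.xlsx", "/media/example/file-b.zip"]

# Round trip through tuple() and convertTuple()
round_trip = [convertTuple(tuple(download_url + h)) for h in hrefs]
# Plain string concatenation
direct = [download_url + h for h in hrefs]
```

Both lists contain the same strings, so plain concatenation appears to be enough when only the full URLs are needed.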

Thanks to GeeksforGeeks and to everyone who tried to help :)