Create a list of tuples with download URL + "href"
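I am trying to make a list of tuples where the first element is the download URL and the second is the file name taken from the URL string. My code is below: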
import urllib
import requests
from bs4 import BeautifulSoup
import pandas as pd
import io
url = r"https://www.ers.usda.gov/data-products/livestock-meat-domestic-data"
my_bytes = urllib.request.urlopen(url)
my_bytes = my_bytes.read().decode("utf8")
parsed_html = BeautifulSoup(my_bytes, features = "lxml")
table_data = parsed_html.body.find('table', attrs = {'id':'data_table'})
download_url = "https://www.ers.usda.gov"
full_download_url = [tuple(download_url,i["href"]) for i in table_data.find_all('a')]
But I keep getting TypeError: must be str, not list, and I don't know how to fix it. Can anyone help? Thanks!
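You are accessing the download_url array index incorrectly.
Python interprets your code as creating an array with one element, [0] (with i being 0, for example), and then trying to access the element ["href"], which is a string and not a valid index.
If you specify download_url before accessing the index, it will work as expected: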
full_download_url = [(download_url, download_url[i]["href"]) for i in table_data.find_all('a')]
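For reference, here is a minimal sketch of the tuple list the question describes, assuming the hrefs have already been pulled out of the anchors with i["href"]. The href values and the build_pairs helper are hypothetical and only serve as an illustration. Note that tuple() accepts at most one iterable argument, so a pair is normally written with plain parentheses.

```python
import posixpath
from urllib.parse import urljoin, urlsplit

download_url = "https://www.ers.usda.gov"

# Hypothetical href values, standing in for i["href"] from table_data.find_all('a')
hrefs = [
    "/webdocs/DataFiles/example/MeatStats.xlsx",
    "/webdocs/DataFiles/example/WholesalePrices.csv",
]

def build_pairs(base, hrefs):
    """Return a list of (full_download_url, file_name) tuples."""
    pairs = []
    for href in hrefs:
        # Resolve the relative href against the site root
        full_url = urljoin(base, href)
        # Take the last path segment as the file name
        file_name = posixpath.basename(urlsplit(full_url).path)
        # Parentheses build the 2-tuple; tuple(a, b) would raise a TypeError
        pairs.append((full_url, file_name))
    return pairs

full_download_url = build_pairs(download_url, hrefs)
```

With the real page, hrefs could be built as [i["href"] for i in table_data.find_all('a')], provided every anchor in the table actually carries an href attribute.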
This is what I needed:
import urllib
import requests
from bs4 import BeautifulSoup
import pandas as pd
import io
url = r"https://www.ers.usda.gov/data-products/livestock-meat-domestic-data"
my_bytes = urllib.request.urlopen(url)
my_bytes = my_bytes.read().decode("utf8")
parsed_html = BeautifulSoup(my_bytes, features = "lxml")
table_data = parsed_html.body.find('table', attrs = {'id':'data_table'})
download_url = "https://www.ers.usda.gov"
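def convertTuple(tup):
    # Join the items of the tuple back into a single string
    result = ''
    for item in tup:
        result = result + item
    return result
# tuple(...) splits the URL into characters and convertTuple joins them again,
# so each entry ends up as the plain string download_url + i["href"]
full_download_url = [convertTuple(tuple(download_url + i["href"])) for i in table_data.find_all('a')]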
Thanks to Geeks for Geeks and everyone who tried to help :)
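As a side note, the tuple round trip in the code above appears to be equivalent to plain string concatenation, so the result is a list of strings rather than a list of tuples. A small sketch of that equivalence, using a hypothetical href value and a hypothetical convert_tuple helper in place of convertTuple:

```python
download_url = "https://www.ers.usda.gov"
# Hypothetical href, standing in for i["href"]
href = "/webdocs/DataFiles/example/MeatStats.xlsx"

def convert_tuple(tup):
    """Join the items of a tuple of strings into one string."""
    return "".join(tup)

# tuple() on a string yields one item per character ...
chars = tuple(download_url + href)
# ... and joining those characters restores the original string
round_trip = convert_tuple(chars)
direct = download_url + href
```

If only the full URLs are needed, download_url + i["href"] inside the list comprehension should give the same list without the helper.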