Why does Python requests return an HTML file instead of Excel?

Question

I want to use Python to download the Excel file from this link: https://www.tfex.co.th/tfex/historicalTrading.html?locale=en_US&symbol=S50Z21&decorator=excel&series=&page=4&locale=en_US&locale=en_US&periodView=A

Here is my code:

import requests

url = 'https://www.tfex.co.th/tfex/historicalTrading.html?locale=en_US&symbol=S50Z21&decorator=excel&series=&page=4&locale=en_US&periodView=A'

resp = requests.get(url)
with open('file.xls', 'wb') as f:
    f.write(resp.content)

But file.xls turns out to be an HTML text file. This is what file.xls looks like: [1]
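One way to confirm what the server actually sent is to look at the first bytes of the body instead of trusting the .xls extension. A genuine legacy .xls starts with the OLE2 signature D0 CF 11 E0 A1 B1 1A E1, an .xlsx is a ZIP archive starting with PK\x03\x04, and an HTML page starts with '<'. This is a minimal sketch, not an exhaustive file-type detector; the helper name sniff_format is hypothetical.

```python
# File signatures ("magic bytes") of the two real Excel formats.
OLE2_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")  # legacy .xls (OLE2 compound file)
ZIP_MAGIC = b"PK\x03\x04"                        # .xlsx (a ZIP container)
UTF8_BOM = b"\xef\xbb\xbf"


def sniff_format(data: bytes) -> str:
    """Guess what a downloaded body really is, regardless of the file name."""
    if data.startswith(OLE2_MAGIC):
        return "xls"
    if data.startswith(ZIP_MAGIC):
        return "xlsx"
    # Markup (optionally after a UTF-8 BOM and whitespace) starts with '<'.
    if data.startswith(UTF8_BOM):
        data = data[len(UTF8_BOM):]
    if data.lstrip().startswith(b"<"):
        return "html"
    return "unknown"
```

With the code above, print(sniff_format(resp.content)) printing html would indicate that the decorator=excel export is an HTML table rather than a binary workbook, so changing request headers alone is unlikely to turn it into a real .xls.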

I tried adding headers:

headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.95 Safari/537.36'}
resp = requests.get(url, headers=headers)

But that didn't help. Thanks in advance.

Answer 1

Edit: I found a way to do it using pandas.

import pandas as pd

url = r'https://www.tfex.co.th/tfex/historicalTrading.html?locale=en_US&symbol=S50Z21&decorator=excel&series=&page=4&locale=en_US&periodView=A'

# Parse every <table> on the page into a list of DataFrames
tables = pd.read_html(url)
# Stack the tables into a single DataFrame
merged = pd.concat(tables)
# Write the combined table to an Excel file
merged.to_excel("output.xlsx")
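pd.read_html needs an HTML parser backend (lxml, or BeautifulSoup4 with html5lib), and to_excel needs an Excel writer such as openpyxl for .xlsx. If those aren't installed, a stdlib-only alternative is to pull the <tr>/<td> cells out with html.parser and write a CSV, which Excel opens directly. This is a swapped-in technique, not what the answer above does. The names TableExtractor and html_tables_to_csv are hypothetical, and the sketch assumes simple, well-formed tables (every <td>, <th> and <tr> explicitly closed, no nested tables).

```python
from __future__ import annotations

import csv
from html.parser import HTMLParser


class TableExtractor(HTMLParser):
    """Collect each <tr> as a list of cell texts (assumes cells are explicitly closed)."""

    def __init__(self) -> None:
        super().__init__()  # convert_charrefs=True by default, so "&amp;" arrives as "&"
        self.rows: list[list[str]] = []
        self._row: list[str] | None = None   # cells of the <tr> being read
        self._cell: list[str] | None = None  # text chunks of the <td>/<th> being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None and self._row is not None:
            # Collapse runs of whitespace and newlines inside the cell.
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            if self._row:  # skip rows that contained no cells
                self.rows.append(self._row)
            self._row = None


def html_tables_to_csv(html: str, out_path: str) -> list[list[str]]:
    """Write the rows of every table in `html` to a CSV file and return them."""
    parser = TableExtractor()
    parser.feed(html)
    parser.close()
    # utf-8-sig writes a BOM, which helps Excel pick the right encoding.
    with open(out_path, "w", newline="", encoding="utf-8-sig") as f:
        csv.writer(f).writerows(parser.rows)
    return parser.rows
```

Usage would be html_tables_to_csv(resp.text, "output.csv") with resp from the question's code. Unlike pandas, every value stays a string, so numbers such as 1,000.5 are left for Excel to interpret.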

Hope this helps :)

Ignore what follows; this is my answer from before the edit:

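I know this may still be a problem, depending on your downstream application. The code below still seems to download the file as HTML, but Excel can open that format anyway.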

import requests

url = r'https://www.tfex.co.th/tfex/historicalTrading.html?locale=en_US&symbol=S50Z21&decorator=excel&series=&page=4&locale=en_US&periodView=A'

r = requests.get(url, allow_redirects=False)
# The body is still HTML even with decorator=excel; Excel can open it anyway
with open('out.xls', 'wb') as f:
    f.write(r.content)

When I open it in Excel, I get a warning; I just click OK.
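The warning is most likely Excel noticing that the file's content (HTML) does not match its .xls extension. A minimal sketch, assuming the body is HTML whenever its first non-whitespace byte is '<', saves the download under an extension that matches what it actually contains; save_with_matching_extension is a hypothetical helper name, and treating everything else as a genuine .xls is an assumption.

```python
from __future__ import annotations

from pathlib import Path


def save_with_matching_extension(data: bytes, stem: str | Path) -> Path:
    """Write `data` next to `stem`, choosing .html or .xls from its first bytes."""
    # Assumption: a body whose first non-whitespace byte is '<' is HTML markup;
    # anything else is treated as a genuine binary .xls.
    looks_like_html = data.lstrip()[:1] == b"<"
    suffix = ".html" if looks_like_html else ".xls"
    # with_suffix() replaces an existing extension, e.g. out.xls -> out.html.
    path = Path(stem).with_suffix(suffix)
    path.write_bytes(data)
    return path
```

For example, save_with_matching_extension(r.content, "out") would write out.html for this URL. Excel can open .html files too, so the file name simply stops claiming a format the content doesn't have.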

screenshot of file


Tags: html, python, excel, python-requests