How can I request (get) and read an xml file using python?

Question

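I'm trying to request an RSS feed on Treasury Direct using Python. In the past I've used the urllib or requests libraries for this purpose and they've worked fine. This time, however, I keep getting a 406 status error, which, as I understand it, is the page's way of telling me it doesn't accept the header details in my request. I've tried altering them, but to no avail.
This is how I've tried it: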

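import requests

url = 'https://www.treasurydirect.gov/TA_WS/securities/announced/rss'
# Only a browser-like User-Agent is sent here; this request still comes back as 406
user_agent = {'User-agent': 'Mozilla/5.0'}
response = requests.get(url, headers=user_agent)
print(response.text)  # print() with parentheses runs on both Python 2.7 and 3.4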

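Environment: Python 2.7 and 3.4. I also tried accessing the feed with curl, but got the same error.

I believe this is specific to the page, but I can't figure out how to frame the request properly so that I can read it.

I found an API on the page that lets me read the same data as JSON, so this question is now more of a curiosity for me than a real problem.

Any answers would be greatly appreciated!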

Header details

{'surrogate-control': 'content="ESI/1.0",no-store', 'content-language': 'en-US', 'x-content-type-options': 'nosniff', 'x-powered-by': 'Servlet/3.0', 'transfer-encoding': 'chunked', 'set-cookie': 'BIGipServerpl_www.treasurydirect.gov_443=3221581322.47873.0000; path=/; Httponly; Secure, TS01598982=016b0e6f4634928e3e7e689fa438848df043a46cb4aa96f235b0190439b1d07550484963354d8ef442c9a3eb647175602535b52f3823e209341b1cba0236e4845955f0cdcf; Path=/', 'strict-transport-security': 'max-age=31536000; includeSubDomains', 'keep-alive': 'timeout=10, max=100', 'connection': 'Keep-Alive', 'cache-control': 'no-store', 'date': 'Sun, 23 Apr 2017 04:13:00 GMT', 'x-frame-options': 'SAMEORIGIN', '$wsep': '', 'content-type': 'text/html;charset=ISO-8859-1'}

Answer 1

You need to add accept to the request headers:

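import requests

url = 'https://www.treasurydirect.gov/TA_WS/securities/announced/rss'
# Accept lists the media types the client will take: XML preferred, anything else as a fallback
headers = {'accept': 'application/xml;q=0.9, */*;q=0.8'}
response = requests.get(url, headers=headers)

print(response.text)  # print() with parentheses runs on both Python 2.7 and 3.4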
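
For the "read" half of the question, here is a minimal sketch of one way to parse the returned XML with the standard library's xml.etree.ElementTree. It assumes the feed follows the usual RSS 2.0 layout (item elements with title and link children); the SAMPLE_RSS string and the read_items helper are made up for illustration, and the real feed's fields may differ.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for the feed body; the real feed's fields may differ.
SAMPLE_RSS = """<rss version="2.0">
  <channel>
    <title>Example feed</title>
    <item><title>First item</title><link>https://example.com/1</link></item>
    <item><title>Second item</title><link>https://example.com/2</link></item>
  </channel>
</rss>"""


def read_items(xml_body):
    """Return a list of (title, link) tuples, one per <item> in an RSS 2.0 body."""
    root = ET.fromstring(xml_body)
    items = []
    # iter('item') walks the whole tree, so the <channel> wrapper needs no special handling
    for item in root.iter('item'):
        items.append((item.findtext('title'), item.findtext('link')))
    return items


if __name__ == '__main__':
    for title, link in read_items(SAMPLE_RSS):
        print('%s %s' % (title, link))
```

With the request above, something like read_items(response.content) should work the same way; passing the raw bytes rather than response.text lets the parser honour whatever encoding the XML itself declares.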

python

curl

urllib

http-headers

python-requests