How to POST country codes in form data to a URL to get the expected web data?

Question

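I want to download the calendar for specific countries from https://tradingeconomics.com/calendar without logging in.

First, to get the data for the countries I want, I POST the country code as form data to "https://sso.tradingeconomics.com/api/UserOptions".

Second, I reload the page "https://tradingeconomics.com/calendar". But nothing is updated.

Here is my code for posting the country code. In this sample script I try to get only the Australia calendar ('list[0][Value]': 'aus'), but it returns the calendar for all the default countries.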

    import requests
    import json

    session = requests.session()
    url = 'https://tradingeconomics.com/calendar'
    page = session.get(url)
    Logincookies = page.cookies

    user_opt_url = "https://sso.tradingeconomics.com"
    heads = {
        'authority': 'sso.tradingeconomics.com',
        'method': 'POST',
        'path': '/api/UserOptions',
        'scheme': 'https',
        'accept': 'application/json, text/javascript, */*; q=0.01',
        'accept-encoding': 'gzip, deflate, br',
        'accept-language': 'zh-CN,zh;q=0.9,en;q=0.8',
        'content-length': '418',
        'content-type': 'application/x-www-form-urlencoded; charset=UTF-8',
        'origin': 'https://tradingeconomics.com',
        'referer': 'https://tradingeconomics.com/calendar',
        'sec-fetch-mode': 'cors',
        'sec-fetch-site': 'same-site',
        'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.108 Safari/537.36'
    }
    data = {
        'list[0][Host]': 'tradingeconomics.com',
        'list[0][Env]': '/calendar',
        'list[0][Name]': 'te-cal-countries',
        'list[0][Value]': 'aus',
        'list[1][Host]': 'tradingeconomics.com',
        'list[1][Env]': '/calendar',
        'list[1][Name]': 'te-cal-range',
        'list[1][Value]': '1',
        'list[2][Host]': 'tradingeconomics.com',
        'list[2][Env]': '/calendar',
        'list[2][Name]': 'te-cal-importance',
        'list[2][Value]': '1',
    }

    page = session.post(user_opt_url, headers=heads, data=json.dumps(data), cookies=Logincookies)

    page = session.get(url)
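
One detail that may be worth checking in the script above: the headers declare `application/x-www-form-urlencoded`, but `json.dumps(data)` produces a JSON string, and `requests` sends a string passed as `data=` unchanged. The sketch below uses only the standard library to show how the two bodies differ. The helper name `option_fields` is made up for illustration, and this does not confirm that the endpoint accepts either body without a login.

```python
import json
from urllib.parse import urlencode, parse_qs


def option_fields(index, name, value, host="tradingeconomics.com", env="/calendar"):
    """Return the four form fields that describe one user option (hypothetical helper)."""
    prefix = f"list[{index}]"
    return {
        f"{prefix}[Host]": host,
        f"{prefix}[Env]": env,
        f"{prefix}[Name]": name,
        f"{prefix}[Value]": value,
    }


# Rebuild the same three options used in the question.
form = {}
form.update(option_fields(0, "te-cal-countries", "aus"))
form.update(option_fields(1, "te-cal-range", "1"))
form.update(option_fields(2, "te-cal-importance", "1"))

# What a form-urlencoded body looks like: key=value pairs joined by "&".
form_body = urlencode(form)

# What json.dumps() produces instead: a JSON object, not a form body.
json_body = json.dumps(form)

print(form_body[:60])
print(json_body[:60])
```

Passing the dict itself (`data=data`) lets `requests` do the form encoding and compute the matching `content-length`, so the hard-coded `'content-length': '418'` header would not be needed.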

Then I parse the page's table, and it returns the calendar for all the default countries.

    from bs4 import BeautifulSoup
    doc = BeautifulSoup(page.text, 'lxml')
    ntr = doc.find_all('table')[1].select('tr[data-url^=""]')
    lst = []
    for n in ntr:
        if n.select('th'):
            lst = lst + [n.select('th')[0].get_text().strip(),  # date
                         None,  # time
                         None,  # lvl
                         None,  # country
                         None,  # event
                         None,  # actual
                         None,  # previous
                         None,  # revised
                         None,  # consensus
                         None,  # forecast
                         ]
        elif n.select('td'):
            lst = lst + [None,  # date
                         n.select('span[class^="calendar-date"]')[0].get_text().strip() if n.select(
                             'span[class^="calendar-date"]') else None,  # time
                         n.select('span[class^="calendar-date"]')[0]['class'][0].strip() if n.select(
                             'span[class^="calendar-date"]') else None,  # lvl
                         n.select('div[class^="flag"]')[0]['title'].strip() if n.select('div[class^="flag"]') else None,
                         # country
                         n.select('a[class="calendar-event"]')[0].get_text().strip() if n.select(
                             'a[class="calendar-event"]') else
                         n.select('span')[1].get_text().strip() if n.select('span') else None,  # event; this fallback selector is weak
                         n.select('span[id="actual"]')[0].get_text().strip() if n.select('span[id="actual"]') else None,
                         # actual
                         n.select('span[id="previous"]')[0].get_text().strip() if n.select(
                             'span[id="previous"]') else None,  # previous
                         n.select('span[id="revised"]')[0].get_text().strip() if n.select(
                             'span[id="revised"]') else None,  # revised
                         n.select('span[id="consensus"]')[0].get_text().strip() if n.select(
                             'span[id="consensus"]') else None,  # consensus
                         n.select('span[id="forecast"]')[0].get_text().strip() if n.select(
                             'span[id="forecast"]') else None,  # forecast
                         ]
        else:
            print("error!!!")
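
The row parsing above repeats the same select-then-check pattern for every column. A minimal sketch of one way to factor it out follows. `first_text` and `parse_event_row` are hypothetical names, the sample HTML is invented to resemble a calendar row, and the country column (read from a `title` attribute) is left out for brevity.

```python
from bs4 import BeautifulSoup


def first_text(node, selector):
    """Return the stripped text of the first element matching selector, or None."""
    match = node.select_one(selector)
    return match.get_text().strip() if match else None


def parse_event_row(row):
    """Collect the text columns of one calendar row into a dict."""
    return {
        "time": first_text(row, 'span[class^="calendar-date"]'),
        "event": first_text(row, 'a[class="calendar-event"]'),
        "actual": first_text(row, 'span[id="actual"]'),
        "previous": first_text(row, 'span[id="previous"]'),
        "revised": first_text(row, 'span[id="revised"]'),
        "consensus": first_text(row, 'span[id="consensus"]'),
        "forecast": first_text(row, 'span[id="forecast"]'),
    }


# Invented sample markup, only loosely modelled on the selectors used above.
sample_html = """
<table>
  <tr data-url="/australia/retail-sales">
    <td><span class="calendar-date-1"> 01:30 AM </span></td>
    <td><a class="calendar-event">Retail Sales MoM</a></td>
    <td><span id="actual">0.2%</span></td>
  </tr>
</table>
"""

row = BeautifulSoup(sample_html, "html.parser").select_one("tr[data-url]")
parsed = parse_event_row(row)
print(parsed)
```

Returning a dict per row keeps each value attached to its column name, which may be easier to load into a table than one flat list.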

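I suspect that after posting the country code I need to stay in the same session and wait for the page to refresh, or there is something else I have missed. Thanks for your help.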

Answer 1

After a few days I figured it out. I only needed to add a few entries to the cookies sent with the request.

    req_cookies = {
        # 0: recent, 1: today, 2: tomorrow, 3: this week, 4: next week, 5: next month,
        # -1: yesterday, -2: last week, -3: last month
        'te-cal-range': '1',
        'te-cal-importance': '1',
        'te-cal-countries': ('aus,bra,can,chn,emu,eun,fra,deu,ind,idn,ita,jpn,mex,rus,sau,zaf,kor,'
                             'esp,tur,gbr,usa,sgp,twn,hkg,nzl,nor,mys,tha,vnm'),
        'TECalendarOffset': '480',  # offset from GMT in minutes (480 = GMT+8)
    }
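
A minimal sketch of how cookies like these might be attached to the calendar request. The cookie names and range codes come from the snippet above. The function names, the short `user-agent` value, and the assumption that the site still honours these cookies are mine.

```python
# Range codes as listed in the comment in the answer.
CALENDAR_RANGES = {
    "recent": "0", "today": "1", "tomorrow": "2", "this_week": "3",
    "next_week": "4", "next_month": "5", "yesterday": "-1",
    "last_week": "-2", "last_month": "-3",
}

CALENDAR_URL = "https://tradingeconomics.com/calendar"


def build_calendar_cookies(countries, date_range="today", importance=1, offset_minutes=480):
    """Build the filter cookies for the calendar page (hypothetical helper)."""
    return {
        "te-cal-range": CALENDAR_RANGES[date_range],
        "te-cal-importance": str(importance),
        # Join the codes with plain commas so no whitespace ends up in the value.
        "te-cal-countries": ",".join(code.strip().lower() for code in countries),
        "TECalendarOffset": str(offset_minutes),
    }


def fetch_calendar_html(countries, **options):
    """GET the calendar page with the filter cookies attached (needs network access)."""
    import requests  # third-party; imported here so the helper above has no dependencies

    response = requests.get(
        CALENDAR_URL,
        cookies=build_calendar_cookies(countries, **options),
        headers={"user-agent": "Mozilla/5.0"},  # assumed; the site may reject other agents
        timeout=30,
    )
    response.raise_for_status()
    return response.text


print(build_calendar_cookies(["aus"]))
```

With this approach no POST to the options endpoint is involved: `fetch_calendar_html(["aus"])` would send the filters directly with the GET, and the returned HTML can then be parsed as in the question.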

form-data

http-post

python-requests