Can't get the right tokens for the nested tables

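I'm trying to get the contents of some tables as JSON files, but I can't seem to retrieve the right ones. The page makes two kinds of JSON requests. One of them returns only the non-nested (top-level) information, and I can fetch that one without trouble. The problem seems to be the nested tables.

I need to print these tables:

I need the contents of all of those tables as JSON, but I can't seem to get the right tokens for them. The requests always return the login page, as if the session had expired.

This is the code I use to scrape the tables: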

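    import datetime
    import json

    import requests

    # `session` (a logged-in requests.Session), the base `headers`, `current_day`
    # and `prvious_day` are set up earlier in the script (not shown).

    # POST request that returns the last 100 entries as JSON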
    url = "https://Awebsite.com/virtualaccount/entries"

    querystring = {"userId":"userid","moffset":"0"}

    payload = "sEcho=1&iColumns=4&sColumns=DateCtz%2CReason%2CDescription%2CFormattedAmount&iDisplayStart=0&iDisplayLength=100&mDataProp_0=DateCtz&mDataProp_1=Reason&mDataProp_2=Description&mDataProp_3=FormattedAmount&sSearch=&bRegex=false&sSearch_0=&bRegex_0=false&bSearchable_0=true&sSearch_1=&bRegex_1=false&bSearchable_1=false&sSearch_2=&bRegex_2=false&bSearchable_2=false&sSearch_3=&bRegex_3=false&bSearchable_3=false&sSortCol%5B0%5D=DateCtz&bSortDir%5B0%5D=false&iSortingCols=1&bSortable_0=true&bSortable_1=false&bSortable_2=false&bSortable_3=false"


    pgtos = session.post(url,params=querystring,data=payload,headers=headers)



    # Decode the response body as JSON
    json_data = pgtos.text
    json1_data = json.loads(json_data)

    # Cookies held by the session vs. cookies set by this one response
    cooki = session.cookies.get_dict()
    coooookie = pgtos.cookies.get_dict()

    print('cookies')
    print(cooki)
    print(coooookie)


    # The request below was exported from Postman. The problem is that I can't
    # get the __RequestVerificationToken in the payload right. If I use the
    # token from Postman, it works for a while, until it expires.

    url = "https://Awebsite.com/virtualaccount/transactions"

    querystring = {"entryId":"<each one of the tables has a diferent entryId>"}

    payload = "__RequestVerificationToken=QMA2UREXdlRfwIagBWIjekZG4D1ykXrFXxtWnzWV3kc55529C26MyKbL4pHNbaiTjBBAvrrbIsZEroUBJPfc0zWam4nig9oOZxQOKJ2khnZlp2YqOgFgNAj8bYxMIiDGtc9sYBIZS6M_1o6jRAl8gQ2"

    # Headers exported from Postman. With Postman's own header values the
    # request works too, but I'm trying to automate this, and with the
    # cookies I get from the site it simply won't work.
    headers = {
        'User-Agent': "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:61.0) Gecko/20100101 Firefox/61.0",
        'Accept': "*/*",
        'Referer': "https://Awebsite.com/virtualaccount/view",
        'Content-Type': "application/x-www-form-urlencoded; charset=UTF-8",
        'X-Requested-With': "XMLHttpRequest",
        'Cookie': "_ga=GA1.0.000000000.0000000000; _gid=GA1.0.000000000.0000000000; __RequestVerificationToken="+cooki['__RequestVerificationToken']+"; e5ps_sid="+cooki['e5ps_sid']+"; .ASPXAUTH="+cooki['.ASPXAUTH']+"; _gat=1",
        'Connection': "keep-alive",
        'Cache-Control': "no-cache",
        'Postman-Token': "..."
        }

    response = requests.request("POST", url, data=payload, headers=headers, params=querystring)

    print(response.text)  # End of the Postman test: with all of Postman's tokens and cookies it works for a while.



    # Walk through every entry in the top-level JSON document
    for entry in json1_data['VirtualAccountEntries']:

        # DateCtz looks like "08/08/2018 19:02:23" (day/month/year)
        data = entry['DateCtz']
        _date = datetime.date(int(data[6:10]), int(data[3:5]), int(data[0:2]))
        if current_day == _date or prvious_day == _date:

            url = "https://Awebsite.com/virtualaccount/transactions"
            querystring = {"entryId": entry['Id']}

            payload = "__RequestVerificationToken=<the requestverification token i can't seem to get right>"

            payment = requests.request("POST", url, data=payload, headers=headers, params=querystring)

            print(payment.text)

            # Output path is elided in the original post.
            # payment.json() raises ValueError if the login page came back instead of JSON.
            with open("C:\\..." + ".json", "w") as fp:
                json.dump(payment.json(), fp)
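
The date check in the loop above slices DateCtz by character position. A minimal sketch of the same "today or the previous day" filter using datetime.strptime, assuming DateCtz always has the DD/MM/YYYY HH:MM:SS form seen in the table rows. The function name select_entry_ids and the sample entries are made up for illustration:

```python
from datetime import date, datetime, timedelta


def select_entry_ids(entries, today):
    """Return the Id of every entry dated today or the previous day.

    Assumes each entry is a dict with 'Id' and 'DateCtz' keys and that
    'DateCtz' looks like '08/08/2018 19:02:23' (day/month/year).
    """
    wanted_days = {today, today - timedelta(days=1)}
    selected = []
    for entry in entries:
        entry_day = datetime.strptime(entry["DateCtz"], "%d/%m/%Y %H:%M:%S").date()
        if entry_day in wanted_days:
            selected.append(entry["Id"])
    return selected


# Made-up sample data in the shape of json1_data['VirtualAccountEntries'].
sample_entries = [
    {"Id": "a1", "DateCtz": "08/08/2018 19:02:23"},
    {"Id": "b2", "DateCtz": "09/08/2018 16:22:56"},
    {"Id": "c3", "DateCtz": "31/07/2018 21:00:00"},
]
selected_ids = select_entry_ids(sample_entries, today=date(2018, 8, 9))
print(selected_ids)
```

Each returned Id could then be used as the entryId query parameter of one transactions request.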

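Here is the HTML with all of the nested tables expanded. Every time I open a nested table, a new JSON POST request is sent. So I need to issue several JSON POST requests from my code to fetch all of them, changing the request parameters each time so that I get the right data for each table.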

    <div class="table-responsive">
        <div id="DataTables_Table_1_wrapper" class="dataTables_wrapper" role="grid"><div class="row dt-rt"><div class="col-sm-6"><div id="DataTables_Table_1_length" class="dataTables_length"><label><select size="1" name="DataTables_Table_1_length" aria-controls="DataTables_Table_1"><option value="10" selected="selected">10</option><option value="25">25</option><option value="50">50</option><option value="100">100</option></select> Registry</label></div></div><div class="col-sm-6"><div class="dataTables_filter" id="DataTables_Table_1_filter"><label>Filter: <input aria-controls="DataTables_Table_1" type="text"></label></div></div></div><table class="table table-responsive server-table dataTable" data-source="/virtualaccount/entries?userId=<user id>;moffset=0" data-source-property="VirtualAccountEntries" data-source-callback="bindActions" id="DataTables_Table_1">
            <thead>
                <tr role="row"><th data-sortable="true" data-prop="DateCtz" data-render="renderDate" class="sorting_asc" role="columnheader" tabindex="0" aria-controls="DataTables_Table_1" rowspan="1" colspan="1" style="width: 121px;" aria-label="
                        Data
                    : activate to sort column ascending">
                        Data
                    </th><th data-prop="Reason" class="sorting_disabled" role="columnheader" rowspan="1" colspan="1" style="width: 112px;" aria-label="
                        Type
                    ">
                        Type
                    </th><th data-prop="Description" class="sorting_disabled" role="columnheader" rowspan="1" colspan="1" style="width: 218px;" aria-label="
                        Description
                    ">
                        Description
                    </th><th data-prop="FormattedAmount" class="total sorting_disabled" role="columnheader" rowspan="1" colspan="1" style="width: 133px;" aria-label="
                        Value
                    ">
                        Value
                    </th></tr>
            </thead>
        <tbody role="alert" aria-live="polite" aria-relevant="all"><tr class="odd"><td class=" sorting_1">31/07/2018 21:00:00</td><td class="">Saldo</td><td class="">Saldo</td><td class="">-106,15</td></tr><tr class="even"><td class=" sorting_1">01/08/2018 21:00:00</td><td class="">Saldo</td><td class="">Saldo</td><td class="">-106,15</td></tr><tr class="odd" id="row-<sale id>"><td class=" sorting_1"><img id="loader-<sale id>" class="row-loader" src="www.Awebsite.com/source.gif" style="display: none;"><a data-id="<sale id>" id="explosion-<sale id>" href="www.Awebsite.com/transactions?entryId=<sale id>" class="explosion" style="display: none;"><input name="__RequestVerificationToken" value="<Really long token here>" type="hidden"><i class="show-tooltip fa-explosion-row fa fa-plus-square-o" title="" data-original-title="Visualizar"></i></a>08/08/2018 19:02:23</td><td class="">Transactions</td><td class="">Transaction payment</td><td class="">3,80</td></tr><tr class="subdata"><td>06/08/2018 17:34:43</td><td>Transaction</td><td>Debit transaction with authorization NUMBER.</td><td>$ 3,80</td></tr><tr class="even" id="row-<sale id>"><td class=" sorting_1"><img id="loader-<sale id>" class="row-loader" src="www.Awebsite.com/source.gif" style="display: none;"><a data-id="<sale id>" id="explosion-<sale id>" href="www.Awebsite.com/transactions?entryId=<sale id>" class="explosion" style="display: none;"><input name="__RequestVerificationToken" value="<Really long token here>" type="hidden"><i class="show-tooltip fa-explosion-row fa fa-plus-square-o" title="" data-original-title="Visualizar"></i></a>08/08/2018 19:02:23</td><td class="">Transactions</td><td class="">Transaction payment</td><td class="">43,74</td></tr><tr class="subdata"><td>06/08/2018 15:36:01</td><td>Transaction</td><td>Debit transaction with authorization NUMBER.</td><td>$ 43,74</td></tr><tr class="odd" id="row-<Sale ID>"><td class=" sorting_1"><img id="loader-<Sale ID>" class="row-loader" 
src="www.Awebsite.com/source.gif" style="display: none;"><a data-id="<Sale ID>" id="explosion-<Sale ID>" href="www.Awebsite.com/transactions?entryId=<Sale ID>" class="explosion" style="display: none;"><input name="__RequestVerificationToken" value="<Really long token here>" type="hidden"><i class="show-tooltip fa-explosion-row fa fa-plus-square-o" title="" data-original-title="Visualizar"></i></a>08/08/2018 19:02:23</td><td class="">Transactions</td><td class="">Transaction payment</td><td class="">71,07</td></tr><tr class="subdata"><td>06/08/2018 16:12:38</td><td>Transaction</td><td>Credit transaction with authorization NUMBER.</td><td>$ 59,98</td></tr><tr class="subdata"><td>06/08/2018 15:51:33</td><td>Transaction</td><td>Credit transaction with authorization 029530.</td><td>$ 11,09</td></tr><tr class="even" id="row-<another sale ID>"><td class=" sorting_1"><img id="loader-<another sale ID>" class="row-loader" src="www.Awebsite.com/source.gif" style="display: none;"><a data-id="<another sale ID>" id="explosion-<another sale ID>" href="www.Awebsite.com/transactions?entryId=<another sale ID>" class="explosion" style="display: none;"><input name="__RequestVerificationToken" value="<Really long token here>" type="hidden"><i class="show-tooltip fa-explosion-row fa fa-plus-square-o" title="" data-original-title="Visualizar"></i></a>08/08/2018 19:02:23</td><td class="">Transactions</td><td class="">Transaction payment</td><td class="">51,52</td></tr><tr class="subdata"><td>06/08/2018 16:02:19</td><td>Transaction</td><td>Credit transaction with authorization NUMBER.</td><td>$ 51,52</td></tr><tr class="odd" id="row-<yet another id for another sale>"><td class=" sorting_1"><img id="loader-<yet another id for another sale>" class="row-loader" src="www.Awebsite.com/source.gif" style="display: none;"><a data-id="<yet another id for another sale>" id="explosion-<yet another id for another sale>" href="www.Awebsite.com/transactions?entryId=<yet another id for another sale>" 
class="explosion" style="display: none;"><input name="__RequestVerificationToken" value="<Really long token here>" type="hidden"><i class="show-tooltip fa-explosion-row fa fa-plus-square-o" title="" data-original-title="Visualizar"></i></a>08/08/2018 19:02:23</td><td class="">Transactions</td><td class="">Transaction payment</td><td class="">75,11</td></tr><tr class="subdata"><td>06/08/2018 09:33:31</td><td>Transaction</td><td>Credit transaction with authorization NUMBER.</td><td>$ 75,11</td></tr><tr class="even" id="row-<Sale ID>"><td class=" sorting_1"><img id="loader-<Sale ID>" class="row-loader" src="www.Awebsite.com/source.gif" style="display: none;"><a data-id="<Sale ID>" id="explosion-<Sale ID>" href="www.Awebsite.com/transactions?entryId=<Sale ID>" class="explosion" style="display: none;"><input name="__RequestVerificationToken" value="<Really long token here>" type="hidden"><i class="show-tooltip fa-explosion-row fa fa-plus-square-o" title="" data-original-title="Visualizar"></i></a>08/08/2018 19:02:23</td><td class="">Transactions</td><td class="">Transaction payment</td><td class="">22,21</td></tr><tr class="subdata"><td>06/08/2018 14:31:36</td><td>Transaction</td><td>Debit transaction with authorization NUMBER.</td><td>$ 22,21</td></tr><tr class="odd" id="row-<sale id>"><td class=" sorting_1"><img id="loader-<sale id>" class="row-loader" src="www.Awebsite.com/source.gif" style="display: none;"><a data-id="<sale id>" id="explosion-<sale id>" href="www.Awebsite.com/transactions?entryId=<sale id>" class="explosion" style="display: none;"><input name="__RequestVerificationToken" value="<Really long token here>" type="hidden"><i class="show-tooltip fa-explosion-row fa fa-plus-square-o" title="" data-original-title="Visualizar"></i></a>08/08/2018 19:02:23</td><td class="">Transactions</td><td class="">Transaction payment</td><td class="">122,12</td></tr><tr class="subdata"><td>06/08/2018 14:39:01</td><td>Transaction</td><td>Credit transaction with 
authorization NUMBER.</td><td>$ 122,12</td></tr><tr class="even" id="row-<sale id>"><td class=" sorting_1"><img id="loader-<sale id>" class="row-loader" src="www.Awebsite.com/source.gif" style="display: none;"><a data-id="<sale id>" id="explosion-<sale id>" href="www.Awebsite.com/transactions?entryId=<sale id>" class="explosion" style="display: none;"><input name="__RequestVerificationToken" value="<Really long token here>" type="hidden"><i class="show-tooltip fa-explosion-row fa fa-plus-square-o" title="" data-original-title="Visualizar"></i></a>09/08/2018 16:22:56</td><td class="">Transactions</td><td class="">Transaction payment</td><td class="">48,51</td></tr><tr class="subdata"><td>07/08/2018 13:35:55</td><td>Transaction</td><td>Debit transaction with authorization NUMBER.</td><td>$ 48,51</td></tr></tbody></table><div class="row dt-rb"><div class="col-sm-4"></div><div class="col-sm-8"><div class="dataTables_paginate paging_bootstrap"><ul class="pagination"><li class="prev disabled"><a href="#">← Previous</a></li><li class="active"><a href="#">1</a></li><li><a href="#">2</a></li><li><a href="#">3</a></li><li><a href="#">4</a></li><li><a href="#">5</a></li><li class="next"><a href="#">Next → </a></li></ul></div></div></div></div>
    </div>
</div>
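
In the markup above, each expandable row carries its own data-id and a hidden __RequestVerificationToken input inside the a.explosion link. A minimal sketch, using only the standard library's html.parser, of collecting those pairs. It is only useful if the rows are present in the HTML the script actually receives; if DataTables renders them in the browser, the token has to be found elsewhere in the page. Whether the server expects this per-row value is also an assumption, and RowTokenCollector and the sample markup are hypothetical:

```python
from html.parser import HTMLParser


class RowTokenCollector(HTMLParser):
    """Collects (entry id, token) pairs from <a class="explosion"> links."""

    def __init__(self):
        super().__init__()
        self._current_id = None  # data-id of the <a> we are inside, if any
        self.pairs = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and "explosion" in (attrs.get("class") or "").split():
            self._current_id = attrs.get("data-id")
        elif (tag == "input" and self._current_id is not None
              and attrs.get("name") == "__RequestVerificationToken"):
            self.pairs.append((self._current_id, attrs.get("value")))

    def handle_endtag(self, tag):
        if tag == "a":
            self._current_id = None


# Made-up rows in the same shape as the table body shown above.
sample_html = (
    '<tr id="row-101"><td><a data-id="101" id="explosion-101" class="explosion">'
    '<input name="__RequestVerificationToken" value="token-for-101" type="hidden">'
    '</a>08/08/2018 19:02:23</td></tr>'
    '<tr id="row-102"><td><a data-id="102" id="explosion-102" class="explosion">'
    '<input name="__RequestVerificationToken" value="token-for-102" type="hidden">'
    '</a>09/08/2018 16:22:56</td></tr>'
)
collector = RowTokenCollector()
collector.feed(sample_html)
collector.close()
row_tokens = dict(collector.pairs)
print(row_tokens)
```

With such a mapping, each transactions request could send the token that belongs to its own entryId through the same logged-in session.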

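Every time I try to automate the process, whether I change the token or anything else, the only thing returned is the login page. However, if I use the token and cookies provided by the Postman app, I can download the information just fine.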
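
Because a rejected request comes back as the login page rather than as an error, it can help to check what kind of body was returned before treating it as data. A small sketch, assuming the real endpoint answers with JSON; parse_json_or_none is a hypothetical helper and both sample bodies are made up:

```python
import json


def parse_json_or_none(body):
    """Return the decoded JSON document, or None if the body is not JSON
    (for example, an HTML login page)."""
    try:
        return json.loads(body)
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        return None


login_page = "<!DOCTYPE html><html><body><form action='/login'></form></body></html>"
json_body = '{"VirtualAccountEntries": []}'

print(parse_json_or_none(login_page))
print(parse_json_or_none(json_body))
```

A None result for a transactions response would point to a rejected token or cookie rather than to missing data.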

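Edit 1:

I was able to find the XHR JSON requests among the requests the site makes, but I can't get the right token and cookies for them. The code needs to return a JSON file containing the data I need, but without the Postman app all I get back is HTML without the tables I really need.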

my requests

The token and cookies are different from the ones I can get through session.cookies.get_dict(), or in any other way from the current session.

Can anyone help me?

Thanks!!

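I managed to get it right. Deep in the HTML there was a script containing the __RequestVerificationToken value. Once I found it, I just used BeautifulSoup to parse the script, extracted the token text, and passed that token in the request headers so that the session would not expire.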
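
A rough sketch of that idea follows. To stay dependency-free it uses the standard library's html.parser in place of BeautifulSoup, so it is not the exact code from the fix. How the token is written inside the script (here a quoted key and value) and the name of the header it is sent in are both assumptions, and ScriptCollector, find_token and the sample page are hypothetical:

```python
import re
from html.parser import HTMLParser


class ScriptCollector(HTMLParser):
    """Collects the text of every <script> element in a page."""

    def __init__(self):
        super().__init__()
        self._in_script = False
        self.scripts = []

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self._in_script = True
            self.scripts.append("")

    def handle_endtag(self, tag):
        if tag == "script":
            self._in_script = False

    def handle_data(self, data):
        if self._in_script:
            self.scripts[-1] += data


# Assumed shape: the name, an optional closing quote, ':' or '=', then a quoted value.
TOKEN_RE = re.compile(r"""__RequestVerificationToken['"]?\s*[:=]\s*['"]([^'"]+)['"]""")


def find_token(page_html):
    """Return the first token found inside a <script>, or None."""
    parser = ScriptCollector()
    parser.feed(page_html)
    parser.close()
    for script in parser.scripts:
        match = TOKEN_RE.search(script)
        if match:
            return match.group(1)
    return None


# Made-up page; the real script on the site may look different.
sample_page = """
<html><body>
<script>var cfg = {"__RequestVerificationToken": "made-up-token-123"};</script>
</body></html>
"""
token = find_token(sample_page)
headers = {
    "X-Requested-With": "XMLHttpRequest",
    "__RequestVerificationToken": token,  # header name is an assumption
}
print(token)
```

With BeautifulSoup, the ScriptCollector part would reduce to looping over soup.find_all("script") and searching each script's text. The headers dict would then be passed to the POST made through the same logged-in session.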