PHP POST with stream_context_create and file_get_contents
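I have a script that used to work, but the website it downloads a file from has apparently changed its format somehow. I have updated the POST request content and headers to what I think they should be, but it does not retrieve the file as I expect. Here is the part of the script that currently handles this: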
$url='http://mansfield.tea.state.tx.us/TEA.AskTED.Web/Forms/DownloadFile.aspx?';
$header= "Host: mansfield.tea.state.tx.us\r\n";
$header.= "User-Agent: Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:61.0) Gecko/20100101 Firefox/61.0\r\n";
$header.= "Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8\r\n";
$header.= "Accept-Language: en-GB,en;q=0.5\r\n";
$header.= "Accept-Encoding: gzip, deflate\r\n";
$header.= "Referer: http://mansfield.tea.state.tx.us/TEA.AskTED.Web/Forms/DownloadFile.aspx\r\n";
$header.= "Content-Type: application/x-www-form-urlencoded\r\n";
$header.= "Content-Length: 14067\r\n";
$header.= "Cookie: ga=GA1.3.400055834.1504257175; ASP.NET_SessionId=cj020m45uwj5hpmwaclhvmuk\r\n";
$header.= "Connection: keep-alive\r\n";
$header.= "Upgrade-Insecure-Requests: 1\r\n";
$postdata= array(
"__VIEWSTATE" => "/wEPDwULLTE3NDczMDI1MTIPZBYCAgEPZBYEAgMPFCsABWRkZBQrAAcQFg4eBkl0ZW1JRAURX2N0bDAtbWVudUl0ZW0wMDAeCEl0ZW1UZXh0BVI8YSBpZD0iaHlwZXJsaW5rMSIgaHJlZj0iL1RFQS5Bc2tURUQuV2ViL0Zvcm1zL0hvbWUuYXNweCIgY2xhc3M9Im1lbnVOYXYiPkhvbWU8L2E+HgdJdGVtVVJMBRF+L0Zvcm1zL0hvbWUuYXNweB4PTWVudUl0ZW1Ub29sVGlwBQRIb21lHhBNZW51SXRlbUNzc0NsYXNzBRJob3Jpem9udGFsTWVudUl0ZW0eFUl0ZW1Nb3VzZU92ZXJDc3NDbGFzcwUWaG9yaXpvbnRhbE1lbnVTZWxlY3RlZB4LSXRlbVNlY3VyZWRoZGQQFgwfAAURX2N0bDAtbWVudUl0ZW0wMDEfAQVNPGEgaHJlZj0iL1RFQS5Bc2tURUQuV2ViL0Zvcm1zL1NlYXJjaE1h…m9udGFsTWVudUl0ZW0fBQUWaG9yaXpvbnRhbE1lbnVTZWxlY3RlZB8GaGRkFCsAAQUHdGVhdGVtcGQCDQ8QZA8WCWYCAQICAgMCBAIFAgYCBwIIFgkQBQ1TY2hvb2wgTnVtYmVyBQ1TY2hvb2wgTnVtYmVyZxAFC1NjaG9vbCBOYW1lBQtTY2hvb2wgTmFtZWcQBQ1EaXN0cmljdCBOYW1lBQ1EaXN0cmljdCBOYW1lZxAFC0NvdW50eSBOYW1lBQtDb3VudHkgTmFtZWcQBQZSZWdpb24FBlJlZ2lvbmcQBQtTY2hvb2wgQ2l0eQULU2Nob29sIENpdHlnEAUPU2Nob29sIFppcCBDb2RlBQ9TY2hvb2wgWmlwIENvZGVnEAUNRGlzdHJpY3QgQ2l0eQUNRGlzdHJpY3QgQ2l0eWcQBRFEaXN0cmljdCBaaXAgQ29kZQURRGlzdHJpY3QgWmlwIENvZGVnZGRke3qSSaoJbwyFyN/A1p+yD+sPADY=",
"__VIEWSTATEGENERATOR" => "44F2C40C",
"btnDownloadFile" => "Download+File",
"ddlSortOrder" => "School+Number"
);
$opts = array(
'http' => array(
'method' => 'POST',
'content' => http_build_query($postdata),
'header' => $header
)
);
$context = stream_context_create($opts);
$file = file_get_contents($url, false, $context);
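It should return a file containing a list of Texas schools and their data, but it does not.
I took the header ($header) and content ($postdata) values from the browser's web developer console. The site it pulls data from is http://mansfield.tea.state.tx.us/TEA.AskTED.Web/Forms/DownloadFile.aspx.
Any ideas on how I can fix the headers and content so that the file downloads from PHP on the CLI?
Thanks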
The ViewState changes on every request, so scrape the current value with simple_html_dom and pass it in the POST data.
Here is the working code:
<?php
// simple_html_dom is a third-party library; its file must be on the include path.
include_once('simple_html_dom.php');

$url = 'http://mansfield.tea.state.tx.us/TEA.AskTED.Web/Forms/DownloadFile.aspx';

// Fetch the form page first and read the current __VIEWSTATE from it.
// This relies on __VIEWSTATE being the first <input> element on the page.
$html = file_get_html($url);
$viewstate = $html->find('input', 0)->value;

// Host and Content-Length are left out: PHP's HTTP wrapper adds both itself,
// and a hard-coded length would not match the newly built body.
// Accept-Encoding is left out because file_get_contents() does not decompress gzip responses.
$header  = "User-Agent: Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:61.0) Gecko/20100101 Firefox/61.0\r\n";
$header .= "Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8\r\n";
$header .= "Accept-Language: en-GB,en;q=0.5\r\n";
$header .= "Referer: http://mansfield.tea.state.tx.us/TEA.AskTED.Web/Forms/DownloadFile.aspx\r\n";
$header .= "Content-Type: application/x-www-form-urlencoded\r\n";
$header .= "Cookie: ga=GA1.3.400055834.1504257175; ASP.NET_SessionId=cj020m45uwj5hpmwaclhvmuk\r\n";
// "close" instead of "keep-alive", so the read ends as soon as the server has sent the file.
$header .= "Connection: close\r\n";
$header .= "Upgrade-Insecure-Requests: 1\r\n";

// Values are given unencoded: http_build_query() does the URL encoding,
// so a literal "+" here would be sent as "%2B" rather than as a space.
$postdata = array(
    "__VIEWSTATE" => $viewstate,
    "__VIEWSTATEGENERATOR" => "44F2C40C",
    "btnDownloadFile" => "Download File",
    "ddlSortOrder" => "School Number"
);

$opts = array(
    'http' => array(
        'method' => 'POST',
        'content' => http_build_query($postdata),
        'header' => $header
    )
);

$context = stream_context_create($opts);
$file = file_get_contents($url, false, $context);
echo $file;
?>
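If you would rather not depend on simple_html_dom, the same idea can be sketched with PHP's bundled DOM extension. This is only an illustration under a few assumptions: extractHiddenFields() is a made-up helper name, the $sample markup is a hypothetical stand-in for the real form page (it is not taken from the site), and the code only builds the POST body without sending anything.

```php
<?php
// Sketch only: collect every hidden <input> of a form page so that
// __VIEWSTATE and similar fields are posted back with their current values.
// extractHiddenFields() is a hypothetical helper, not part of any library.
function extractHiddenFields(string $html): array
{
    $doc = new DOMDocument();
    // Real pages are rarely valid HTML; keep parser warnings out of the output.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $fields = [];
    $xpath = new DOMXPath($doc);
    foreach ($xpath->query('//input[@type="hidden"]') as $input) {
        $name = $input->getAttribute('name');
        if ($name !== '') {
            $fields[$name] = $input->getAttribute('value');
        }
    }
    return $fields;
}

// Hypothetical sample markup standing in for the real download form.
$sample = '<html><body><form>'
    . '<input type="hidden" name="__VIEWSTATE" value="/wEP+abc=" />'
    . '<input type="hidden" name="__VIEWSTATEGENERATOR" value="44F2C40C" />'
    . '<input type="submit" name="btnDownloadFile" value="Download File" />'
    . '</form></body></html>';

// Hidden fields come from the page; the button and dropdown values are added by hand.
$postdata = extractHiddenFields($sample) + [
    'btnDownloadFile' => 'Download File',
    'ddlSortOrder'    => 'School Number',
];

// http_build_query() URL-encodes the values, so they are passed unencoded here.
$body = http_build_query($postdata);
echo $body, "\n";
```

In the encoded body the spaces become "+" and the "+", "/" and "=" characters of the base64 ViewState become "%2B", "%2F" and "%3D". That is why the values should not be pre-encoded by hand. Against the real page you would pass the HTML returned by a first GET request instead of $sample.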