PHP Goutte / CURL - Complete ASPX Form
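I'm trying to get data from here: https://wyobiz.wy.gov/Business/FilingSearch.aspx
I'm trying to check whether a company name is available.
But this site is an ASP.NET Web Forms page, and the whole site is one big form.
I don't know how to get data out of this form.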
I think the problem is:
'__VIEWSTATE' => '',
'__VIEWSTATEGENERATOR' => '9E6EC73D',
'__EVENTVALIDATION' => '',
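Those three values are rendered by the page itself as `<input type="hidden">` elements, so they have to be read from the GET response and sent back unchanged with the POST. A minimal sketch of collecting every hidden input with PHP's DOM extension (`DOMDocument` and `DOMXPath`); the function name `hiddenFields` is hypothetical:

```php
<?php
// Hypothetical helper: collect every <input type="hidden" name="..."> on a
// Web Forms page (__VIEWSTATE, __VIEWSTATEGENERATOR, __EVENTVALIDATION, ...)
// as a name => value array, ready to be echoed back in the POST.
function hiddenFields(string $html): array
{
    $doc = new DOMDocument();
    // Real-world pages are rarely valid markup; suppress libxml warnings.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $fields = [];
    foreach ($xpath->query('//input[@type="hidden"][@name]') as $input) {
        // Values are taken verbatim; __VIEWSTATE is base64, so no decoding needed.
        $fields[$input->getAttribute('name')] = $input->getAttribute('value');
    }
    return $fields;
}
```

Unlike a regular expression, this does not depend on the order of the `name`, `id` and `value` attributes or on how the tag is closed.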
Is it possible to send this request from PHP and get the returned data?
Because this code returns
The current node list is empty.
Thanks.
My code:
require_once __DIR__ . '/vendor/autoload.php';
use Goutte\Client;
$client = new Client();
$crawler = $client->request('GET', 'https://wyobiz.wy.gov/Business/FilingSearch.aspx');
$form = $crawler->selectButton('Search')->form();
$formValues = $form->getValues();
$crawler = $client->submit($form, array(
    '__VIEWSTATE' => $formValues['__VIEWSTATE'],
    '__VIEWSTATEGENERATOR' => $formValues['__VIEWSTATEGENERATOR'],
    '__EVENTVALIDATION' => $formValues['__EVENTVALIDATION'],
    'ctl00$MainContent$myScriptManager' => 'MainContent_myScriptManager',
    'ctl00$MainContent$txtFilingName' => 'Google',
    'ctl00$MainContent$searchOpt' => 'chkSearchStartWith',
    'ctl00$MainContent$txtFilingID' => null,
));
$crawler->filter('body')->each(function ($node) {
    print $node->text() . "\n";
});
Conclusion: Goutte SUCKS, and Goutte's support SUCKS!
Well, I'm not familiar with Goutte, but using the w3zone/crawler package I put together a simple example that scrapes that link:
Install it with:
composer require w3zone/Crawler
Then use it for your case like this:
require_once __DIR__ . '/vendor/autoload.php';
use w3zone\Crawler\{Crawler, Services\phpCurl};
$crawler = new Crawler(new phpCurl);
$link = 'https://wyobiz.wy.gov/Business/FilingSearch.aspx';
// GET the search page first to obtain the current hidden form state.
$homePage = $crawler->get($link)->run();
preg_match('#<input type="hidden" name="__VIEWSTATE" id="__VIEWSTATE" value="(.*?)"\s*/>#', $homePage['body'], $viewState);
preg_match('#<input type="hidden" name="__VIEWSTATEGENERATOR" id="__VIEWSTATEGENERATOR" value="(.*?)"\s*/>#', $homePage['body'], $viewGen);
preg_match('#<input type="hidden" name="__EVENTVALIDATION" id="__EVENTVALIDATION" value="(.*?)"\s*/>#', $homePage['body'], $eventVal);
$postData = array(
    '__VIEWSTATE' => $viewState[1],
    '__LASTFOCUS' => '',
    '__EVENTTARGET' => '',
    '__EVENTARGUMENT' => '',
    '__VIEWSTATEGENERATOR' => $viewGen[1],
    '__EVENTVALIDATION' => $eventVal[1],
    'ctl00$MainContent$txtFilingName' => 'test',
    'ctl00$MainContent$searchOpt' => 'chkSearchStartWith',
    'ctl00$MainContent$txtFilingID' => '',
    'ctl00$MainContent$cmdSearch' => 'Search',
    // Marks this as an async (UpdatePanel) postback.
    '__ASYNCPOST' => 'true',
    // "<update panel>|<control that triggered the postback>"
    'ctl00$MainContent$myScriptManager' => 'ctl00$MainContent$UpdatePanel1|ctl00$MainContent$cmdSearch',
);
$response = $crawler->post(['url' => $link, 'data' => $postData])->dumpHeaders()->run();
// Escape the body so any markup in the response cannot break out of the textarea.
echo "<textarea style='width: 90%; height: 200px;'>" . htmlspecialchars($response['body']) . "</textarea>";
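The `$postData` array above combines three groups of fields: the hidden state from the GET, the visible inputs, and the ASP.NET AJAX markers (`__ASYNCPOST` and the ScriptManager value naming the UpdatePanel and the button). A sketch that assembles the same payload from those groups, so each key is set exactly once; the function and parameter names are hypothetical, and the field names simply mirror the example above:

```php
<?php
// Hypothetical helper: build the POST fields for an UpdatePanel postback.
// $hidden: hidden inputs read from the GET response (e.g. __VIEWSTATE).
// $inputs: the visible form fields (search text, options, ...).
function asyncPostbackFields(
    array $hidden,
    array $inputs,
    string $scriptManager,
    string $updatePanel,
    string $button,
    string $buttonValue
): array {
    // The + operator keeps values already in $hidden and only fills gaps.
    $fields = $hidden + [
        '__EVENTTARGET' => '',
        '__EVENTARGUMENT' => '',
        '__LASTFOCUS' => '',
    ];
    // array_merge: later string keys overwrite earlier ones, so no duplicates.
    return array_merge($fields, $inputs, [
        $scriptManager => $updatePanel . '|' . $button,
        $button => $buttonValue,
        '__ASYNCPOST' => 'true',
    ]);
}

$fields = asyncPostbackFields(
    ['__VIEWSTATE' => 'vs', '__EVENTVALIDATION' => 'ev'],
    ['ctl00$MainContent$txtFilingName' => 'Google'],
    'ctl00$MainContent$myScriptManager',
    'ctl00$MainContent$UpdatePanel1',
    'ctl00$MainContent$cmdSearch',
    'Search'
);
// URL-encoded body, e.g. for CURLOPT_POSTFIELDS ($ becomes %24 in the names).
$body = http_build_query($fields);
```

The returned array could be passed as `'data'` to the crawler's `post()` call shown above, or encoded with `http_build_query()` for plain cURL.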
My problem was that the ASP.NET async response is not HTML; it is text that contains HTML:
<html>
1|#||4|6079|updatePanel|ctl00_MainContentPlaceHolder_ucLicenseLookup_UpdtPanelGridLookup|
<div class="modal-window-lookup-results fade bs-example-modal-lg in">
<div class="modal-header">
[...]
</html>
So when Goutte hands it to BrowserKit, it breaks. Goutte doesn't suck;
you just can't feed it non-HTML garbage.
As a quick workaround, I just did:
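$crawler = $client->request('POST', $url, $params);
// this is a broken crawler because response is not html!
$html = $client->getResponse()->getContent();
// Drop the "1|#||4|<length>|updatePanel|<id>|" prefix: keep from the first <div onward.
$html = substr($html, strpos($html, "<div"));
// Cut just before the first hiddenField record; the -3 also strips the "|NN"
// in front of it, so this assumes that length field has exactly two digits.
$html = substr($html, 0, strpos($html, "|hiddenField|") - 3);
$html = "<!DOCTYPE html><html>$html</html>";
$crawler = new \Symfony\Component\DomCrawler\Crawler($html);
print $crawler->html();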
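The `substr` workaround depends on exact offsets. Judging from the sample response above (`1|#||4|6079|updatePanel|...|`), the async body looks like a sequence of records shaped as `length|type|id|content|`, where `length` gives the size of `content`. Assuming that format, a sketch of a parser that slices each record by its declared length instead of searching for delimiters; `parseDelta` is a hypothetical name, and because `substr()` counts bytes, the lengths are only exact for ASCII content:

```php
<?php
// Hypothetical parser for an ASP.NET AJAX async ("delta") response.
// Assumed format: repeated records of  length|type|id|content|
// Caveat: substr() counts bytes, so lengths only line up for ASCII content.
function parseDelta(string $body): array
{
    $records = [];
    $pos = 0;
    $total = strlen($body);
    while ($pos < $total) {
        // Read the three pipe-terminated header fields: length, type, id.
        $header = [];
        for ($i = 0; $i < 3; $i++) {
            $bar = strpos($body, '|', $pos);
            if ($bar === false) {
                throw new RuntimeException("Truncated record header at offset $pos");
            }
            $header[] = substr($body, $pos, $bar - $pos);
            $pos = $bar + 1;
        }
        [$size, $type, $id] = $header;
        if (!preg_match('/^\d+$/', $size)) {
            throw new RuntimeException("Expected a numeric length, got '$size'");
        }
        // Slice content by length: the HTML itself may contain '|' characters.
        $content = substr($body, $pos, (int) $size);
        $pos += (int) $size;
        if (($body[$pos] ?? '') !== '|') {
            throw new RuntimeException("Missing record terminator at offset $pos");
        }
        $pos++;
        $records[] = ['type' => $type, 'id' => $id, 'content' => $content];
    }
    return $records;
}
```

To use it, keep the record whose `type` is `updatePanel` and pass its `content` to `new \Symfony\Component\DomCrawler\Crawler(...)`, as the workaround above does with the sliced string.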