Automated data retrieval from a website using POST fails with an HTTP 405 error
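I am working on something similar to "Automate data retrieval from a web site using a Ruby web bot": a script that submits a roll number to a results website and fetches the result page.
I am using Ruby and submitting the roll number with the POST method, but because the main page that accepts the roll number is a .htm page, I end up with: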
HTTP Status 405 - Request method 'POST' not supported
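(Most solutions to this error suggest changes on the server side, which is outside my control.) I want to automate fetching the results so that I can build a dataset for data mining.
I have tried to find out how to fetch the result page with an automated script, but without satisfactory results. Can anyone tell me how to achieve this with any kind of script? The language does not matter to me, since collecting the data is the goal.
Any guidance on achieving this would be helpful.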
I am trying to extract the 2007 results from this site:
http://resultsarchives.nic.in/cbseresults/cbseresults2007/aieee/cbseaieee.htm
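You can use the sample roll number series 24800000 .. 24809999 to see how a result is displayed; for example, 24801002 is a valid roll number (roll numbers have 8 digits).
I have tagged the question with several languages because I think a solution could exist in any of them.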
If you inspect the requests your browser makes when you use the site, you will see that the POST request goes to:
http://resultsarchives.nic.in/cbseresults/cbseresults2007/aieee/cbseaieee.asp
with the data:
regno: 24801002
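You can target this URL to scrape the site (if that is permitted).
As far as I can tell, the site's Terms of Use do not prohibit programmatic access, so I don't think you will have a problem there.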
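Since the question uses Ruby, here is a minimal sketch of posting straight to the .asp URL with the standard library's Net::HTTP. The URL and the `regno` field come from the request described above; the helper names `build_request` and `fetch_result` are made up for illustration, and whether the server also expects cookies or a Referer header has not been checked.

```ruby
require 'net/http'
require 'uri'

# The URL the form posts to (seen in the browser's network inspector),
# not the .htm page that only displays the form.
RESULT_URL = URI('http://resultsarchives.nic.in/cbseresults/cbseresults2007/aieee/cbseaieee.asp')

# Build the POST request for one roll number without sending it.
# (Hypothetical helper name, used only in this sketch.)
def build_request(roll_no)
  req = Net::HTTP::Post.new(RESULT_URL)
  req.set_form_data('regno' => roll_no.to_s) # form-encoded body: regno=<roll number>
  req
end

# Send the request and return the raw HTML of the result page.
# Raises an HTTP error if the server answers with a non-2xx status.
def fetch_result(roll_no)
  Net::HTTP.start(RESULT_URL.host, RESULT_URL.port) do |http|
    response = http.request(build_request(roll_no))
    response.value # raises unless the response is 2xx
    response.body
  end
end
```

Calling `fetch_result(24801002)` would then return the HTML for that roll number, assuming the server accepts the request as sent.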
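This is very simple using Perl's WWW::Mechanize module.
It looks like this. Note that the output is just the text content of the HTML page, so it contains no line breaks. If you want the HTML itself, use $mech->content instead of $mech->text.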
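use strict;
use warnings;
use WWW::Mechanize;

# The page that displays the roll number form
my $url = 'http://resultsarchives.nic.in/cbseresults/cbseresults2007/aieee/cbseaieee.htm';

my $mech = WWW::Mechanize->new;

# Fetch the form page first, so that Mechanize can read the form's action URL
$mech->get($url);

# Fill in the roll number and submit the form to its action URL
$mech->submit_form( fields => { regno => 24809999 } );

# Print the text content of the result page (use $mech->content for the HTML)
print $mech->text, "\n";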
Output:
CBSE - ALL INDIA ENGINEERING / ARCHITECTURE ENTRANCE EXAMINATION 2007 http://cbseresults.nic.in Examination Results 2007 Brought to you by National Informatics Centre ALL INDIA ENGINEERING / ARCHITECTURE ENTRANCE EXAMINATION (AIEEE - 2007) Roll No: 24809999 Name: BHOSALE CHETAN ANIL Mother's Name: BHOSALE UJJVALA ANIL Father's Name: BHOSALE ANIL BHAGWANRAO Paper Subjects Marks ObtainedPaper-1 Physics, Chemistry & Mathematics -2 Paper-2 Mathematics & Aptitude Test Not Applicable/Not Applied B.E./B.Tech B.Arch All India Rank 539404 ------ State Rank( State code of eligibility : 21 ) 41736 ------ Remarks: BTECH: - Not Eligible for Central CounsellingBARCH: - Note: For details on central counselling, Please visit http://ccb.nic.in or http://aieee.nic.in Cut off score for the purpose of counselling has been decided by Central Counselling Board. State Ranks are based on State Code of Eligibility ie. State from where the candidate has passsed +2 examination. Those who have not filled up State Code of eligibility, their State rank has not been indicated. State rank is privisional subject to verification of documents at the time of counselling. Disclaimer: Neither NIC nor CBSE is responsible for any inadvertent error that may have crept in the results being published on NET. The results published on net are for immediate information to the examinees. These cannot be treated as original Score card. Original Score cards shall be despatched by the Board. Designed, Developed and Hosted by National Informatics Centre
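To turn text like the above into dataset rows, the fields can be picked out with regular expressions. The following Ruby sketch is based only on the single sample output shown here; the function name `parse_result` and the patterns are assumptions, and other result pages (or different whitespace in the extracted text) may need adjusted patterns.

```ruby
# Patterns guessed from the one sample output above; treat them as assumptions.
HEADER_RE = /Roll No:\s*(\d{8})\s+Name:\s*(.+?)\s+Mother's Name:\s*(.+?)\s+Father's Name:\s*(.+?)\s+Paper Subjects/
RANK_RE   = /All India Rank\s+(\d+)/
STATE_RE  = /State Rank\(\s*State code of eligibility\s*:\s*(\d+)\s*\)\s*(\d+)/

# Extract a hash of fields from the flattened page text.
# Returns nil when the candidate header block is not found
# (for example, for an invalid roll number).
def parse_result(text)
  header = HEADER_RE.match(text)
  return nil unless header

  rank  = RANK_RE.match(text)
  state = STATE_RE.match(text)
  {
    roll_no: header[1],
    name: header[2],
    mother: header[3],
    father: header[4],
    all_india_rank: rank && rank[1].to_i, # nil if no numeric rank is present
    state_code: state && state[1].to_i,
    state_rank: state && state[2].to_i
  }
end
```

Each returned hash can be written out as one row of a CSV file; pages for which `parse_result` returns nil can be logged and skipped.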