Webcrawler skipping URLs
I'm writing a program that scans for vulnerable websites. I happen to know of several sites that are vulnerable and return a SQL syntax error, yet when I run the program it skips those sites: it never prints that they were found, nor that they were saved to the file. The program is being used for penetration testing, and all of the site owners are aware of the vulnerability.
Source:
def get_urls
  info("Searching for possible SQL vulnerable sites.")
  @agent = Mechanize.new
  page = @agent.get('http://www.google.com/')
  google_form = page.form('f')
  google_form.q = "#{SEARCH}"
  url = @agent.submit(google_form, google_form.buttons.first)
  url.links.each do |link|
    if link.href.to_s =~ /url.q/
      str = link.href.to_s
      str_list = str.split(%r{=|&})
      urls = str_list[1]
      next if str_list[1].split('/')[2] == "webcache.googleusercontent.com"
      urls_to_log = urls.gsub("%3F", '?').gsub("%3D", '=')
      success("Site found: #{urls_to_log}")
      File.open("#{PATH}/temp/SQL_sites_to_check.txt", "a+") {|s| s.puts("#{urls_to_log}'")}
    end
  end
  info("Possible vulnerable sites dumped into #{PATH}/temp/SQL_sites_to_check.txt")
end

def check_if_vulnerable
  info("Checking if sites are vulnerable.")
  IO.read("#{PATH}/temp/SQL_sites_to_check.txt").each_line do |parse|
    begin
      Timeout::timeout(5) do
        parsing = Nokogiri::HTML(RestClient.get("#{parse.chomp}"))
      end
    rescue Timeout::Error, RestClient::ResourceNotFound, RestClient::SSLCertificateNotVerified, Errno::ECONNABORTED, Mechanize::ResponseCodeError, RestClient::InternalServerError => e
      if e
        warn("URL: #{parse.chomp} failed with error: [#{e}] dumped to non_exploitable.txt")
        File.open("#{PATH}/lib/non_exploitable.txt", "a+"){|s| s.puts(parse)}
      else
        success("SQL syntax error discovered in URL: #{parse.chomp} dumped to SQL_VULN.txt")
        File.open("#{PATH}/lib/SQL_VULN.txt", "a+"){|vuln| vuln.puts(parse)}
      end
    end
  end
end
Example output:
[22:49:29 INFO]Checking if sites are vulnerable.
[22:49:53 WARNING]URL: http://www.police.bd/content.php?id=275' failed with error: [execution expired] dumped to non_exploitable.txt
The file containing the URLs:
http://www.bible.com/subcat.php?id=2'
http://www.cidko.com/pro_con.php?id=3'
http://www.slavsandtat.com/about.php?id=25'
http://www.police.bd/content.php?id=275'
http://www.icdcprage.org/index.php?id=10'
http://huawei.com/en/plugin.php?id=hwdownload'
https://huawei.com/en/plugin.php?id=unlock'
https://facebook.com/profile.php?id'
http://www.footballclub.com.au/index.php?id=43'
http://www.mesrs.qc.ca/index.php?id=1525'
As you can see, the program skips the first three URLs and goes straight to the fourth. Why?
Am I doing something wrong that makes this happen?
I'm not sure whether the rescue block is where it should be.

You aren't doing anything with the content you fetch in parsing = Nokogiri::HTML(RestClient.get("#{parse.chomp}")). For the first three URLs the request probably just succeeds, so no exception is raised and no error output is produced. Add some output after that line to see that they are in fact being fetched.
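Here is a minimal sketch of that suggestion (assuming the info/success/warn helpers and the PATH constant from the rest of your program). Note that inside a rescue clause e always refers to the caught exception, so it is always truthy and your if e / else can never reach its else branch. Moving the success handling into the begin block's else clause, which Ruby runs only when no exception was raised, makes the successful fetches visible; the /SQL syntax/ body check is a hypothetical way to actually use the fetched content:

def check_if_vulnerable
  info("Checking if sites are vulnerable.")
  IO.read("#{PATH}/temp/SQL_sites_to_check.txt").each_line do |parse|
    url = parse.chomp
    page = nil
    begin
      Timeout::timeout(5) do
        page = Nokogiri::HTML(RestClient.get(url))
      end
    rescue Timeout::Error, RestClient::ResourceNotFound, RestClient::SSLCertificateNotVerified,
           Errno::ECONNABORTED, Mechanize::ResponseCodeError, RestClient::InternalServerError => e
      warn("URL: #{url} failed with error: [#{e}] dumped to non_exploitable.txt")
      File.open("#{PATH}/lib/non_exploitable.txt", "a+") { |s| s.puts(parse) }
    else
      # Runs only when the fetch raised no exception -- this output was missing before.
      success("Fetched: #{url}")
      # Hypothetical check: look for a database error message in the response body.
      if page.text =~ /SQL syntax/i
        success("SQL syntax error discovered in URL: #{url} dumped to SQL_VULN.txt")
        File.open("#{PATH}/lib/SQL_VULN.txt", "a+") { |vuln| vuln.puts(parse) }
      end
    end
  end
end

With the output in the else clause you should see a "Fetched:" line for each of the first three URLs, confirming they are requested rather than skipped; whether /SQL syntax/ matches the error pages of your particular targets is an assumption you would need to verify.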