Fetch different URLs and write to file
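I'm trying to fetch different URLs, e.g. site.com/page=1, page2 and so on. All the fetched data should be stored in an HTML file so that it can be read with Nokogiri.
If I read only one URL and write it to a file, it works perfectly. When I extend the script to read all the possible URLs, it doesn't work.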
def getData
  @a=1
  array = Array.new
  while @a<5 do
    uri = URI.parse("https://exampel.com?pageNr="+@a.to_s+"Size=10")
    http = Net::HTTP.new(uri.host, uri.port)
    http.use_ssl = true
    http.verify_mode = OpenSSL::SSL::VERIFY_NONE
    request = Net::HTTP::Get.new(uri.request_uri)
    puts "Fetching data from "+uri.request_uri
    #puts @cookie
    request['Cookie']=@cookie
    response = http.request(request)
    if response != nil
      array[@a]=response.body
      @a+=1
    end
  end
  File.write('output.html',array)
end
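A minimal offline sketch of why the loop above produces an unusable file. It makes no network calls; the page bodies are hypothetical stand-ins for response.body. Three things go wrong: the query string lacks the & before Size, so the URL becomes ?pageNr=1Size=10; the loop stores bodies starting at index 1, so array[0] stays nil; and File.write converts a non-String argument with to_s, which for an Array is its inspect form, so output.html ends up holding Ruby syntax rather than HTML. (Also note that while @a<5 only fetches pages 1 to 4.)

```ruby
require 'uri'
require 'tmpdir'

# 1. The question's URL concatenation: the "&" before Size is missing.
page = 1
broken_url = "https://exampel.com?pageNr=" + page.to_s + "Size=10"
puts broken_url

# One fix: let URI build the query string from key/value pairs.
uri = URI("https://exampel.com")
uri.query = URI.encode_www_form(pageNr: page, Size: 10)
fixed_url = uri.to_s
puts fixed_url

# 2. Hypothetical stand-ins for response.body, stored the way the loop
#    does it: starting at index 1, so array[0] stays nil.
array = []
array[1] = "<html><body>page 1</body></html>\n"
array[2] = "<html><body>page 2</body></html>\n"

broken_html, fixed_html = Dir.mktmpdir do |dir|
  # 3. What the question does: File.write converts the Array with to_s
  #    (same as inspect), so the file contains Ruby syntax, not markup.
  broken_path = File.join(dir, 'output.html')
  File.write(broken_path, array)

  # Joining the non-nil bodies writes the markup itself.
  fixed_path = File.join(dir, 'output_fixed.html')
  File.write(fixed_path, array.compact.join)

  # Dir.mktmpdir returns the block's value after deleting the directory.
  [File.read(broken_path), File.read(fixed_path)]
end

puts broken_html
puts fixed_html
```

Joining still gives one file containing several concatenated HTML documents, which Nokogiri will parse somewhat loosely; writing each page to its own file (or parsing each body directly) is usually cleaner.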
Instead of writing the pages to a file, you can pass each response.body directly to Nokogiri:
require 'net/http'
require 'openssl'
require 'nokogiri'

def get_data
  (1..5).each do |i|
    uri = URI.parse("https://exampel.com?pageNr=#{i}&Size=10")
    http = Net::HTTP.new(uri.host, uri.port)
    http.use_ssl = true
    # VERIFY_NONE disables certificate checks; only acceptable for testing
    http.verify_mode = OpenSSL::SSL::VERIFY_NONE
    puts "Fetching data from: #{uri.request_uri}"
    request = Net::HTTP::Get.new(uri.request_uri)
    request['Cookie'] = @cookie
    response = http.request(request)
    # http.request never returns nil, so check for a 2xx status instead
    if response.is_a?(Net::HTTPSuccess)
      puts "processing document..."
      document = Nokogiri::HTML(response.body)
      # process the document
    end
  end
end
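If the pages really do need to be saved first, as the question intends, here is a sketch along the same lines as the answer. It is an illustration under assumptions, not the site's API: fetch_pages and write_pages are hypothetical helper names, and the pageNr/Size parameters are taken from the question. It builds the query with URI.encode_www_form, reuses one connection via Net::HTTP.start, keeps Net::HTTP's default certificate verification, stores only 2xx responses, and writes one complete HTML document per file.

```ruby
require 'net/http'
require 'uri'

# Hypothetical helper: fetch pages 1..last_page from base_url and return
# { page_number => html_body }. Assumes the site takes pageNr and Size
# query parameters, as in the question.
def fetch_pages(base_url, last_page, cookie: nil, size: 10)
  base = URI(base_url)
  bodies = {}
  # One session for all pages; use_ssl only when the URL is https.
  Net::HTTP.start(base.host, base.port, use_ssl: base.scheme == 'https') do |http|
    (1..last_page).each do |page|
      uri = base.dup
      uri.query = URI.encode_www_form(pageNr: page, Size: size)
      request = Net::HTTP::Get.new(uri)
      request['Cookie'] = cookie if cookie
      puts "Fetching data from: #{uri.request_uri}"
      response = http.request(request)
      if response.is_a?(Net::HTTPSuccess)
        bodies[page] = response.body
      else
        warn "Skipping page #{page}: HTTP #{response.code}"
      end
    end
  end
  bodies
end

# Hypothetical helper: write each page to its own file (page_1.html, ...)
# and return the paths, so every file is a single HTML document.
def write_pages(bodies, dir = '.')
  bodies.map do |page, body|
    path = File.join(dir, "page_#{page}.html")
    File.write(path, body)
    path
  end
end
```

For example, write_pages(fetch_pages('https://exampel.com', 5, cookie: @cookie)) would produce up to page_1.html through page_5.html, and each file can then be parsed with Nokogiri::HTML(File.read(path)).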