Fetch different URLs and write to file

I'm trying to fetch a number of different URLs, e.g. site.com/page=1, page2, and so on. All of the fetched data should be stored in an HTML file so it can then be read with Nokogiri.

If I read just one URL and write it to the file, it works perfectly. When I extend the script to read all the possible URLs, it doesn't work.

def getData
  @a=1
  array = Array.new
  while @a<5 do
    uri = URI.parse("https://exampel.com?pageNr="+@a.to_s+"Size=10")
    http = Net::HTTP.new(uri.host, uri.port)
    http.use_ssl = true
    http.verify_mode = OpenSSL::SSL::VERIFY_NONE
    request = Net::HTTP::Get.new(uri.request_uri)
    puts "Fetching data from "+uri.request_uri
    #puts @cookie
    request['Cookie']=@cookie
    response = http.request(request)
    if response != nil
      array[@a]=response.body
      @a+=1
    end
  end
  File.write('output.html',array) 
end
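
Two things break the original version: the query string is concatenated without an & between the parameters (it produces ?pageNr=1Size=10, so the server never sees Size=10 as its own parameter), and File.write('output.html', array) writes the array's to_s representation, which is Ruby array syntax rather than valid HTML. If you do want the pages on disk, here is a minimal sketch that writes one file per page (the fetch_pages name is mine; it reuses the question's placeholder host and @cookie):

require 'net/http'
require 'openssl'

# Minimal sketch: fetch pages 1-4 and save each to its own file
def fetch_pages
  (1..4).each do |i|
    # Note the explicit '&' separating the two query parameters
    uri = URI.parse("https://exampel.com?pageNr=#{i}&Size=10")

    http = Net::HTTP.new(uri.host, uri.port)
    http.use_ssl = true
    http.verify_mode = OpenSSL::SSL::VERIFY_NONE

    request = Net::HTTP::Get.new(uri.request_uri)
    request['Cookie'] = @cookie

    response = http.request(request)

    # Writing one file per page keeps each document valid HTML on its own
    File.write("page#{i}.html", response.body) if response
  end
end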

There's no need to write a file at all; you can pass response.body directly to Nokogiri:

require 'net/http'
require 'openssl'
require 'nokogiri'

def get_data
  (1..5).each do |i|
    uri = URI.parse("https://exampel.com?pageNr=#{i}&Size=10")

    http = Net::HTTP.new(uri.host, uri.port)
    http.use_ssl = true
    http.verify_mode = OpenSSL::SSL::VERIFY_NONE

    puts "Fetching data from: #{uri.request_uri}" 

    request = Net::HTTP::Get.new(uri.request_uri)
    request['Cookie'] = @cookie
    response = http.request(request)

    if response
      puts "processing document..." 
      document = Nokogiri::HTML(response.body)

      # process the document
    end
  end
end
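
What the "# process the document" step looks like depends on the pages being scraped; as a purely illustrative example (the 'a' selector is hypothetical, not taken from the question), Nokogiri lets you query the parsed document with CSS selectors:

# Hypothetical example: print the text and target of every link on the page
document.css('a').each do |link|
  puts "#{link.text.strip} -> #{link['href']}"
end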

See also: Nokogiri Tutorial: How to parse a document