How do I filter my results when scraping a website using the Nokogiri gem?
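I'm trying to scrape the list of restaurants for my postcode from Deliveroo.co.uk.
I need to add a way to determine whether each restaurant is open or closed. It is obvious on the website, but I need to update my code to reflect it.
How do I do this? I need to create something like a 'status' variable and set it to 'open' or 'closed' for each restaurant.
This is the site I'm scraping: https://deliveroo.co.uk/restaurants/london/maida-vale?postcode=W92DE&time=1800&day=today
My code is below.
Thanks.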
require 'open-uri'
require 'nokogiri'
require 'csv'

# Store URL to be scraped
url = "https://deliveroo.co.uk/restaurants/london/maida-vale?postcode=W92DE"

# Parse the page with Nokogiri
# (URI.open is needed on Ruby 3.0+, where Kernel#open no longer opens URLs)
page = Nokogiri::HTML(URI.open(url))

# Collect each field into its own array
name = []
page.css('span.list-item-title.restaurant-name').each do |line|
  name << line.text
end

category = []
page.css('span.restaurant-detail.detail-cat').each do |line|
  category << line.text
end

delivery_time = []
page.css('span.restaurant-detail.detail-time').each do |line|
  delivery_time << line.text
end

distance = []
page.css('span.restaurant-detail.detail-distance').each do |line|
  distance << line.text
end

# Not populated yet: this is the part I need help with
status = []

# Write data to CSV file
CSV.open("deliveroo.csv", "w") do |file|
  file << ["Name", "Category", "Delivery Time", "Distance", "Status"]
  name.length.times do |i|
    file << [name[i], category[i], delivery_time[i], distance[i]]
  end
end
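We need to check whether each li.restaurant--details element has the class unavailable: if it does, the restaurant is closed; if it does not, the restaurant is open.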
status = []
page.css('li.restaurant--details').each do |line|
  if line.attr("class").include? "unavailable"
    sts = "closed"
  else
    sts = "open"
  end
  status << sts
end
By the way, you should strip the surrounding whitespace when reading the restaurant name (and the other fields):
page.css('span.list-item-title.restaurant-name').each do |line|
  name << line.text.strip
end
You can refer to my code here: https://gist.github.com/vinhnglx/4eaeb2e8511dd1454f42