Scraped Results Not Updating
I'm scheduling the scrape through the Whenever gem, but my scraped results don't seem to be updating at all.
I suspect it's because the earlier results were already saved, so only those are being displayed, but I'm not sure.
Controller:
class EntriesController < ApplicationController
  def index
    @entries = Entry.all
  end

  def scrape
    RedditScrapper.scrape
    respond_to do |format|
      format.html { redirect_to entries_url, notice: 'Entries were successfully scraped.' }
      format.json { entriesArray.to_json }
    end
  end
end
lib/reddit_scrapper.rb:
require 'open-uri'

module RedditScrapper
  def self.scrape
    doc = Nokogiri::HTML(open("https://www.reddit.com/"))
    entries = doc.css('.entry')
    entriesArray = []
    entries.each do |entry|
      title = entry.css('p.title > a').text
      link = entry.css('p.title > a')[0]['href']
      entriesArray << Entry.new({ title: title, link: link })
    end

    if entriesArray.map(&:valid?)
      entriesArray.map(&:save!)
    end
  end
end
config/schedule.rb:
RAILS_ROOT = File.expand_path(File.dirname(__FILE__) + '/')

every 2.minutes do
  runner "RedditScrapper.scrape", :environment => "development"
end
Model:
class Entry < ApplicationRecord
end
Routes:
Rails.application.routes.draw do
  # root 'entry#scrape_reddit'
  root 'entries#index'
  resources :entries
  # get '/new_entries', to: 'entries#scrape', as: 'scrape'
end
View index.html.erb:
<h1>Reddit's Front Page</h1>

<% @entries.order("created_at DESC").limit(10).each do |entry| %>
  <h3><%= entry.title %></h3>
  <p><%= entry.link %></p>
<% end %>
Create the entries using just Entry.create!:
module RedditScraper
  def self.scrape
    doc = Nokogiri::HTML(open("https://www.reddit.com/"))
    entries = doc.css('.entry')
    entries.each do |entry|
      title = entry.css('p.title > a').text
      link = entry.css('p.title > a')[0]['href']
      Entry.create!(title: title, link: link)
    end
  end
end
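One caveat: since the job runs every 2 minutes, each run will re-insert the same front-page posts. In Rails the usual fix is `Entry.find_or_create_by(link: link)` in place of `create!` (assuming the link uniquely identifies a post). A minimal plain-Ruby sketch of that idea, with two hypothetical scraper runs:

```ruby
require 'set'

# Two simulated scraper runs; the second repeats a post from the first.
run1 = [{ title: "A", link: "/a" }, { title: "B", link: "/b" }]
run2 = [{ title: "A", link: "/a" }, { title: "C", link: "/c" }]

seen   = Set.new  # stands in for a unique index on entries.link
stored = []       # stands in for the entries table

[run1, run2].each do |run|
  run.each do |post|
    next if seen.include?(post[:link])  # skip posts already stored
    seen << post[:link]
    stored << post
  end
end
# stored now holds /a, /b, /c exactly once each
```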
Fetch the 10 most recent entries:
# controller
def index
  @entries = Entry.order("created_at DESC").limit(10)
end
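To make the ordering concrete, here is a plain-Ruby sketch of what `order("created_at DESC").limit(10)` returns, using hashes with hypothetical `:created_at` timestamps instead of ActiveRecord rows:

```ruby
require 'time'

# 15 fake records, created one second apart; post 15 is the newest.
entries = (1..15).map { |i| { title: "post #{i}", created_at: Time.at(i) } }

# Sort ascending by timestamp, reverse for newest-first, keep the top 10.
latest = entries.sort_by { |e| e[:created_at] }.reverse.first(10)
# latest runs from "post 15" down to "post 6"
```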
View:
<% @entries.each do |entry| %>
Note also that if you want the newest content at the top, you need to change the order in which items parsed from Reddit are added to the database. That change belongs in the Reddit scraper.
Reverse the entries: instead of

entries.each do |entry|

use

entries.reverse.each do |entry|

so parsing starts from the end of the list and the newest posts are saved last.
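The reversal trick can be sketched in plain Ruby (the string array stands in for the parsed `.entry` nodes):

```ruby
# The front page lists the newest post first, but records saved later get
# later created_at timestamps. Reversing before saving makes the top-of-page
# post the last (newest) record, so created_at DESC shows it first again.
scraped = ["top story", "second story", "third story"]  # page order

saved = []
scraped.reverse.each { |title| saved << title }  # save bottom-of-page first
# saved order: "third story", "second story", "top story"
```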