Rake task not saving or creating new record in database
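I created a Ruby script that executes fine when I run it from the console.
The script fetches some information from various websites and saves it to my database table.
However, when I turned the code into a rake task, the code still runs but it does not save any new records. I also don't get any errors from rake.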
# Add your own tasks in files placed in lib/tasks ending in .rake,
# for example lib/tasks/capistrano.rake, and they will automatically be available to Rake.
require File.expand_path('../config/application', __FILE__)
Rails.application.load_tasks
require './crawler2.rb'
task :default => [:crawler]
task :crawler do
  ### ###
  require 'rubygems'
  require 'nokogiri'
  require 'open-uri'
  start = Time.now
  $a = 0
  sites = ["http://www.nytimes.com","http://www.news.com"]
  for $a in 0..sites.size-1
    url = sites[$a]
    $i = 75
    $error = 0
    avoid_these_links = ["/tv", "//www.facebook.com/"]
    doc = Nokogiri::HTML(open(url))
    links = doc.css("a")
    hrefs = links.map {|link| link.attribute('href').to_s}.uniq.sort.delete_if {|href| href.empty?}.delete_if {|href| avoid_these_links.any? { |w| href =~ /#{w}/ }}.delete_if {|href| href.size < 10 }
    #puts hrefs.length
    #puts hrefs
    for $i in 0..hrefs.length
      begin
        #puts hrefs[60] #for debugging)
        #file = open(url)
        #doc = Nokogiri::HTML(file) do
        if hrefs[$i].downcase().include? "http://"
          doc = Nokogiri::HTML(open(hrefs[$i]))
        else
          doc = Nokogiri::HTML(open(url+hrefs[$i]))
        end
        image = doc.at('meta[property="og:image"]')['content']
        title = doc.at('meta[property="og:title"]')['content']
        article_url = doc.at('meta[property="og:url"]')['content']
        description = doc.at('meta[property="og:description"]')['content']
        category = doc.at('meta[name="keywords"]')['content']
        newspaper_id = 1
        puts "\n"
        puts $i
        #puts "Image: " + image
        #puts "Title: " + title
        #puts "Url: " + article_url
        #puts "Description: " + description
        puts "Catory: " + category
        Article.create({
          :headline => title,
          :caption => description,
          :thumbnail_url => image,
          :category_id => 3,
          :status => true,
          :journalist_id => 2,
          :newspaper_id => newspaper_id,
          :from_crawler => true,
          :description => description,
          :original_url => article_url}) unless Article.exists?(original_url: article_url)
        $i +=1
        #puts $i #for debugging
      rescue
        #puts "Error here: " + url+hrefs[$i] if $i < hrefs.length
        $i +=1 # do_something_* again, with the next i
        $error +=1
      end
    end
    puts "Page: " + url
    puts "Articles: " + hrefs.length.to_s
    puts "Errors: " + $error.to_s
    $a +=1
  end
  finish = Time.now
  diff = ((finish - start)/60).to_s
  puts diff + " Minutes"
  ### ###
end
If I save the file as crawler.rb and open it in the console by executing "load './crawler2.rb'", the code runs fine. When I use exactly the same code in a rake task, I get no new records.
I figured out what was wrong.
I needed to remove:
require './crawler2.rb'
task :default => [:crawler]
and change the task definition to the following:
task :crawler => :environment do
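For anyone wondering what that line changes: `task :crawler => :environment` declares `:environment` as a prerequisite, so Rake runs it before the task body. In a Rails app that prerequisite is what loads the application, models such as Article included. Here is a minimal sketch of the ordering outside Rails, assuming the rake gem is installed. The `:environment` task below is a hypothetical stand-in defined only for this illustration (Rails supplies the real one), and `log` is just a local array used to record the order.

```ruby
require 'rake'
extend Rake::DSL # makes `task` available at the top level of a plain script

log = []

# Hypothetical stand-in for the :environment task that Rails provides.
# In a real app this is where the application and its models get loaded.
task :environment do
  log << "environment loaded"
end

# Same shape as the fixed task definition: :environment is a prerequisite,
# so it runs before the crawler body.
task :crawler => :environment do
  log << "crawler ran"
end

Rake::Task[:crawler].invoke
puts log.inspect
```

In the actual Rakefile only the `task :crawler => :environment do` line is needed; the stub exists here so the sketch can run on its own.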
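Now the crawler runs every ten minutes with the help of Heroku Scheduler :-)
Thanks everyone for the help, and sorry for the poor formatting. I hope this answer helps others.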
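A likely reason rake printed no errors: the loop wraps everything in a bare `rescue`, and a bare `rescue` catches any StandardError. That includes the NameError Ruby raises when a constant such as Article has not been loaded. The sketch below reproduces that pattern in plain Ruby under that assumption. `run_with_rescue` and `MissingArticleModel` are hypothetical names used only for this illustration; the second stands in for a model whose environment was never loaded.

```ruby
# Hypothetical helper that mirrors the begin/rescue structure of the crawler loop.
# `rescue => e` catches the same exceptions as a bare `rescue` (StandardError and
# its subclasses), but keeps the exception so its class can be reported.
def run_with_rescue(urls)
  errors = 0
  classes = []
  urls.each do |url|
    begin
      yield url
    rescue => e
      errors += 1
      classes << e.class
    end
  end
  [errors, classes.uniq]
end

# MissingArticleModel is deliberately undefined, like Article without :environment.
errors, classes = run_with_rescue(%w[/a /b /c]) do |url|
  MissingArticleModel.create(original_url: url)
end

puts "Errors: #{errors} (#{classes.join(', ')})" # prints "Errors: 3 (NameError)"
```

Printing or logging the rescued exception inside the loop, rather than only counting it, would have pointed at the missing environment right away.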