Rake task not saving or creating new record in database

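I wrote a Ruby script that executes fine when I run it from the console.

The script fetches some information from various websites and saves it to my database table.

However, when I turned the code into a rake task, the code still runs, but it does not save any new records. I also don't get any errors from Rake.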

# Add your own tasks in files placed in lib/tasks ending in .rake,
# for example lib/tasks/capistrano.rake, and they will automatically be available to Rake.

require File.expand_path('../config/application', __FILE__)

Rails.application.load_tasks

require './crawler2.rb'
task :default => [:crawler]

task :crawler do

### ###

require 'rubygems'
require 'nokogiri'
require 'open-uri'

start = Time.now

$a = 0

sites = ["http://www.nytimes.com","http://www.news.com"]

for $a in 0..sites.size-1

url = sites[$a] 

$i = 75

$error = 0

avoid_these_links = ["/tv", "//www.facebook.com/"]

doc = Nokogiri::HTML(open(url))

    links = doc.css("a")
    hrefs = links.map {|link| link.attribute('href').to_s}.uniq.sort.delete_if {|href| href.empty?}.delete_if {|href| avoid_these_links.any? { |w| href =~ /#{w}/ }}.delete_if {|href| href.size < 10 }

#puts hrefs.length

#puts hrefs

for $i in 0..hrefs.length
    begin

        #puts hrefs[60] #for debugging)

    #file = open(url)
    #doc = Nokogiri::HTML(file) do

        if hrefs[$i].downcase().include? "http://"

            doc = Nokogiri::HTML(open(hrefs[$i]))

        else 

            doc = Nokogiri::HTML(open(url+hrefs[$i]))

        end 

        image = doc.at('meta[property="og:image"]')['content']
        title = doc.at('meta[property="og:title"]')['content']
        article_url = doc.at('meta[property="og:url"]')['content']
        description = doc.at('meta[property="og:description"]')['content']
        category = doc.at('meta[name="keywords"]')['content']

        newspaper_id = 1 


        puts "\n"
        puts $i
        #puts "Image: " + image
        #puts "Title: " + title
        #puts "Url: " + article_url
        #puts "Description: " + description
        puts "Category: " + category

            Article.create({ 
            :headline => title, 
            :caption => description, 
            :thumbnail_url => image, 
            :category_id => 3, 
            :status => true, 
            :journalist_id => 2, 
            :newspaper_id => newspaper_id, 
            :from_crawler => true,
            :description => description,
            :original_url => article_url}) unless Article.exists?(original_url: article_url)

        $i +=1

        #puts $i #for debugging

        rescue
        #puts "Error here: " + url+hrefs[$i] if $i < hrefs.length
        $i +=1    # do_something_* again, with the next i
        $error +=1

    end 

end

puts "Page: " + url
puts "Articles: " + hrefs.length.to_s
puts "Errors: " + $error.to_s

$a +=1

end

finish = Time.now

diff = ((finish - start)/60).to_s

puts diff + " Minutes"


### ###


end

If I save the file as crawler.rb and open it in the console by running --> "load './crawler2.rb'", the code executes fine. When I use the exact same code in a rake task, I get no new records.

I figured out what was wrong.

I needed to remove:

require './crawler2.rb'
task :default => [:crawler]

And change the task definition to the following:

task :crawler => :environment do
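
The `:environment` prerequisite is what makes the difference: it is a task Rails defines that boots the application, so models such as `Article` are loaded before the task body runs. The sketch below shows only the ordering, using plain Rake with a stand-in `:environment` task. `crawler_demo` and `LOG` are made-up names for illustration, and in a real Rails app you would not define `:environment` yourself.

```ruby
require 'rake'

# Records the order in which the task bodies run.
LOG = []

# Stand-in for the :environment task that Rails provides (assumption:
# here it only logs; in Rails it loads the app and its models).
Rake::Task.define_task(:environment) do
  LOG << "environment loaded"
end

# Declaring :environment as a prerequisite makes Rake run it first.
Rake::Task.define_task(crawler_demo: :environment) do
  LOG << "crawler ran"
end

Rake::Task[:crawler_demo].invoke
puts LOG.inspect
```

Rake runs prerequisites before the task body, so anything the body needs from the Rails app is available by the time it executes.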

Now the crawler runs every ten minutes with the help of the Heroku scheduler :-)
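
As for why the original task printed no errors, the most likely explanation is the bare `rescue` inside the loop. Without the environment loaded, `Article` is an undefined constant, and referencing it raises a `NameError`. A bare `rescue` catches `StandardError`, which includes `NameError`, so the failure only bumped the error counter. Here is a minimal sketch of that effect; `MissingModel` and `try_create` are hypothetical names standing in for `Article` and the crawler loop body.

```ruby
# MissingModel is deliberately never defined, just as Article is undefined
# when the Rails environment has not been loaded.
def try_create
  MissingModel.create(headline: "example")
  "saved"
rescue => e
  # `rescue => e` catches StandardError and its subclasses (same as a bare
  # `rescue`), which includes NameError. Returning the error makes it visible.
  "#{e.class}: #{e.message}"
end

result = try_create
puts result
```

Printing `e.class` and `e.message` in the rescue branch, instead of only counting errors, would have surfaced the missing constant right away.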

Thanks everyone for the help, and sorry for the bad formatting. I hope this answer helps others.