Issues defining cron jobs in Procfile (Heroku) using APScheduler for a Django project

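I'm having trouble scheduling a cron job that needs to scrape a website and store the results in the database as part of a model (Movie).

The problem is that the model seems to be loaded before the Procfile is executed.
How should I create a cron job that runs internally in the background and stores the scraped information in the database? Here is my code:

Procfile: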

    web: python manage.py runserver 0.0.0.0:$PORT
    scheduler: python cinemas/scheduler.py

scheduler.py:

# More code above
from cinemas.models import Movie
from apscheduler.schedulers.blocking import BlockingScheduler
sched = BlockingScheduler()

@sched.scheduled_job('cron', day_of_week='mon-fri', hour=0, minute=26)
def get_movies_playing_now():
  global url_movies_playing_now
  Movie.objects.all().delete()
  while(url_movies_playing_now):
    title = []
    description = []
    #Create BeautifulSoup object with url link
    s = requests.get(url_movies_playing_now, headers=headers)
    soup = bs4.BeautifulSoup(s.text, "html.parser")
    movies = soup.find_all('ul', class_='w462')[0]

    #Find Movie's title
    for movie_title in movies.find_all('h3'):
        title.append(movie_title.text)
    #Find Movie's description
    for movie_description in soup.find_all('ul',
                                           class_='w462')[0].find_all('p'):
        description.append(movie_description.text.replace(" [More]","."))

    for t, d in zip(title, description):
        m = Movie(movie_title=t, movie_description=d)
        m.save()

    #Go to the next page to find more movies
    paging = soup.find( class_='pagenating').find_all('a', class_=lambda x:
                                                      x != "inactive")
    href = ""
    for p in paging:
        if "next" in p.text.lower():
            href = p['href']
    url_movies_playing_now = href

sched.start()
# More code below

cinemas/models.py:

from django.db import models

#Create your models here.

class Movie(models.Model):
    movie_title = models.CharField(max_length=200)
    movie_description = models.CharField(max_length=20200)

This is the error I get when the job runs.

2016-11-17T17:57:06.074914+00:00 app[scheduler.1]: Traceback (most recent call last):
2016-11-17T17:57:06.074931+00:00 app[scheduler.1]: File "cinemas/scheduler.py", line 2, in <module>
2016-11-17T17:57:06.075058+00:00 app[scheduler.1]: import cineplex
2016-11-17T17:57:06.075060+00:00 app[scheduler.1]: File "/app/cinemas/cineplex.py", line 1, in <module>
2016-11-17T17:57:06.075173+00:00 app[scheduler.1]: from cinemas.models import Movie
2016-11-17T17:57:06.075196+00:00 app[scheduler.1]: File "/app/cinemas/models.py", line 5, in <module>
2016-11-17T17:57:06.075295+00:00 app[scheduler.1]: class Movie(models.Model):
2016-11-17T17:57:06.075297+00:00 app[scheduler.1]: File "/app/.heroku/python/lib/python3.5/site-packages/django/db/models/base.py", line 105, in __new__
2016-11-17T17:57:06.075414+00:00 app[scheduler.1]: app_config = apps.get_containing_app_config(module)
2016-11-17T17:57:06.075440+00:00 app[scheduler.1]: File "/app/.heroku/python/lib/python3.5/site-packages/django/apps/registry.py", line 237, in get_containing_app_config
2016-11-17T17:57:06.075585+00:00 app[scheduler.1]: self.check_apps_ready()
2016-11-17T17:57:06.075586+00:00 app[scheduler.1]: File "/app/.heroku/python/lib/python3.5/site-packages/django/apps/registry.py", line 124, in check_apps_ready
2016-11-17T17:57:06.075703+00:00 app[scheduler.1]: raise AppRegistryNotReady("Apps aren't loaded yet.")
2016-11-17T17:57:06.075726+00:00 app[scheduler.1]: django.core.exceptions.AppRegistryNotReady: Apps aren't loaded yet.

The cron job works fine if I don't include the model objects. How should I run this job every day using the model objects without it failing?

Thanks.

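That's because you can't just import Django packages, models, etc. on their own.
To work properly, Django's internals must be initialized first, and that initialization is triggered from manage.py.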
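
To illustrate why the import fails, here is a toy model of the check that shows up in the traceback. It is a simplified sketch, not Django's actual code: ToyRegistry, ToyModelBase, and ToyModel are hypothetical stand-ins for the app registry and the model metaclass.

```python
class AppRegistryNotReady(Exception):
    """Toy stand-in for django.core.exceptions.AppRegistryNotReady."""


class ToyRegistry:
    """Simplified stand-in for Django's app registry (hypothetical)."""

    def __init__(self):
        self.ready = False

    def setup(self):
        # Plays the role of the initialization that manage.py triggers.
        self.ready = True

    def check_apps_ready(self):
        if not self.ready:
            raise AppRegistryNotReady("Apps aren't loaded yet.")


registry = ToyRegistry()


class ToyModelBase(type):
    """Metaclass that consults the registry while a model class is created."""

    def __new__(mcs, name, bases, namespace):
        # The base class itself has no parents and is always allowed;
        # concrete subclasses require a ready registry.
        if bases:
            registry.check_apps_ready()
        return super().__new__(mcs, name, bases, namespace)


class ToyModel(metaclass=ToyModelBase):
    pass


def define_movie():
    # Executing the class statement triggers the check, which is why a
    # plain import of a models module can raise.
    class Movie(ToyModel):
        pass

    return Movie


try:
    define_movie()
    failed_before_setup = False
except AppRegistryNotReady:
    failed_before_setup = True

registry.setup()
Movie = define_movie()
print(failed_before_setup, Movie.__name__)  # prints: True Movie
```

The point of the sketch is only the ordering: the class definition succeeds once the registry has been populated, and fails before that.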

Rather than trying to recreate all of that myself, I always write long-running, non-web commands as custom management commands.

For example, if your app is cinemas, you would:

  • Create ./cinemas/management/commands/scheduler.py.
  • In that file, create a subclass of django.core.management.base.BaseCommand (the subclass must be called Command).
  • In the class, override handle(). In your case, this is where you would call sched.start().
  • Your Procfile would then have scheduler: python manage.py scheduler

Hope that helps.

You can solve the problem by adding the following lines at the top of scheduler.py:

import django
django.setup()

The Django documentation says:

If you’re using components of Django “standalone” – for example, writing a Python script which loads some Django templates and renders them, or uses the ORM to fetch some data – there’s one more step you’ll need in addition to configuring settings.

After you’ve either set DJANGO_SETTINGS_MODULE or called configure(), you’ll need to call django.setup() to load your settings and populate Django’s application registry. For example:

import django
from django.conf import settings
from myapp import myapp_defaults

settings.configure(default_settings=myapp_defaults, DEBUG=True)
django.setup()

# Now this script or any imported module can use any part of Django it needs.
from myapp import models

I set DJANGO_SETTINGS_MODULE as a config var, so I did not need to set it inside my scheduler script.
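
Putting this together, a standalone scheduler.py might be arranged as below. This is only a sketch under assumptions: "mysite.settings" is a hypothetical settings path used as a fallback when no config var is present, and the job body is a placeholder for the scraping code from the question. The Django-dependent imports are deferred into main() so that nothing touches a model before django.setup() has run.

```python
import os

# Assumption: "mysite.settings" is a hypothetical settings path.
# setdefault() leaves an existing value (such as a config var) untouched.
os.environ.setdefault("DJANGO_SETTINGS_MODULE", "mysite.settings")


def main():
    import django

    # Populate the app registry before anything imports a model.
    django.setup()

    # Safe only after django.setup(); names taken from the question.
    from apscheduler.schedulers.blocking import BlockingScheduler
    from cinemas.models import Movie

    sched = BlockingScheduler()

    @sched.scheduled_job("cron", day_of_week="mon-fri", hour=0, minute=26)
    def get_movies_playing_now():
        Movie.objects.all().delete()
        # Scrape the listing pages and save Movie rows here,
        # as in the function from the question.

    sched.start()  # blocks and runs the job on schedule
```

Calling main() at the bottom of scheduler.py would start the scheduler, and the Procfile entry from the question can stay as it is.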